  1. Mar 2026
    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This important study introduces an advance in multi-animal tracking by reframing identity assignment as a self-supervised contrastive representation learning problem. It eliminates the need for segments of video where all animals are simultaneously visible and individually identifiable, and significantly improves tracking speed, accuracy, and robustness with respect to occlusion. This innovation has implications beyond animal tracking, potentially connecting with advances in behavioral analysis and computer vision. The strength of support for these advances is compelling overall, although there were some remaining minor methodological concerns.

      To tackle “minor methodological concerns” mentioned in the Editorial assessment and Reviewer 3, the new version of the manuscript includes the following changes:

      a) The new ms no longer uses the word “accuracy”; it uses “IDF1 scores” instead. See, for example, lines 46, 161, 176, and 522 for the new wording.

      b) Instead of comparing software using mean accuracy over the benchmark, Reviewer 3 proposes using medians or even boxplots. We now provide boxplot results with mean, median, percentiles and outliers (Figure 1-figure supplement 2).

      Additionally, we also include in the text the other recommendations from Reviewer 3:

      a) We now describe more explicitly the problems of the original idtracker.ai v4 in the benchmark (lines 66-68). Around half of the videos had a high accuracy in the original idtracker.ai (v4), but the other half had lower accuracies (Figure 1a, blue). The new idtracker.ai has high accuracy values for all the videos (Figure 1a, magenta).

      Also, the videos with high accuracy in the old idtracker.ai had very long tracking times (Figure 1b, blue), whereas the new version does not (Figure 1b, magenta). So the benchmark allows us to distinguish the new idtracker.ai as having a better accuracy for all videos and lower tracking times, making it a much more practical system than previous ones.

      b) We further clarified the occlusion experiment (lines 188-190 and 277-290).

      c) We explain why we measure accuracies with and without animal crossings (lines 49-62).

      d) We added a Discussion section (lines 223-244).

      We believe the new version has clarified the minor methodological concerns.

      Reviewer #3 (Public review):

      The authors have reorganized and rewritten a substantial portion of their manuscript, which has improved the overall clarity and structure to some extent. In particular, omitting the different protocols enhanced readability. However, all technical details are now in appendix which is now referred to more frequently in the manuscript, which was already the case in the initial submission. These frequent references to the appendix - and even to appendices from previous versions - make it difficult to read and fully understand the method and the evaluations in detail. A more self-contained description of the method within the main text would be highly appreciated.

      In the new ms, we have reduced the references to the appendix by having a more detailed explanation in one place, lines 49-62.

      Furthermore, the authors state that they changed their evaluation metric from accuracy to IDF1. However, throughout the manuscript they continue to refer to "accuracy" when evaluating and comparing results. It is unclear which accuracy metric was used or whether the authors are confusing the two metrics. This point needs clarification, as IDF1 is not an "accuracy" measure but rather an F1-score over identity assignments.

      We thank the reviewer for noticing this. Following this recommendation, we changed how we refer to the accuracy measure with “IDF1 score” in the entire ms. See, for example, lines 46, 161, 176, and 522.
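
      To make the distinction between accuracy and IDF1 concrete, the following is a minimal sketch of an IDF1 computation. It assumes detections are already matched one-to-one in space, so each detection carries a true and a predicted identity, and it searches all identity mappings by brute force, which is only practical for a handful of animals. The function name `idf1` and the toy labels are illustrative and are not taken from idtracker.ai.

```python
from itertools import permutations


def idf1(gt_ids, pred_ids):
    """IDF1 for detections that are already matched one-to-one in space.

    gt_ids[i] and pred_ids[i] are the true and predicted identity of
    detection i; None marks a missing ground-truth or predicted label.
    """
    gt_labels = sorted({g for g in gt_ids if g is not None})
    pred_labels = sorted({p for p in pred_ids if p is not None})
    n_gt = sum(g is not None for g in gt_ids)
    n_pred = sum(p is not None for p in pred_ids)

    # Pad the shorter label list so every permutation is a one-to-one mapping.
    size = max(len(gt_labels), len(pred_labels))
    gt_padded = gt_labels + [None] * (size - len(gt_labels))
    pred_padded = pred_labels + [None] * (size - len(pred_labels))

    best_idtp = 0
    for perm in permutations(gt_padded):
        mapping = dict(zip(pred_padded, perm))  # predicted label -> true label
        idtp = sum(
            1
            for g, p in zip(gt_ids, pred_ids)
            if g is not None and p is not None and mapping.get(p) == g
        )
        best_idtp = max(best_idtp, idtp)

    idfp = n_pred - best_idtp  # predictions without the matching true identity
    idfn = n_gt - best_idtp    # true detections without the matching prediction
    denom = 2 * best_idtp + idfp + idfn
    return 2 * best_idtp / denom if denom else 0.0


# Toy example: one of six detections receives the wrong identity.
score = idf1([1, 1, 1, 2, 2, 2], ["a", "a", "b", "b", "b", "b"])
```

      Because the score is computed under the best global mapping between predicted and true identities, it is an F1-score over identity assignments rather than a per-frame accuracy. A practical implementation would replace the permutation search with an assignment solver.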

      The authors compare the speedups of the new version with those of the previous ones by taking the average. However, it appears that there are striking outliers in the tracking performance data (see Supplementary Table 1-4). Therefore, using the average may not be the most appropriate way to compare. The authors should consider using the median or providing more detailed statistics (e.g., boxplots) to better illustrate the distributions.

      We thank the reviewer for asking for more detailed statistics. We added the requested box plot in Figure 1-figure supplement 2 to provide more complete statistics in the comparison.
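
      The reviewer's point about outliers can be illustrated with a short sketch that computes the quantities a box plot shows alongside the mean. The speed-up values below are hypothetical and are not taken from the benchmark; the 1.5 × IQR outlier rule is the common box plot convention and is assumed here.

```python
import statistics


def box_summary(values):
    """Return the quantities a box plot displays, plus the mean."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.fmean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


# Hypothetical speed-up factors for seven videos; one video dominates the mean.
speedups = [2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 120.0]
summary = box_summary(speedups)
```

      In this toy data the mean is pulled above 20 by a single video while the median stays at 4, which is why a median or a full box plot describes the typical video better than the mean.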

      The authors did not provide any conclusion or discussion section. Including a concise conclusion that summarizes the main findings and their implications would help to convey the message of the manuscript.

      We added a Discussion section in lines 223-244.

      The authors report an improvement in the mean accuracy across all benchmarks from 99.49% to 99.82% (with crossings). While this represents a slight improvement, the datasets used for benchmarking seem relatively simple and already largely "solved". Therefore, the impact of this work on the field may be limited. It would be more informative to evaluate the method on more challenging datasets that include frequent occlusions, crossings, or animals with similar appearances.

      Around half of the videos also had a very high accuracy in the original idtracker.ai (v4), but the other half had lower accuracies (Figure 1a, blue). For example, we found IDF1 scores of 94.47% for a video of 100 zebrafish with thousands of crossings (z_100_1), 93.77% for a video of 4 mice (m_4_2) and 69.66% for a video of 100 flies (d_100_3). The new idtracker.ai has high accuracy values for all the videos (Figure 1a, magenta).

      Importantly, the tracking times for the majority of videos were very high in the original idtracker.ai (Figure 1b, blue), limiting the use of the tracking system in practice. The new system achieves both a high accuracy in all videos (Figure 1a, magenta) and much lower tracking times (Figure 1b, magenta), making it a much more practical system.

      We have added a sentence on the limitations of the original idtracker.ai as obtained from the benchmark, lines 66-68.

      The accuracy reported in the main text is "without crossings" - this seems like incomplete evaluation, especially that tracking objects that do not cross seems a straightforward task. Information is missing why crossings are a problem and are dealt with separately.

      We have now added an explanation of why we measure accuracy without crossings and why we separated it from the accuracy for the whole trajectory in lines 49-62. The reason is that the identification algorithm presented in this ms only identifies animal images outside the crossings. This algorithm makes robust animal identifications throughout the video despite the thousands of animal crossings that typically occur in each of the videos used in the benchmark. A second algorithm, unchanged since the first idTracker in 2014, assigns animal positions during crossings once the first algorithm has identified the animals before and after the crossings.

      There are several videos with a much lower tracking accuracy, explaining what the challenges of these videos are and why the method fails in such cases would help to understand the method's usability and weak points.

      Some videos had low accuracy on previous versions (Figure 1a, blue), but the new idtracker.ai has high accuracy in all of them (Figure 1a, magenta).

      Reviewer #3 (Recommendations for the authors):

      (1) As described before, the authors claim to use IDF1 as their metric in the whole manuscript (lines 414-436) but only refer to accuracy when presenting the results. It is not clear, whether accuracy was used as a metric instead of IDF1 or the authors are confusing these metrics.

      Following this recommendation, we replaced “accuracy” with “IDF1 score”; see lines 46, 161, 176, and 522.

      (2) In the introduction, a brief explanation why crossings need to be dealt with separately would help to understand the logic of the method design.

      We added such an explanation in lines 49-62.

      (3) Figure 3: We asked about how the tracking accuracy is being assessed with occlusions. The authors responded with that only the GT points inside the ROI are taken into account when computing the accuracy. Does this mean, that the occluded blobs are still part of the CNN training and the clustering? This questions the purpose of this experiment, since the accuracy performance would therefore only change, if the errors, that their approach is doing either way, are outside the ROI and, therefore, not part of the metric evaluation.

      The occluded blobs are not part of any training because they are erased from the video; they do not exist. We made this clearer in lines 188-190 and 277-290.

      (4) Figure 1: The fact that datasets are connected with a line is misleading - there is no connection between the data along the x-axis. A line plot is not an appropriate way to present these results.

      The new ms clarifies that the lines are for ease of visualization, see last line in the caption of Figure 1.

      (5) Lines 38-39: It is not clear how the CNN can be pretrained for the entire video if there are no global segments or only short ones. Here, the distinction between "no segments", "only short segments" and "pretraining on the entire video" is not explained.

      This pretraining protocol is not used in the version of the software we present, so details of this are not as relevant.

      (6) Figure 2a: The authors are showing "individual fragments" and "individual fragments in a global fragment." However, it seems there are a few blue borders missing. In the text (l. 73-79), they note, that they are displaying them as "examples" but the absence of correct blue borders is confusing.

      In the new ms, we have replaced the label “Individual fragments in a global fragment” with “Individual fragments in an example global fragment” in the legend of Figure 2.

      (7) Lines 61-63, 148-151, and 162-164: Could the authors clarify why they used the average instead of median when comparing the speedups of the new version and the old ones?

      We thank the reviewer for asking for more detailed statistics. We added the requested box plot in Figure 1-figure supplement 2 to provide more complete statistics in the comparison of accuracies and tracking times for old and new systems.

      (8) Lines 140-144: The post-processing steps are not clear. The authors should rather state clearly which processes of the old versions they are using. Then the authors could shortly explain them.

      We removed this paragraph and explained in more detail in lines 49-62 which parts of the software are new and which ones are not.

      (9) Lines 239-251: Here, the authors are clarifying on a section 1-2 pages before. This information should be directly in that section instead.

      Following this recommendation, we clarified the occlusion experiment in the main text (lines 188-191) to make it more self-contained. Still, the flow of the main text is better with some details in Methods.

      (10) Line 38: It is not clear how the CNN can be pretrained for the entire video if there are no global segments or only short ones. Here, the distinction between "no segments", "only short segments" and "pretraining on the entire video" is a bit misleading/underexplained.

      See number 5.

      (11) Figure 2a: The authors are showing "individual fragments" and "individual fragments in a global fragment." However, it seems there are a few blue borders missing. In the text (l. 73-79), they note, that they are displaying them as "examples" but the absence of correct blue borders is confusing.

      See number 6.

      (12) Figure 2c and line 115-118: "Batches" itself is not meaningful without any information of the batch size. The authors should rather depict the batch size and then the number of epochs. The Figure 2 contains the info 400 positive and 400 negative pairs of images per batch. However, there is no information about the total number of images.

      Furthermore, these metrics are inappropriate here, since training is carried out from scratch (or already pre-trained) for every new video, each video has different number of animals, different number of images.

      Following this recommendation, we clarified the number of images in each batch (Figure 1c caption and lines 134-138), why we do not work with epochs (lines 700-702), and the idea that the clusters in Figure 2 represent an example and the number of batches needed for the clusters to form depends on the video details.
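
      As an illustration of what a batch of 400 positive and 400 negative image pairs could look like, here is a minimal sampling sketch. It assumes that positive pairs are two images drawn from the same individual fragment and that negative pairs come from fragment pairs already known to show different animals; the function `sample_batch`, its arguments and the toy image handles are hypothetical and do not reproduce the idtracker.ai code.

```python
import random


def sample_batch(fragments, different_pairs, n_pos=400, n_neg=400, rng=random):
    """Draw one contrastive batch of image pairs.

    fragments: dict mapping fragment id -> list of image handles (>= 2 each).
    different_pairs: list of (frag_a, frag_b) assumed to show different animals.
    Returns (positive_pairs, negative_pairs).
    """
    frag_ids = list(fragments)
    positives = []
    for _ in range(n_pos):
        frag = rng.choice(frag_ids)
        # Two distinct images of the same fragment, hence the same animal.
        positives.append(tuple(rng.sample(fragments[frag], 2)))
    negatives = []
    for _ in range(n_neg):
        a, b = rng.choice(different_pairs)
        negatives.append((rng.choice(fragments[a]), rng.choice(fragments[b])))
    return positives, negatives


# Toy data: image handles are strings of the form "<fragment>_<index>".
toy_fragments = {f: [f"{f}_{i}" for i in range(5)] for f in ("f1", "f2", "f3")}
toy_different = [("f1", "f2"), ("f2", "f3")]
pos, neg = sample_batch(toy_fragments, toy_different, rng=random.Random(0))
```

      Each call draws a fresh batch, so the number of batches, rather than passes over a fixed dataset, is the natural unit of training progress in a scheme like this one. The manuscript's own reasoning is given at the lines cited above.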

      Appendix 1-figure 1: why do the methods fail? It looks that for certain videos the method is fairly unreliable. What is the reason for the methods to crash and how to avoid this?

      Those failures are only for the old idtracker.ai and TRex, not for the method presented here. Our new contrastive algorithm does not fail in any of the videos in the benchmark.

      We thank the reviewer for the detailed suggestions. We believe we have incorporated all of them in the new version of the ms.

    1. Reviewer #2 (Public review):

      The authors aimed to investigate whether gamma synchrony serves a functional role in figure-ground perception. They specifically sought to test whether the stimulus-dependence of gamma synchrony, often considered a limitation, actually facilitates perceptual grouping. Using the theory of weakly coupled oscillators (TWCO), they developed a framework wherein synchronization depends on both frequency detuning (related to contrast heterogeneity) and coupling strength (related to proximity between visual elements). Through psychophysical experiments with texture discrimination tasks and computational modeling, they tested whether human performance follows patterns predicted by TWCO and whether perceptual learning enhances synchrony-based grouping.
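
      The dependence of synchronization on detuning and coupling strength can be sketched with two mutually coupled phase oscillators. This is a generic Kuramoto-type toy model chosen for illustration, not the authors' V1 model; the function names, the symmetric coupling and the parameter values are assumptions.

```python
import cmath
import math


def phase_locking_value(detuning_hz, coupling_hz, duration=4.0, dt=1e-3):
    """Simulate two mutually coupled phase oscillators and return their PLV.

    With symmetric coupling K, the phase difference phi obeys
    dphi/dt = d_omega - 2*K*sin(phi), which has a stable fixed point
    (phase locking) only when |d_omega| <= 2*K.
    """
    d_omega = 2 * math.pi * detuning_hz
    k = 2 * math.pi * coupling_hz
    steps = round(duration / dt)
    phi = 0.0
    acc = 0j
    for step in range(steps):
        phi += dt * (d_omega - 2 * k * math.sin(phi))  # forward Euler step
        if step >= steps // 2:  # discard the first half as transient
            acc += cmath.exp(1j * phi)
    return abs(acc) / (steps - steps // 2)


def predicted_to_lock(detuning_hz, coupling_hz):
    """Locking condition of the two-oscillator model above."""
    return abs(detuning_hz) <= 2 * coupling_hz


locked = phase_locking_value(detuning_hz=1.0, coupling_hz=2.0)
drifting = phase_locking_value(detuning_hz=5.0, coupling_hz=1.0)
```

      Sweeping detuning on one axis and coupling on the other and plotting the phase-locking value gives the triangular region of synchrony known as an Arnold tongue: small detuning and strong coupling lock, large detuning and weak coupling drift.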

      Strengths:

      (1) The theoretical framework connecting TWCO to visual perception is innovative and well-articulated, providing a potential mechanistic explanation for how gamma synchrony might contribute to both feature binding and separation.

      (2) The methodology combines psychophysical measurements with computational modeling, with a solid quantitative agreement between model predictions and human performance.

      (3) In particular, the demonstration that coupling strengths can be modified through experience is remarkable and suggests gamma synchrony could be an adaptable mechanism that improves with visual learning.

      (4) The cross-validation approach, wherein model parameters derived from macaque neurophysiology successfully predict human performance, strengthens the biological plausibility of the framework.

      Likely Impact and Utility:

      This work offers a fresh perspective on the functional role of gamma oscillations in visual perception. The integration of TWCO with perceptual learning provides a novel theoretical framework that could influence future research on neural synchrony.

      The computational model, with parameters derived from neurophysiological data, offers a useful tool for predicting perceptual performance based on synchronization principles. This approach might be extended to study other perceptual phenomena and could inspire designs for artificial vision systems.

      The learning component of the study may have a particular impact, as it suggests a mechanism by which perceptual expertise develops through modified coupling between neural assemblies. This could influence thinking about perceptual learning more broadly, but also raises questions about the underlying mechanism.

      Additional Context:

      Historically, the functional significance of gamma oscillations has been debated, with early theories of temporal binding giving way to skepticism based on gamma's stimulus-dependence. This study reframes this debate by suggesting that stimulus-dependence is exactly what makes gamma useful for perceptual grouping.

      The successful combination of computational neuroscience and psychophysics is a significant strength of this study.

      The field would benefit from future work extending (if possible) these findings to more naturalistic stimuli and directly measuring neural activity during perceptual tasks. Additionally, studies comparing predictions from synchrony-based models against alternative mechanisms would help establish the specificity of the proposed framework.

      Comments on revised version:

      The authors now soften their claim. However, the paper demonstrates that TWCO-derived predictions quantitatively match human figure-ground perception in texture stimuli, and that a synchrony-based readout provides a viable mapping from stimulus to behavior. Given that they cite (and do not show in this paper) the link to synchrony, what they actually establish is that this particular transformation of stimulus features maps better onto behavior. That's meaningful, but it is not a demonstration of mechanism.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper by Karimian et al proposes an oscillator model tuned to implement binding by synchrony (BBS*) principles in a visual task. The authors set out to show how well these BBS principles explain human behavior in figure-ground segregation tasks. The model is inspired by electrophysiological findings in non-human primates, suggesting that gamma oscillations in early visual cortex implement feature-binding through a synchronization of feature-selective neurons. The psychophysics experiment involves the identification of a figure consisting of gabor annuli, presented on a background of gabor annuli. The participants' task is to identify the orientation of the figure. The task difficulty is varied based on the contrast and density of the gabor annuli that make up the figure. The same figures (without the background) are used as inputs to the oscillator model. The authors report that both the discrimination accuracy in the psychophysics experiment and the synchrony of the oscillators in the proposed model follow a similar "Arnold Tongue" relationship when depicted as a function of the texture-defining features of the figure. This finding is interpreted as evidence for BBS/gamma synchrony being the underlying mechanism of the figure-ground segregation.

      Note that I chose to use "BBS" over gamma synchrony (used by the authors) in this review, as I am not convinced that the authors show evidence for synchronization in the gamma-band.

      We thank the reviewer for their careful assessment of our manuscript and useful comments that we believe have served to strengthen our work.

      Strengths:

      The design of the proposed model is well-informed by electrophysiological findings, and the idea of using computational modeling to bridge between intracranial recordings in non-human primates and behavioral results in human participants is interesting. Previous work has criticized the BBS synchrony theory based on the observation that synchronization in the gamma-band is highly localized and the frequency of the oscillation depends on the visual features of the stimulus. I appreciate how the authors demonstrate that frequency-dependence and local synchronization can be features of BBS, and not contradictory to the theory. As such, I feel that this work has the potential to contribute meaningfully to the debate on whether BBS is a biophysically realistic model of feature-binding in visual cortex.

      Weaknesses:

      I have several concerns regarding the presented claims, assessment of meaning and size of the presented effects, particularly with regard to the absence of a priori defined effect sizes.

      Firstly, the paper makes strong claims about the frequency-specificity (i.e., gamma synchrony) and anatomical correlates (early visual cortex) of the observed effects. These claims are informed by previous electrophysiological work in non-human primates but are not directly supported by the paper itself. For instance, the title contains the word "gamma synchrony", but the authors do not demonstrate any EEG/MEG or intracranial data from their human subjects supporting such claims, nor do they demonstrate that the frequencies in the oscillator model are within the gamma band. I think that the paper should more clearly distinguish between statements that are directly supported by the paper (such as: "an oscillator model based on BBS principles accounts for variance in human behavior") and abstract inferences based on the literature (such as "these effects could be attributed to gamma oscillations in early visual cortex, as the model was designed based on those principles").

      We thank the reviewer for this helpful comment and agree that the scope of our claims should be clearly delineated between what is directly supported by our data and what is theoretically inferred from prior literature.

      We revised the Abstract, Introduction, and early Discussion to moderate the strength of our statements and make the distinction explicit. The revised title now emphasizes that our study tests principles derived from prior work on gamma synchrony rather than directly demonstrating gamma activity in humans. Throughout the text, we use more cautious phrasing that highlights potential mechanisms and theoretical predictions. The intention of our study was not to position synchrony as the only viable mechanism of figure–ground perception. Rather, our goal was to reinvigorate it as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We updated phrasing throughout the manuscript to make this clearer and avoid overstating the study’s contribution.

      Importantly, our model is not agnostic with respect to frequency band. Oscillator frequencies exhibited by model units are within the gamma range by design. Frequency emerges directly from the contrast within each oscillator’s receptive field, following an empirically established relationship between stimulus contrast and gamma frequency. To our knowledge, such a robust, quantitative relationship between stimulus features and exact oscillation frequency has not been consistently demonstrated for other frequency bands. This relationship yields gamma-band frequencies for all contrasts used in our simulations. The model is thus indeed a gamma oscillator model of V1, not a generic instantiation of Binding by Synchrony (BBS) principles.

      That said, we fully agree with the reviewer that our study cannot demonstrate a direct link between gamma synchrony in visual cortex and human behavior. Our behavioral and modeling results instead show that synchronization principles derived from gamma-band physiology in V1 can predict perceptual performance patterns. We now make this distinction explicit throughout the revised manuscript.

      Secondly, unlike the human participants, the model strictly does not perform figure-ground segregation, as it only receives the figure as an input.

      We thank the reviewer for the opportunity to clarify our modeling approach. We chose not to model the background to reduce computational cost, since including it requires a substantially larger number of oscillators without changing the model’s predictions. The model thus indeed only receives the figure region as input. We aimed to test the local grouping mechanism predicted by TWCO, rather than to simulate a full figure–ground segregation process including a read-out stage. Our model therefore isolates the conditions under which local synchrony emerges within the figure region, assuming that a downstream read-out mechanism (not explicitly modeled here) would detect regions of coherent activity. The exact nature of such a read-out mechanism was beyond the scope of our work.

      To confirm that our simplified model is a valid proxy, we ran additional simulations including the background and found that a coherent figure assembly reliably emerges, as can be seen in the phase-locking patterns relative to a reference oscillator at the center of the figure. This validates that the principles of local grouping we studied in isolation hold even when the figure is embedded in a noisy surround. We have added an explicit note in the Results (paragraph 2) that we only simulate the figure and added Supplementary Figure S1 showing the additional simulations.

      Finally, it is unclear what effect sizes the authors would have expected a priori, making it difficult to assess whether their oscillator model represents the data well or poorly. I consider this a major concern, as the relationship between the synchrony of the oscillatory model and the performance of the human participants is confounded by the visual features of the figure. Specifically, the authors use the BBS literature to motivate the hypothesis that perception of the texture-defined figure is related to the density and contrast heterogeneity of the texture elements (gabor annuli) of the figure. This hypothesis has to be true regardless of synchrony, as the figure will be easier to spot if it consists of a higher number of high-contrast gabors than the background. As the frequency and phase of the oscillators and coupling strength between oscillators in the grid change as a function of these visual features, I wonder how much of the correlation between model synchrony and human performance is mediated by the features of the figure. To interpret to what extent the similarity between model and human behavior relies on the oscillatory nature of the model, the authors should find a way to estimate an empirical threshold that accounts for these confounding effects. Alternatively, it would be interesting to understand whether a model based on competing theories (e.g., Binding by Enhanced Firing, Roelfsema, 2023) would perform better or worse at explaining the data.

      We thank the reviewer for these insightful and constructive comments, which have prompted additional analyses that we believe substantially strengthen our work. The reviewer raises two main points: (1) the need for a benchmark to assess our model’s performance, and (2) the concern that the relationship between model synchrony and behavior might be a non-causal “confound” of the visual features. We address each point below.

      (1) Benchmarking model performance

      We agree that it is important to assess how well our model performs relative to the data and included this in the original manuscript. We did not predefine an absolute good-fit threshold because absolute agreement depends on irreducible noise and inter-subject variability, making a universal cutoff arbitrary. Instead, we benchmarked model performance in two complementary ways. First, the noise ceiling shown in Figure 5 provides an empirical benchmark for the maximum fit any model could achieve on our data. Simulated Arnold tongues (based on synchrony) approach this ceiling, achieving 89% of possible similarity for correlation and 79% of possible similarity for weighted Jaccard similarity. Second, the parameter sweep (Figure 3) situates our model’s performance within the broader parameter space. It shows that the model, whose key parameters were fixed a priori from independent macaque neurophysiological data, lies close to the optimal regime for explaining the human data. It also provides an estimate of the lower bound (worst-performing point) on the fit that a misspecified model implementing the identical mechanism would achieve. Our model with fixed a priori parameters does 1.41 times better than a misspecified model for the correlation fit metric and 3 times better for weighted Jaccard similarity.
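
      For readers unfamiliar with the two quantities named here, the sketch below shows the standard weighted Jaccard similarity of two non-negative vectors and a fit expressed as a fraction of a noise ceiling. Whether the manuscript uses exactly this form is an assumption, and all numbers in the example are hypothetical.

```python
def weighted_jaccard(x, y):
    """Weighted Jaccard similarity: sum of minima over sum of maxima."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("vectors must be non-negative")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    return numerator / denominator if denominator else 1.0


def fraction_of_ceiling(model_fit, noise_ceiling):
    """Express a model's fit relative to the best fit any model could reach."""
    return model_fit / noise_ceiling


# Hypothetical flattened maps of simulated synchrony and behavioral accuracy.
simulated = [0.2, 0.8, 1.0]
behavioral = [0.4, 0.8, 0.5]
similarity = weighted_jaccard(simulated, behavioral)

# Hypothetical fit and ceiling values, chosen only to show the arithmetic.
relative_fit = fraction_of_ceiling(model_fit=0.6, noise_ceiling=0.8)
```

      Reporting the fit relative to the ceiling separates how much of the explainable structure a model captures from how noisy the data are.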

      (2) Synchrony as mechanism vs. potential confound

      We appreciate the reviewer’s suggestion to test whether synchrony explains behavior beyond stimulus features. In our framework, synchrony is a near-deterministic function of the manipulated stimulus features given fixed model parameters. As a result, synchrony and the stimulus features are collinear (R² ≈ 0.8), leaving no independent variance for synchrony to explain once stimulus features are included. Adding both into one statistical model yields unstable coefficients and no out-of-sample improvement.
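
      One standard way to quantify why such collinearity destabilizes coefficients is the variance inflation factor, VIF = 1 / (1 − R²), where R² is obtained by regressing one predictor on the others. The sketch below applies it to the reported value; that the authors used this diagnostic is not stated and is not assumed.

```python
def variance_inflation_factor(r_squared):
    """VIF of a predictor whose variance is explained by the others with R^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)


vif = variance_inflation_factor(0.8)  # about 5
se_inflation = vif ** 0.5             # about 2.24
```

      With R² near 0.8, the sampling variance of the affected coefficient is roughly five times what it would be with uncorrelated predictors, so its standard error is a little more than doubled.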

      Mechanistically, we believe the relevant question is not whether synchrony explains behavior beyond stimulus features but whether synchrony is the correct transformation of the stimulus features to reproduce the behavioral pattern. Please note that in our design we ensured that mean contrast and luminance are identical in the figure and the background such that there are not more high-contrast Gabors in the figure than in the background. We did this so that mean contrast would not be a relevant feature. However, there are more high-contrast Gabors in the background, and it is conceivable that the absence of such high contrasts in the figure drives the detection/discrimination of the figure. We therefore agree that testing alternative models would further clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations from which we derived synchrony: first, the average firing rate inside the figure and, second, the difference between the average firing rate inside the figure and the average firing rate in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison based on out-of-sample predictions. While rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison. We added a new subsection comparing synchrony to rate-based alternatives in the Results (paragraphs 7-9), including additional Bayesian analyses and LOO-CV model comparison. Please note that the model comparison we added to the manuscript provides an additional benchmark beyond the map-level ceiling analysis. It indicates that the mapping from stimulus features to behavior via synchrony generalizes best without requiring an a priori good-fit threshold.
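
      The two rate-based readouts described above can be written down in a few lines. This sketch assumes the simulation yields a 2D grid of firing rates and a boolean mask marking the figure region; the function name, data layout and toy values are hypothetical.

```python
def rate_readouts(rates, figure_mask):
    """Return (mean rate inside the figure, figure-minus-background difference).

    rates: 2D list of firing rates, one per model unit.
    figure_mask: 2D list of booleans, True where the unit lies in the figure.
    """
    inside, outside = [], []
    for rate_row, mask_row in zip(rates, figure_mask):
        for rate, in_figure in zip(rate_row, mask_row):
            (inside if in_figure else outside).append(rate)
    if not inside or not outside:
        raise ValueError("mask must contain both figure and background units")
    mean_inside = sum(inside) / len(inside)
    mean_outside = sum(outside) / len(outside)
    return mean_inside, mean_inside - mean_outside


# Toy 3x3 grid with a single figure unit in the centre.
toy_rates = [[10, 10, 10], [10, 30, 10], [10, 10, 10]]
toy_mask = [[False, False, False], [False, True, False], [False, False, False]]
mean_rate, rate_difference = rate_readouts(toy_rates, toy_mask)
```

      Each readout yields one scalar per stimulus condition, so it can be entered as a single predictor of behavior in the same way as the synchrony readout.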

      We agree that formally comparing our model to a sophisticated rate-based alternative, such as an instantiation of the Binding by Enhanced Firing model, is an important direction for future work. However, it remains an open and non-trivial question whether such a model could quantitatively reproduce the precise shape of the behavioral Arnold tongue that emerges from the systematic manipulation of our stimulus parameters. Implementing and parameterizing such a model in a comparable, biologically grounded framework is a substantial undertaking that lies beyond the scope of the current study. Therefore, our goal here was not to claim exclusivity for synchrony-based mechanisms, but rather to re-evaluate their plausibility by showing that features often seen as limitations (stimulus dependence and frequency heterogeneity) are, in fact, essential characteristics of the TWCO framework that can predict complex behavioral outcomes.

      We would also like to clarify that our stimulus features were derived from theory rather than psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). Demonstrating that these features predict behavior is therefore not trivial but constitutes a first empirical confirmation that the core TWCO variables match perception.

      Apart from adding analyses of additional rate-based readouts of our model, we also refined our discussion of the relationship between these and a synchrony-based mechanism.

      Reviewer #2 (Public review):

      The authors aimed to investigate whether gamma synchrony serves a functional role in figure-ground perception. They specifically sought to test whether the stimulus-dependence of gamma synchrony, often considered a limitation, actually facilitates perceptual grouping. Using the theory of weakly coupled oscillators (TWCO), they developed a framework wherein synchronization depends on both frequency detuning (related to contrast heterogeneity) and coupling strength (related to proximity between visual elements). Through psychophysical experiments with texture discrimination tasks and computational modeling, they tested whether human performance follows patterns predicted by TWCO and whether perceptual learning enhances synchrony-based grouping.

      We thank the reviewer for their thoughtful and constructive review. We believe the comments have served to improve our work.

      Strengths:

      (1) The theoretical framework connecting TWCO to visual perception is innovative and well-articulated, providing a potential mechanistic explanation for how gamma synchrony might contribute to both feature binding and separation.

      (2) The methodology combines psychophysical measurements with computational modeling, with a solid quantitative agreement between model predictions and human performance.

      (3) In particular, the demonstration that coupling strengths can be modified through experience is remarkable and suggests gamma synchrony could be an adaptable mechanism that improves with visual learning.

      (4) The cross-validation approach, wherein model parameters derived from macaque neurophysiology successfully predict human performance, strengthens the biological plausibility of the framework.

      Weaknesses:

      (1) The highly controlled stimuli are far removed from natural scenes, raising questions about generalisability. But, of course, control (almost) excludes ecological validity. The study does not address the challenges of natural vision or leverage the rich statistical structure afforded by natural scenes.

      We agree with the reviewer that the insights of the present study are limited to texture stimuli and have made adjustments in the Discussion (final two paragraphs) to avoid claiming generalizability to natural stimuli. We have also adjusted the title to specifically limit our results to texture stimuli. To establish the principles of TWCO, we needed tight control over the stimulus, but we are intrigued by the idea of investigating natural scenes. We have added to our Discussion (paragraph 9) that future work should evaluate to what extent the principles we investigate here apply to natural scenes. Synchrony-based mechanisms have been successfully used for image segmentation tasks in machine vision, showing that the proposed mechanism can in principle work for natural scenes.

      (2) The experimental design appears primarily confirmatory rather than attempting to challenge the TWCO framework or test boundary conditions where it might fail.

      We thank the reviewer for this important point. Our primary motivation was to address the neurophysiological properties of gamma synchrony that have been suggested to severely challenge the binding by synchrony mechanism, particularly the strong dependence of gamma oscillations and synchrony on stimulus features. Our goal was to show that from the perspective of TWCO, these challenges become expected components of the mechanism. In essence, we wanted to promote a conceptual shift that converts what pushes a theory to its limit into something that is actually its central tenet. To facilitate this shift, we designed the experiment to directly test this core tenet.

      While our approach was designed to test a central prediction of TWCO rather than explicitly challenge its boundaries, we respectfully argue that it was far from a simple confirmatory experiment. The design incorporated high-risk elements that provided considerable room for both the theory and our model to fail. First, the core prediction itself was non-obvious and highly specific. We did not simply test whether contrast heterogeneity and grid coarseness affect perception. We tested the stronger hypothesis that they would reflect a specific, interactive trade-off (the behavioral Arnold tongue) as specified by TWCO. Second, our modeling approach was deliberately constrained to provide a further stringent test. We did not post-hoc optimize the model's key parameters to fit our behavioral data. Instead, we fixed them a priori based on independent neurophysiological data from macaques. This was a high-risk choice, as a mismatch between a priori model predictions and the human data would have seriously challenged the framework's generalizability.

      We agree that future research should further challenge TWCO, for instance by using stimuli that require segregating several objects simultaneously or objects that cover more extensive regions of the visual field.

      (3) Alternative explanations for the observed behavioral effects are not thoroughly explored. While the model provides a good fit to the data, this does not conclusively prove that gamma synchrony is the actual mechanism underlying the observed effects.

      We agree that our results do not conclusively show that gamma synchrony is the actual mechanism underlying figure-ground segregation. We admit that the original phrasing used throughout the manuscript was too strong and gave the impression that we wanted to establish exactly that. However, the goal of our work was only to reinvigorate gamma synchrony as a potential contender by showing that features often cited as limitations of synchrony-based binding may in fact be essential properties of the mechanism. We have revised the title and made adjustments throughout the manuscript to better reflect this more moderate goal.

      Additionally, we added tests of alternatives (Results, paragraphs 7–9) to clarify the unique explanatory value of the synchrony mechanism. To that end, we derived two alternative rate-based readouts from the same V1 simulations of our model. First, we extracted average firing rates inside the figure. Second, we computed the difference between average firing rates inside the figure and average firing rates in the background (rate difference). We analyzed each individually as predictors of behavior and performed a model comparison between these two and synchrony based on out-of-sample predictions. While the rate difference (but not average firing) showed meaningful associations with performance when considered alone, the synchrony readout had a larger effect size and was favored by the model comparison.
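
The out-of-sample comparison described above can be illustrated with a minimal sketch. It substitutes ordinary least squares and leave-one-out squared error for the Bayesian LOO-CV reported in the manuscript, and it runs on synthetic data constructed so that synchrony drives accuracy. The variable names (`synchrony`, `rate_diff`, `mean_rate`), the number of conditions, and all numerical values are assumptions for illustration, not the authors' data or code.

```python
import numpy as np


def loo_mse(predictor, outcome):
    """Leave-one-out mean squared prediction error of a one-predictor linear fit."""
    n = len(outcome)
    sq_err = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        # Fit intercept and slope on every condition except condition i
        design = np.column_stack([np.ones(n - 1), predictor[train]])
        coef, *_ = np.linalg.lstsq(design, outcome[train], rcond=None)
        prediction = coef[0] + coef[1] * predictor[i]
        sq_err[i] = (outcome[i] - prediction) ** 2
    return sq_err.mean()


# Synthetic stand-ins for model readouts per stimulus condition (hypothetical)
rng = np.random.default_rng(0)
n_conditions = 25
synchrony = rng.uniform(0.0, 1.0, n_conditions)
rate_diff = 0.6 * synchrony + rng.normal(0.0, 0.3, n_conditions)
mean_rate = rng.normal(0.0, 1.0, n_conditions)
accuracy = 0.5 + 0.4 * synchrony + rng.normal(0.0, 0.03, n_conditions)

readouts = {"synchrony": synchrony, "rate_diff": rate_diff, "mean_rate": mean_rate}
scores = {name: loo_mse(values, accuracy) for name, values in readouts.items()}
best = min(scores, key=scores.get)  # readout with the lowest held-out error
print(scores, best)
```

The point of the sketch is the procedure: each readout is scored only on conditions it was not fitted to, so a readout cannot win merely by fitting noise.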

      (4) Direct neurophysiological evidence linking the observed behavioral effects to gamma synchrony in humans is absent, creating a gap between the model and the neural mechanism.

      We agree that the model only provides a how-possibly account linking stimulus features to performance. Showing that the brain actually relies on this mechanism would require showing that cortical synchrony mediates the effect of stimulus features on behavior beyond firing rates. Collecting such data would constitute a major effort that would go beyond the scope of this study. We acknowledge the need for electrophysiological data and the mediation analysis in the updated Discussion.

      Achievement of Aims and Support for Conclusions:

      The authors largely achieved their primary aim of demonstrating that human figure-ground perception follows patterns predicted by TWCO principles. Their psychophysical results reveal a behavioral "Arnold tongue" that matches the synchronization patterns predicted by their model, and their learning experiment shows that perceptual improvements correlate with predicted increases in synchrony.

      The evidence supports their conclusion that gamma synchrony could serve as a viable neural grouping mechanism for figure-ground segregation. However, the conclusion that "stimulus-dependence of gamma synchrony is adaptable to the statistics of visual experiences" is only partially supported, as the study uses highly controlled artificial stimuli rather than naturalistic visual statistics, or shows a sensitivity to the structure of experience.

      Likely Impact and Utility:

      This work offers a fresh perspective on the functional role of gamma oscillations in visual perception. The integration of TWCO with perceptual learning provides a novel theoretical framework that could influence future research on neural synchrony.

      The computational model, with parameters derived from neurophysiological data, offers a useful tool for predicting perceptual performance based on synchronization principles. This approach might be extended to study other perceptual phenomena and could inspire designs for artificial vision systems.

      The learning component of the study may have a particular impact, as it suggests a mechanism by which perceptual expertise develops through modified coupling between neural assemblies. This could influence thinking about perceptual learning more broadly, but also raises questions about the underlying mechanism that the paper does not address.

      Additional Context:

      Historically, the functional significance of gamma oscillations has been debated, with early theories of temporal binding giving way to skepticism based on gamma's stimulus-dependence. This study reframes this debate by suggesting that stimulus-dependence is exactly what makes gamma useful for perceptual grouping.

      The successful combination of computational neuroscience and psychophysics is a significant strength of this study.

      The field would benefit from future work extending (if possible) these findings to more naturalistic stimuli and directly measuring neural activity during perceptual tasks. Additionally, studies comparing predictions from synchrony-based models against alternative mechanisms would help establish the specificity of the proposed framework.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In a joint discussion to integrate the peer reviews and agree on the eLife recommendations, both reviewers agreed that the work is valuable, but they were on the fence about whether the strength of evidence was incomplete or solid, eventually settling on incomplete. The reviewers make several recommendations for improving these ratings, which I (Reviewing Editor) have organised into 3 points below, with point 1 of particular importance. Underneath the summary, please see the individual recommendations of the reviewers.

      (1) Strengthen evidence for the unique role of gamma synchrony in explaining the data, and ensure claims are directly supported by relevant data:

      Reviewers 2 and 3 both note the lack of direct evidence for gamma involvement, and reviewer 2 observes that the fit with behaviour may trivially be explained by a relationship between contrast heterogeneity and grid coarseness without need for oscillation. The reviewers felt that the approach of fitting the model to human data could be strengthened to help address this issue - and they offer various solutions, e.g., more principled a-priori criteria around good vs bad fit of the model to both main task and training data, and comparison to alternative binding models (Reviewer 2), identifying and testing boundary conditions of the model (Reviewer 3). There is also the possibility of collecting direct human neurophysiological evidence linking the behavioural data to neural mechanisms. Our discussion also highlighted the need to weaken claims (including in the title) where links are not directly demonstrated by methods from the present study, e.g., resting on indirect comparisons to primate literature.

      We agree with the editor and reviewers that this was a critical point. To address it, we have made several major revisions.

      As suggested, we have weakened claims where the links are not directly demonstrated by our data. The title has been revised to be more specific, and we have carefully edited the abstract, introduction, and discussion to distinguish between our model's predictions and direct neurophysiological evidence.

      To address the concern that our model's fit might be trivially explained by visual features, we have performed a new analysis comparing the synchrony-based readout to two alternative rate-based readouts from the same V1 simulations. This new comparison shows that the synchrony readout provides a superior out-of-sample prediction of human behavior.

      While a full implementation of a competing theory like "Binding by Enhanced Firing" would be a valuable next step, we note that parameterizing such a model in a comparably grounded framework is a substantial undertaking beyond the scope of the present study. Our new analysis provides an important first step in this direction.

      (2) Make explicit and address the limitations of the stimuli:

      Include that the model is not extracting the figure from the background, and the controlled stimuli may limit generalizability.

      To address the concern that our model was not performing true figure-ground extraction, we performed a new set of simulations that included both the figure and the immediate background. The results confirm that synchrony dynamics within the figure region are not affected by the presence of the background. We added these validation results as supplementary materials. We have additionally made the modeling choice and its justification more explicit in the Results and Methods sections.

      We have revised the Discussion to be more explicit about the limitations of using highly controlled texture stimuli. We now clearly state that our findings are specific to this context and that further research is required to determine if these principles generalize to the segregation of objects in natural scenes.

      (3) Some clarifications to make more accessible:

      Include the figure explaining the framework (Reviewers 1&2), and also the model details (Reviewer 2).

      We have revised Figure 1 and its caption to more clearly illustrate the links from TWCO principles to their neural implementation in V1 and the resulting behavioral predictions.

      We have expanded the Methods section to provide a more detailed and accessible description of the model's construction. We now clarify precisely how the oscillator grid was defined in visual space, how eccentricity-dependent receptive field sizes were implemented, and how these were mapped onto a retinotopic cortical surface to determine coupling strengths.

      Reviewer #1 (Recommendations for the authors):

      (A) Major concerns:

      (1) My main concern:

      My main concern is the repeated claims that the observed findings can be attributed to gamma synchrony in the early visual cortex. I find this claim misleading as the authors do not report any electrophysiological data that directly supports such claims. As stated in my public review, I feel that the authors should be clear about direct evidence versus more abstract inferences based on the literature.

      In particular, I recommend changing claims about "gamma synchrony" to "Binding by Synchrony". That being said, the authors can outline that the model was built under the assumption that this synchrony is mediated by gamma in early visual cortex, but I don't think it should be part of their main conclusions.

      We appreciate that TWCO’s general principles are frequency-agnostic and can be viewed as binding by synchrony in a broad sense. Our work, however, specifically instantiates these principles in V1 gamma: the model reflects TWCO dynamics together with V1 anatomy/physiology and the well-established contrast–frequency relationship in the gamma range (which, to our knowledge, has not been demonstrated with comparable specificity for other bands). In that sense, it is a gamma oscillator model of V1, rather than a generic BBS instantiation. Moreover, stimulus dependencies often cited as challenges to BBS have been used in particular to argue against gamma; showing that these very dependencies are integral to the TWCO mechanism is central to our contribution, and we therefore keep our conclusions focused on the gamma-specific instantiation tested here.

      (2) Mediation of the observed effects by the visual features of the figure:

      The authors motivate the hypothesis that BBS predicts that the perception of texture-defined objects depends on the density of texture elements and their contrast heterogeneity. This hypothesis seems trivial as those are the features that distinguish figure from ground. I think it would be important to clarify how this hypothesis is unique to BBS and not explained by competing theories, such as Binding by Enhanced Firing (Roelfsema, 2023). The authors should be clear about what part of the hypothesis is not trivial based on the task and clearly attributable to oscillators and synchrony.

      Our stimulus features were derived from theory rather than psychophysical literature. Starting from the principles of TWCO, we mapped frequency detuning and coupling strength onto known anatomical and physiological properties of early visual cortex, and only then derived the corresponding stimulus manipulations (contrast heterogeneity and grid coarseness). We agree that grid coarseness (element distance) is an established facilitator of figure–ground perception. By contrast, contrast heterogeneity (feature variance) is less commonly emphasized as a figure–ground cue, compared to mean-based cues, but follows directly from TWCO’s frequency detuning. Importantly, mean contrast and luminance were matched exactly between figure and background in our stimuli. Demonstrating that contrast heterogeneity and grid coarseness not only independently affect figure-ground perception, but also reflect a trade-off in which higher heterogeneity needs to be counteracted by reduced grid coarseness in the way TWCO specifies, is therefore non-obvious and provides an initial empirical indication that the core TWCO variables might shape perception. We also agree that alternative models would further clarify the unique explanatory value of synchrony. In the revised manuscript, we compare rate-based readouts (mean figure rate; figure–background rate difference) with the synchrony readout from the same simulations. Rate difference indeed constitutes a predictor of performance, but the synchrony readout showed a larger effect and was preferred by out-of-sample model comparison.

      Using a linear model, the authors assess the relationship between discrimination accuracy and synchrony. Did the authors also include the factors grid coarseness and contrast heterogeneity in this model? Again, as both the task performance (as shown by the GEE analysis) and oscillatory synchrony depend on these features, the relationship between model and behavioral performance will be mediated by the visual features.

      Thank you for raising this. In our framework, detuning (via contrast heterogeneity) and coupling (via grid coarseness) are the inputs, synchrony is the proposed mechanistic mediator, and behavior is the output. Because synchrony in our model is a (near-)deterministic function of the manipulated features under fixed parameters, a joint features+synchrony regression is statistically ill-posed (perfect multicollinearity up to numerical error) and cannot add information. A proper mediation test would require trial-wise neural measurements of synchrony in the same task, which we do not have and acknowledge as a limitation in the Discussion. Accordingly, we show that both the features themselves (reflecting TWCO principles) and model-derived synchrony (realizing the proposed pathway) account for behavior.

      We agree this does not establish a unique contribution of synchrony. To probe alternatives, we added rate-based readouts and a model comparison to the revised manuscript. These additional analyses indicate that synchrony outperforms simple rate-based mappings. We do not claim this rules out more sophisticated rate-based mechanisms. Our aim is to demonstrate that synchrony is a viable, behaviorally informative readout for downstream processing. We do not assert it is the only mechanism the brain uses. Synchrony had been discounted due to its stimulus dependence; our results are intended to rule it back in. We have made changes throughout the manuscript to better reflect this more modest aim.

      (3) Goodness of fit measures are not established a priori:

      I have described this concern in my public review. It is hard to assess what the authors would have interpreted as a good or a bad fit, especially without accounting for the confound in the relationship between oscillator synchrony and behavior. Similarly, when assessing the similarity between the behavioral and dynamic Arnold Tongues across different coupling parameters, the authors found that the chosen parameters (based on macaque data) were not optimal. They offer the explanation that the human cortex has a lower coupling decay than the macaque cortex, and the similarity is higher for lower values of coupling decay. While this explanation is not entirely implausible, it is unclear where an oscillator model with human values would be in the presented plot, as the authors didn't estimate those values from the human studies. Moreover, the task used in the Lowet et al., 2017 paper is very different from the task presented here, which could also account for differences. Overall, the explanation appears hand-wavy considering the lack of empirically defined goodness of fit measures.

      Thank you for these concerns.

      We indeed did not provide a priori thresholds for what would be considered a good fit. Instead, we used two complementary benchmarks, namely noise ceilings and parameter exploration. The former provides an upper bound on what any model (not just ours, but also models based on completely different mechanisms) could achieve given our data. The parameter sweep indicates how well our concrete model can fit the data at best, and how poorly at worst, across possible parameters. These benchmarks are more informative than a fixed a-priori cutoff, which would depend on unknown noise and inter-subject variability. Both the noise ceiling and the parameter exploration indicate that our model, using a priori fixed parameters, performs well. Additionally, we redid all our statistical analyses after z-normalizing every predictor to provide easier interpretation of effect sizes.

      Regarding the reason that key model parameters were not optimal, we believe our interpretation to be plausible. We agree that we currently do not have data to estimate the exact human decay factor and hence cannot establish how much model fit would be affected. However, the parameter exploration in Figure 3 shows that small to modest reductions in decay would improve model fit. We discuss this now in the revised manuscript.

      The reviewer’s suggestion is intriguing. While Lowet et al. (2017) used a different task, the parameters we took from their work (decay rate and maximum coupling) are intended to reflect anatomical properties and thus should not be task-dependent. That said, Lowet et al.’s data carry uncertainty, so our estimates may not be exact; we note this explicitly in the revised Discussion. Whether a different task would have yielded better parameter estimates is difficult to determine, but we considered Lowet’s paradigm appropriate because it was designed to target the same V1 anatomical and physiological properties that map onto TWCO.

      I have concerns about a similar confound in the training effects. If I'm not mistaken, the Hebbian Learning rule encourages synchronization between the oscillators in the grid. As such, it causes synchronization to increase over several simulations. Clearly, the task performance of the participants also improves over the sessions. Again, an empirical threshold would be required to assess whether the similarity in learning between model and performance goes beyond what is expected based on learning alone. How much of these effects can be attributed to the model being oscillatory?

      The reviewer is correct that, in our framework, learning operates via changes in coupling that increase synchrony. Enhanced synchrony is the proposed (and in our model also the actual) pathway by which learning impacts behavior. We agree that learning could, in principle, act through pathways other than synchrony. Demonstrating this would not be achieved by a mediation analysis here, because that requires independent, trial-level neural measurements of the candidate pathways (synchrony and alternatives). In the absence of such data, the appropriate approach would be model comparison between competing mechanistic readouts. We have added such a model comparison for a synchrony readout versus two rate-based readouts derived from the same simulations for the first session; i.e., focusing on the pathway from stimulus features to behavior. However, a similar model comparison is not possible for learning. As we show in the supplementary materials, rate-based readouts of our V1 model are not at all affected by coupling strength. As such, they are insensitive to changes in coupling and are thus not viable as alternative mechanisms to explain performance changes due to learning. A fair test of rate-based alternatives would require building a detailed rate-based figure–ground segregation model that predicts session-wise changes. We agree that this is an important next step, but it is also a substantial undertaking beyond the scope of the present study.

      (4) Similarly, for the comparison of the Arnold Tongue in the transfer session and the early session:

      In the first part of the Results section, it says: "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli. We evaluated whether this assumption holds for our human participants using the transfer session following the main training period. [...] If learning is indeed local, participants' performance in the transfer session should resemble that of early training sessions, indicating a reset in performance for the new retinal location."

      The authors find that a model fit to session 3 explains the data in the transfer session best and consider this as evidence for the above-stated expectation. Again, it is unclear where the cutoff would have been for a session to be declared as early or late. For instance, had the participants only performed 4 sessions, would the performance be best explained by session 3 or session 1?

      A high number of statistical tests are used, which, firstly, need to be corrected for multiple comparisons (did the authors do this?). Secondly, I feel that the regression models could be improved. For instance, the authors fit one model per session and then assess how well each model explains the variance in the transfer session. I think the authors might want to opt for one model with the regressors contrast heterogeneity, grid coarseness, and session (and their interaction). Using this approach, the authors would still be able to assess which session predicts the data best. Similarly, interindividual variability could be accounted for by adding participant-specific random effects to the model (and using a mixed model), instead of fitting individual models per participant.

      We agree the “early vs late” cutoff was underspecified. In the revision, we predefine Session 2 as the early-learning reference, excluding Session 1 to avoid familiarization/response–mapping effects. We then fit a single Bayesian hierarchical model with contrast heterogeneity, grid coarseness, and session, plus a transfer indicator, and participant-level random effects. This allows us to place the transfer session on the same scale as training and to test a) whether the transfer session precedes the state in Session 2 via the posterior contrast P(β_transfer < β_Sess2) and b) whether it is indistinguishable from the state in Session 2 using an equivalence test derived from the fitted model. We find that the transfer session is equivalent to Session 2. We added this updated analysis of the transfer session in the Results (paragraph 15).
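
The posterior contrast and equivalence check described above could be computed from posterior draws roughly as follows. This is a sketch under stated assumptions: the draws are synthetic stand-ins rather than output of the fitted hierarchical model, and the equivalence criterion shown (a region of practical equivalence with hypothetical half-width `rope`) is one common choice, since the exact criterion is not specified in this response.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for posterior draws of the two session coefficients.
# In practice these would be extracted from the fitted hierarchical model.
beta_sess2 = rng.normal(0.50, 0.05, 4000)
beta_transfer = rng.normal(0.52, 0.05, 4000)

diff = beta_transfer - beta_sess2

# (a) Posterior probability that the transfer session lies below Session 2
p_transfer_below = float(np.mean(diff < 0))

# (b) Equivalence: is the 95% credible interval of the difference
#     fully inside a region of practical equivalence (ROPE)?
rope = 0.2  # hypothetical half-width, on the scale of the coefficients
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])
equivalent = bool(ci_low > -rope and ci_high < rope)

print(p_transfer_below, (ci_low, ci_high), equivalent)
```

Because both quantities are read off the same joint posterior, no separate correction step is applied between them in this sketch.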

      In response to the suggestion to use a hierarchical regression model for analyzing the transfer session, we have decided to use such a model for all our analyses in a Bayesian framework. In this Bayesian framework, inference is based on the joint posterior (credible intervals/equivalence) of all predictors in a model and additional post-hoc multiplicity corrections are not required.

      (5) Questions regarding the model:

      What does it mean that the grid was "defined in visual space"? How biologically plausible with regard to the retinotopy and organization of the oscillators do the authors claim the model to be?

      We are happy to clarify this point. We have a total of 400 oscillators reflecting neural assemblies in V1. We start by defining a regular, 20x20, grid of the receptive field (RF) centers of these oscillators inside the figure region. Each oscillator is then also assigned a RF size based on the eccentricity of its RF center. We use the threshold-linear relationship between RF eccentricity and RF size reported in [1] to assign RF sizes. Each oscillator thus has an individual, eccentricity-dependent, RF size.
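
The grid construction above can be sketched in a few lines. The extent of the figure region and the `slope` and `min_size` values of the threshold-linear rule are placeholders chosen for illustration; they are not the values from [1] or from the manuscript.

```python
import numpy as np

n_side = 20  # 20 x 20 = 400 oscillators

# Hypothetical figure region in degrees of visual angle
x_min, x_max = 4.0, 9.0
y_min, y_max = -2.5, 2.5

xs = np.linspace(x_min, x_max, n_side)
ys = np.linspace(y_min, y_max, n_side)
grid_x, grid_y = np.meshgrid(xs, ys)

# One row per oscillator: (x, y) of its receptive field center
centers = np.column_stack([grid_x.ravel(), grid_y.ravel()])
eccentricity = np.hypot(centers[:, 0], centers[:, 1])


def rf_size(ecc, slope=0.2, min_size=0.5):
    """Threshold-linear rule: constant floor, then linear growth with eccentricity.

    slope and min_size are placeholder values, not taken from the literature.
    """
    return np.maximum(min_size, slope * ecc)


rf_sizes = rf_size(eccentricity)
print(centers.shape, rf_sizes.min(), rf_sizes.max())
```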

      For the coupling between oscillators, we need to know their cortical distances. We obtain these by first determining the cortical location of each oscillator through a complex-logarithmic topographic mapping of neuronal receptive field coordinates onto the cortical surface [2,3]. For this mapping, we use human parameter values estimated by [4]. From these cortical locations, we then compute pairwise Euclidean distances.
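
A minimal sketch of this step, assuming the monopole form of the complex-logarithmic map, w = k · log(z + a) with z = x + iy. The parameter values `a` and `k` below are placeholders rather than the human estimates from [4], and the grid extent is hypothetical.

```python
import numpy as np


def visual_to_cortex(x_deg, y_deg, a=0.7, k=15.0):
    """Map visual-field coordinates (deg) to cortical coordinates (mm).

    Uses w = k * log(z + a) with z = x + i*y. The values of a and k are
    placeholders for illustration only.
    """
    z = np.asarray(x_deg) + 1j * np.asarray(y_deg)
    w = k * np.log(z + a)
    return np.column_stack([w.real, w.imag])


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in points."""
    delta = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(delta, axis=-1)


# Hypothetical 20 x 20 grid of RF centers in the right visual hemifield
xs = np.linspace(4.0, 9.0, 20)
ys = np.linspace(-2.5, 2.5, 20)
grid_x, grid_y = np.meshgrid(xs, ys)

cortex_mm = visual_to_cortex(grid_x.ravel(), grid_y.ravel())
cortical_dist = pairwise_distances(cortex_mm)
print(cortical_dist.shape, cortical_dist.max())
```

One consequence of the logarithmic map is that equal steps in the visual field correspond to smaller cortical distances at larger eccentricities, so coupling derived from cortical distance varies across the grid even when the grid is regular in visual space.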

      The model thus captures realistic retinotopy, eccentricity-dependent RF sizes, and distance-dependent coupling on the cortical surface. We have adjusted our Methods to make these steps clearer.

      [1] Freeman, J., & Simoncelli, E. P. (2011). Metamers of the ventral stream. Nature Neuroscience, 14(9), 1195–1201.

      [2] Balasubramanian, M., & Schwartz, E. L. (2002). The isomap algorithm and topological stability. Science, 295(5552), 7. https://doi.org/10.1126/science.1066234

      [3] Schwartz, E. L. (1980). Computational anatomy and functional architecture of striate cortex: a spatial mapping approach to perceptual coding. Vision Research, 20(8), 645–669. http://www.sciencedirect.com/science/article/pii/0042698980900905

      [4] Polimeni, J. R., Hinds, O. P., Balasubramanian, M., van der Kouwe, A. J. W., Wald, L. L., Dale, A. M., & Schwartz, E. L. (2005). Two-dimensional mathematical structure of the human visuotopic map complex in V1, V2, and V3 measured via fMRI at 3 and 7 Tesla. Journal of Vision, 5(8), 898. https://doi.org/10.1167/5.8.898

      Similarly, do the authors claim that each Gabor annulus stimulates a single receptive field in V1?

      We hope that with the additional explanation above, it is clearer that there is not a one-to-one mapping. Each oscillator samples the local image by pooling over all Gabor annuli that overlap its receptive field (partially or fully) and computes the average contrast within its RF. Conversely, a single annulus typically overlaps multiple RFs and contributes to each in proportion to the overlap.
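
The pooling described above can be approximated on a pixel grid. This sketch assumes a precomputed local-contrast map and circular receptive fields specified by a diameter; both are simplifying assumptions, and the function name `mean_contrast_in_rf` is hypothetical.

```python
import numpy as np


def mean_contrast_in_rf(contrast_map, xs, ys, center, diameter):
    """Average the contrast map over all pixels inside a circular RF.

    contrast_map has shape (len(ys), len(xs)); center is (x, y) in the same
    units as xs and ys. A stimulus element that only partly overlaps the RF
    contributes in proportion to the number of its pixels inside the RF.
    """
    grid_x, grid_y = np.meshgrid(xs, ys)
    inside = np.hypot(grid_x - center[0], grid_y - center[1]) <= diameter / 2
    return float(contrast_map[inside].mean())


# Toy image: low contrast on the left half, high contrast on the right half
xs = np.linspace(-5.0, 5.0, 201)
ys = np.linspace(-5.0, 5.0, 201)
contrast_map = np.where(np.meshgrid(xs, ys)[0] < 0, 0.2, 0.8)

left = mean_contrast_in_rf(contrast_map, xs, ys, center=(-3.0, 0.0), diameter=2.0)
straddling = mean_contrast_in_rf(contrast_map, xs, ys, center=(0.0, 0.0), diameter=2.0)
print(left, straddling)
```

An RF that straddles the boundary reports an intermediate contrast, which is the sense in which one stimulus element can contribute to several oscillators.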

      I am unsure how the oscillators were organized, if not retinotopically. How is the retinotopic input fed into the non-retinotopically arranged oscillators?

      We hope that with the additional explanation above, it is clearer that the network is strictly retinotopic.

      The frequency of each oscillator changes according to ω = 2πν with ν = 25 + 0.25C. How were the values for the linear regression in ν chosen? Reference?

      The slope and intercept parameters for this equation were first reported in [5]. We added the reference to the Methods.
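
For readers who want to see the mapping in numbers, a short sketch of the contrast-to-frequency relation quoted in the question follows. It assumes that C is expressed in percent contrast and ν in Hz. Neither unit is stated in this exchange, and the function names are illustrative.

```python
import math


def intrinsic_frequency_hz(contrast):
    """nu = 25 + 0.25 * C, using the slope and intercept quoted in the question."""
    return 25.0 + 0.25 * contrast


def angular_frequency(contrast):
    """omega = 2 * pi * nu (radians per second if nu is in Hz)."""
    return 2.0 * math.pi * intrinsic_frequency_hz(contrast)


# Two oscillators driven by different local contrasts (assumed to be in percent).
nu_low = intrinsic_frequency_hz(20.0)
nu_high = intrinsic_frequency_hz(60.0)
detuning_hz = nu_high - nu_low  # frequency difference between the two oscillators
```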

      (5) Lowet, E., Roberts, M., Hadjipapas, A., Peter, A., van der Eerden, J., & De Weerd, P. (2015). Input-dependent frequency modulation of cortical gamma oscillations shapes spatial synchronization and enables phase coding. PLoS computational biology, 11(2), e1004072.

      (6) Hebbian Learning Rule:

      I am confused about how the effective learning rate E = εt is calculated. It is said that it is estimated based on the similarity between the second experimental session and the distribution of synchrony after letting the model learn. How can the model learn without knowing epsilon and t?

      We agree with the reviewer that our procedure to estimate the effective learning rate requires further clarification. We performed a nested grid search. Essentially, we let the model learn between sessions 1 and 2 with each of 25 candidate effective learning rates and evaluate how well each of them allows the model to fit performance in session 2. We then select the best effective learning rate, create a new, smaller grid around this value, and repeat the procedure. In total, we perform 5 nested grids to arrive at the final effective learning rate. We expanded the explanation in the Methods.
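
The nested grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions: the response does not say how each smaller grid is placed, so the sketch assumes it spans one grid step on either side of the current best value, and `toy_loss` is a hypothetical stand-in for the misfit between model and session-2 performance.

```python
def nested_grid_search(loss, low, high, n_candidates=25, n_levels=5):
    """Refine a 1-D grid around the best candidate, n_levels times."""
    best = None
    for _ in range(n_levels):
        step = (high - low) / (n_candidates - 1)
        grid = [low + i * step for i in range(n_candidates)]
        best = min(grid, key=loss)
        # Assumed refinement rule: the next grid spans one step either side of the best value.
        low, high = max(low, best - step), min(high, best + step)
    return best


def toy_loss(rate):
    """Hypothetical objective with its minimum at 0.37; stands in for the model misfit."""
    return (rate - 0.37) ** 2


estimate = nested_grid_search(toy_loss, low=0.0, high=1.0)
```

With 25 candidates per level, each level narrows the search interval by a factor of 12, so five levels resolve the optimum far more finely than a single grid of 125 points would.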

      (B) Minor concerns:

      (1) Small N: 2/3 of the studies that were cited to justify the small sample were notably different from the current experiment, i.e., Intoy 2020 is an eye movement task, Lange 2020 is a memory task (Tesileanu 2020 is more similar). I think a power analysis would be great to support, as the sample size seems quite low

      Our study uses a within-subject design with ~750 trials per session (≈6,000 total) per participant, analyzed with a hierarchical model that pools information across trials and participants. To assess adequacy, we ran a simulation-based design analysis using the fitted hierarchical model (i.e., post hoc, based on the observed variance components). This analysis indicated a detection probability >90% for all key effects. We now report the results of this design analysis in Supplementary Table 1 and note this in the Results (paragraph 1).

      Regarding the literature context, we agree the cited studies are not identical to ours; we referenced them to illustrate a common practice (small N with many trials) when targeting low-level, early-visual mechanisms. Intoy (pattern/contrast sensitivity) and Lange (perceptual learning in early vision) share that focus, while Tesileanu is methodologically closest.

      (2) Figure 1 could be more informative and better described in the text. The authors often don't refer to the panels in Figure 1. Maybe it would help to swap a and b to describe the Arnold tongue first? It might also be a good idea to add the coupling strength and frequency detuning axes

      We have swapped panels a and b and now refer to each panel in the main text to enhance clarity.

      (3) Values of rho (distance - is this degrees visual angle)? Do the authors assume that the size of the stimuli corresponds to receptive fields in V1? If so, how is this justified?

      The center-to-center distance between any pair of neighboring annuli is indeed expressed in degrees of visual angle. Rho is a scaling factor for this distance. With rho=1, the center-to-center distance corresponds to the diameter of the annuli; i.e., they touch but do not overlap each other. We do not assume any relation between the size of receptive fields and the size of the annuli. Receptive field sizes in our model are purely determined by their eccentricity, and each oscillator can have several annuli within its receptive field, while each annulus can fall within several overlapping receptive fields of different oscillators. We believe that the schematic illustration in Figure 1 might have given the impression that each oscillator sees exactly one annulus. We have added a note that this is not the case and that the schematic is merely a simplification to illustrate the relationship between contrast and intrinsic frequency.

      (4) Some equations are embedded in the text, and some are not. It might be easier to find the respective equation if they all have an index. For instance, the authors mention the psychometric function that relates model synchrony and performance in the results section. It would be easier to find if it had an index that the authors could refer to.

      We moved this equation as well as the contrast intrinsic frequency mapping from inline to displayed and numbered them.

      (5) Is there a reference for "Our model rests on the assumption that learning-induced structural changes in early visual cortex are specific to the retinotopic locations of the trained stimuli"? (If so, it should be cited.)

      We added references supporting this assumption.

      (6) Figure 2b: colorbar missing label.

      We added the label.

      Reviewer #2 (Recommendations for the authors):

      Cool work!

      (1) The reader would benefit from (a single) comprehensive figure that visually explains the entire conceptual framework-from TWCO principles to neural implementation to behavioural predictions-accessible to readers without specialised knowledge of oscillatory dynamics. This will give the paper a greater impact.

      We have adjusted Figure 1 in accordance with suggestions made by reviewer 1 and added further explanations to the caption and the Introduction to enhance clarity on how the principles of TWCO relate to neural implementation.

      (2) I think this paper would benefit from the audience eLife provides, but the paper could move closer to the audience.

      (3) Pride comes before the fall, but I am not the most uninformed reader, and it took me some effort to process everything.

      Thank you, we took this to heart. In the Introduction, we now state more explicitly how each variable is operationalized and how these map onto TWCO with improved reference to relevant panels in the schematic figure. We agree the framework is conceptually dense. TWCO principles reach the stimuli through specific V1 anatomy and physiology, so there are several links to keep in mind. Our goal with the revised introduction and figure is to make those links better visible.

      (4) You could consider discussing potential implications for understanding perceptual disorders characterized by altered neural synchrony (e.g., schizophrenia, autism) and how your learning paradigm might inform perceptual training interventions.

      Thank you for this suggestion. We have added to the Discussion that TWCO might provide a new lens for studying perceptual disorders. We provide a concrete example of the relation between grouping, gamma synchrony (in light of TWCO), and lateral connectivity in schizophrenia.

      (5) I think this paper has real strength, but rather than dispersing limitations throughout the discussion, create a dedicated section that systematically addresses ecological validity, alternative explanations, and generalisability concerns. This will also preempt criticism.

      We appreciate the suggestion. Our preference is to discuss limitations in context, next to the specific results they qualify, so readers see why each limitation matters and how it affects interpretation. Nevertheless, paragraph 7 on page 20 summarizes most limitations in a single paragraph.

    1. which is 0 the maximum proportion of susceptible animals a population can support to avoid an epidemic when a disease with basic reproductive number R0

      That sentence is definitely worded in a slightly confusing way! To make it clearer, think of it as a "Safety Threshold."

      In simple terms, it describes the tipping point for herd immunity.

      Breaking it down

      The text says $\chi^* = \frac{1}{R_0}$. Here is what that actually represents:

      1. The Threshold ($\chi^*$): This is the specific proportion of the population that is still susceptible (not immune).
      2. The "Safety" Rule: To avoid an epidemic, the actual proportion of susceptible animals in your population must stay below this $\chi^*$ number.
      3. The "0" in the text: The text has a small typo or awkward phrasing where it says "which is 0 the maximum..." It likely meant to say "which is the maximum proportion."

      Why $1/R_0$ matters

      The Basic Reproductive Number ($R_0$) tells you how many new cases one infected animal will cause.

      • If $R_0 = 4$, then $\chi^* = 1/4 = 0.25$ (or 25%).
      • This means if more than 25% of your animals are susceptible, the disease will spread (an epidemic).
      • If fewer than 25% are susceptible (meaning 75% are immune/vaccinated), the disease will die out because the "effective" reproductive number drops below 1.

      A clearer rewrite of that sentence:

      "This value ($\chi^*$) represents the maximum allowed proportion of susceptible animals. If the population's actual susceptibility stays at or below this limit, a disease with a reproductive power of $R_0$ cannot start an epidemic."

      In your research context

      Since you are looking at epidemiology at BNITM, this formula is the foundation for deciding vaccination targets. If you know the $R_0$ of a disease is 5, you know you must vaccinate until the susceptible proportion is less than $1/5$ ($20\%$).
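
The threshold arithmetic above can be checked with a few lines of code. This is a minimal sketch: it assumes the usual simplification that the effective reproductive number equals $R_0$ times the susceptible fraction (homogeneous mixing), and the function names are illustrative.

```python
def susceptible_threshold(r0):
    """chi* = 1 / R0: largest susceptible proportion that still prevents an epidemic."""
    if r0 <= 0:
        raise ValueError("R0 must be positive")
    return 1.0 / r0


def effective_reproduction_number(r0, susceptible_fraction):
    """Expected secondary cases per case when only a fraction is susceptible (assumed R0 * s)."""
    return r0 * susceptible_fraction


def minimum_immune_fraction(r0):
    """Fraction that must be immune so that the susceptible proportion equals chi*."""
    return max(0.0, 1.0 - susceptible_threshold(r0))


threshold = susceptible_threshold(4)          # susceptible limit for R0 = 4
immune_needed = minimum_immune_fraction(4)    # immune share needed for R0 = 4
```

At exactly the threshold, `effective_reproduction_number(4, threshold)` equals 1, which is the tipping point between a disease dying out and spreading.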

    2. Proportion is the relationship between two values when the numerator is included in the denominator. Percentage is a proportion multiplied by one hundred. Ratio is the relationship between two values (variables, states of a variable) when the numerator is not included in the denominator. Rate is (i) a measure of the change of a variable per unit change of another variable, on which the first one depends, or (ii) the number of events per unit of time, related to the population at risk.

      Difference between proportion and ratio:

      1. The "Part-to-Whole" vs. "Part-to-Part" Rule

        Proportion (Part-to-Whole): The numerator is "inside" the denominator. You are comparing a slice to the entire pizza.

        Logic: $\frac{A}{A+B}$
        
        Example: If you have 2 female researchers and 8 male researchers, the proportion of females is 2/10 (or 0.2).
        

        Ratio (Part-to-Part): The numerator is "outside" the denominator. You are comparing one distinct group to another distinct group.

        Logic: $\frac{A}{B}$
        
        Example: Using the same group, the ratio of females to males is 2/8 (or 1:4).
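
The same two calculations, written as a small sketch so the difference in denominators is explicit. The function names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def proportion(part, other):
    """Part-to-whole: the numerator is included in the denominator, A / (A + B)."""
    return Fraction(part, part + other)


def ratio(part, other):
    """Part-to-part: the numerator is not included in the denominator, A / B."""
    return Fraction(part, other)


females, males = 2, 8
p = proportion(females, males)   # 2 / (2 + 8)
r = ratio(females, males)        # 2 / 8
percentage = float(p * 100)      # a percentage is the proportion multiplied by one hundred
```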
        
    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Velázquez et al. investigate the transcriptomic and epigenetic consequences of exogenous expression of the Ewing sarcoma fusion oncogene EWSR1∷FLI1 (EF1) in yeast. The study provides compelling evidence that EF1 can bind ETS transcription factor motifs, as well as a single 4xGGAA repeat within the S. cerevisiae genome, despite the absence of several canonical cofactors often implicated in EF1 biology, including CBP/p300 and Polycomb group (PcG) proteins. The authors further show that EF1 expression redistributes RNA Polymerase II toward EF1-bound ETS sites; strikingly, however, this relocalization is accompanied by only modest global transcriptional effects relative to those reported in human or insect systems. In addition, EF1 expression reverses repeat-associated epigenetic silencing of synthetic GGAA microsatellites in engineered reporter strains. Taken together, the data support the conclusion that GGAA microsatellite-mediated transcriptional rewiring is a comparatively conserved EF1 property, whereas broader transcriptomic changes arising from individual EF1-bound ETS sites appear to be more context-dependent. Overall, the manuscript is clearly written and logically organized, and the methodological descriptions and data-analysis details appear sufficient to enable reproducibility.

      Major comments

      None.

      Minor comments

      1. Additional citations in the Introduction. A small number of additional references would further support specific statements.
        • End of paragraph 3: consider adding Boulay et al. (2017).
        • Paragraph 4, after "proximal and distal genes": consider adding Tomazou et al. (2015) and Orth et al. (2022).
      2. Potential antibody cross-reactivity in the CoIP experiment. Is there an EWSR1 homolog in S. cerevisiae? To rule out unintended interactions, it would be helpful to exclude binding of endogenous yeast proteins by the anti-EWS antibody used for co-immunoprecipitation, for example via a homology search and/or appropriate specificity controls.
      3. Clarification of EF1-associated toxicity. The authors suggest that part of the observed transcriptional signal may reflect EF1 toxicity. It would strengthen the interpretation to characterize this phenotype more explicitly (e.g., growth rate over time, viability/cell death, or longer-term fitness effects). In addition, it would be informative to test whether toxicity depends on EF1 DNA-binding activity and whether truncation mitigates toxicity, as reported in Drosophila (Mahnoor et al., 2024).
      4. RNA Polymerase II relocalization versus limited transcriptional output. The apparent recruitment/repositioning of RNA Pol II in the absence of substantial transcriptional change is particularly interesting. This point could be strengthened by assessing RNA Pol II "states," for example using phosphorylation-state specific antibodies to distinguish stalled/paused from actively elongating polymerase.
      5. Promoter choice in the GGAA microsatellite reporter. The use of a modified constitutive promoter to monitor GGAA microsatellite-dependent activation is somewhat unconventional. Many studies (e.g., Gangwal et al., Hölting et al.) use minimal promoters to demonstrate EF1-dependent upregulation rather than reversal of heterochromatin-associated silencing. A brief rationale for the chosen design or discussion of how it relates to prior reporter paradigms would help readers contextualize the approach.
      6. Reconstituting minimal cofactor requirements for ETS-site activity. The finding that EF1 binding at single ETS sites does not translate into strong transcriptomic remodeling in yeast is an intriguing aspect of the work. The manuscript could be further enriched by systematic attempts to reconstitute candidate cofactors in this minimal system to define the minimal requirements for ETS-site-dependent gene activation.

      Significance

      This study will be of broad interest because it convincingly separates EF1-driven GGAA microsatellite-dependent activation from the canonical regulatory functions of FLI1, reinforcing the concept that fusion transcription factors can acquire genuinely neomorphic activities-activities that may ultimately be therapeutically actionable. Beyond the biological insight, the successful establishment of exogenous EF1 expression in yeast is itself a notable technical achievement, given the longstanding challenges associated with EF1 expression in heterologous settings. As presented here, yeast offers a valuable platform to interrogate EF1 function in a simplified and more controlled context.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1 (Public review):

      When do behavioral differences emerge between the task variants? Based on the results and discussion, the cues increase the salience of either the wins or the losses, biasing behavior in favor of either risky or optimal choice. If this is the case, one might expect the cues to expedite learning, particularly in the standard and loss condition. Providing an analysis of the acquisition of the tasks may provide insight into how the cues are "teaching" decision-making and might explain how biases are formed and cemented.

      While considerable differences in decision making emerge in early sessions of training, we do not observe any evidence that cuing outcomes expedites the development of stable choice patterns. Indeed, since the outcomes are cued across all four options, there is no categorical difference in salience between optimal and risky choices. Thus, our interpretation is that cuing wins and/or losses alters the integration of this feedback into choice preference, rather than the rate of the development of choice preference. To quantitatively address this point, we have included the following analysis:

      “To quantitatively examine choice variability during training, we binned sessions 1-5 and 6-10 and analyzed variability in choice patterns across task variants. Analysis of the first five sessions of training revealed a significant shift in decision score across sessions (F(3, 502) = 31.23, p <.0001), which differed between task variants (session x task: F(16, 502) = 2.13, p = .007). Conversely, while significant differences in overall score were observed between task variants in sessions 6-10 (task: F(5, 156) = 6.81, p <.0001), there was no significant variability across sessions (session: F(3, 481) = 2.06, p = .10, task x session: F(15, 481) = 0.78, p = .71). This indicates that the variability in choice preference (and presumably, learning about outcomes) is maximized in the first five sessions, and there are no obvious differences in the rate of development of stable choice patterns between task variants.”

      Does the learning period used for the modeling impact the interpretation of the behavioral results? The authors indicate that computational modeling was done on the first five sessions and used these data to predict preferences at baseline. Based on these results, punishment learning predicts choice preference. However, these animals are not naïve to the contingencies because of the forced choice training prior to the task, which may impact behavior in these early sessions. Though punishment learning may initially predict risk preference, other parameters later in training may also predict behavior at baseline.

      The first five sessions were chosen based on a previously developed method used in Langdon et al. (2019). When choosing the number of sessions to include, there is a balance between including more data points to improve estimation of parameters while also targeting the timeframe of maximal learning. As training continues, the impact of outcomes on subsequent choice should decrease, and the learning rate would trend towards zero. This can be observed in the reduction in inter-session choice variability as training progresses, as demonstrated in the analyses above. Once learning has ceased, presumably other cognitive processes may dictate choice (for example, habitual stimulus-response associations), which would not be appropriately captured by reinforcement learning models. It would be a separate research question to determine the point at which parameters no longer become predictive, requiring a larger dataset to thoroughly assess. We acknowledge that we did not provide sufficient justification for the learning period used for the modeling. In conjunction with the analysis of early sessions outlined above, we have added the following to the text:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions. As in previous work, models were fit to valid choices from the first five sessions. As training continues, the impact of outcomes on subsequent choice should decline, and parameter values may evolve over time (e.g., decreasing learning rate). To target the period of learning during which outcomes have maximal influence over choice, and parameters likely have fixed values, we limited our analyses to the first five sessions.”

      The authors also present simulated data from the models for sessions 18-20, but according to the statistical analysis section, sessions 35-40 were used for analysis (and presumably presented in Figure 1). If the simulation is carried out in sessions 35-40, do the models fit the data?

      Based on our experience, choice patterns are well instantiated by session 20, and training only continues to 30+ sessions to achieve stability in other task variables (e.g., latencies, premature responding, etc.). That being said, the discrepancy between session numbers is confusing, so we’ve extended the simulations to match the same session numbers that were analyzed in the experimental data.

      Finally, though the n's are small, it would be interesting to see how the devaluation impacts computational metrics. These additional analyses may help to explain the nuanced effects of the cues in the task variants. 

      Unfortunately, as the devaluation experiment is only one session, there are insufficient data to run the same models. Furthermore, changes in choice are subtle and not uniform across rats, making it difficult to reliably model this effect at the individual level. A separate experiment could investigate the specific cognitive processes underlying the devaluation effect.

      Reviewer #1 (Recommendations for the authors):

      The authors do not present individual data points for behavior. Including these data points would improve the interpretability of the results. Adding significant notations to the bar graphs would also help the reader. Although the stats are provided and significant comparisons highlighted, it isn't easy to go between the table and the figure to detect significant outcomes. If done, the statistics tables could be moved to the supplement. Including estimates of effect size for main findings in the main text would also benefit the reader. 

      We thank the reviewer for their feedback on our approach to the figures and significance reporting – we have updated the relevant figures to include individual data points. Furthermore, we’ve added significance notations for task variants that are significantly different from the uncued or standard cued tasks on the figures. We’ve also moved some statistics tables to the supplement, as suggested. 

      The authors allude to other metrics of the task (trials, omissions, etc.) but do not present these data anywhere. Including supplementary figures including individual data points and statistical analyses in the supplement is strongly encouraged.

      A supplementary figure visualizing these metrics (choice latency, trials completed, and omissions) has been added, with individual data points included. Statistical analyses are reported in the main text – no significant effects were observed in the ANOVAs for any of these metrics, so post hoc analyses were not performed.

      Figure 4 is confusing. Presenting the WAIC values for each model rather than compared to the nonlinear model would be easier to understand. It is also unclear if statistical tests were used to assess differences in model fit as no test information is provided.

      Figure 4 has been updated to increase clarity and address feedback from another reviewer. Raw WAIC values are not ideal for visualization, as the task variants have differing amounts of data and thus would be difficult to include on the same Y-axis. Instead, we present each model’s difference in WAIC relative to a basic model with no timeout penalty transform, so that all three models are visible, and the direction of model improvement is clearly indicated. Statistical tests of WAIC differences are not standard, as the numerical differences themselves indicate a better fit.
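
The relative-WAIC presentation described above amounts to subtracting the baseline model's WAIC within each task variant. A minimal sketch follows, assuming WAIC is on the deviance scale (lower is better, so a negative difference means an improvement over the baseline). The numbers are made up for illustration and are not taken from the manuscript.

```python
def waic_differences(waic_by_model, baseline):
    """Return each model's WAIC minus the baseline model's WAIC."""
    reference = waic_by_model[baseline]
    return {
        name: value - reference
        for name, value in waic_by_model.items()
        if name != baseline
    }


# Made-up WAIC values for a single task variant, for illustration only.
example_waic = {"basic": 5200.0, "scaled_cost": 5150.0, "nonlinear_cost": 5120.0}
deltas = waic_differences(example_waic, baseline="basic")
```

Because the subtraction is done within a task variant, variants with different amounts of data can share one axis even though their raw WAIC values differ greatly in magnitude.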

      The authors do not provide a data availability statement.

      We thank the reviewer for calling our attention to this oversight. A data availability statement has been added. 

      Reviewer 2 (Public review):

      Additional support and evidence are needed for the claims made by the authors. Some of the statements are inconsistent with the data and/or analyses or are only weakly supportive of the claims.

      We appreciate the reviewer’s overarching concern that some claims in the original manuscript were insufficiently supported by the data or analyses. To address this, we have provided further rationale for the devaluation experiment and clarified our interpretation of those results, expanded the computational modeling analyses, and revised figures and wording to improve clarity. Below, we respond to the reviewer’s specific comments in detail.

      Reviewer #2 (Recommendations for the authors):

      Different variants of an RL model were used to understand how loss outcomes impacted choice behavior across the gambling task variants. Did the authors try different variants for rewarded outcomes? I wonder whether the loss specific RL effects are constrained to that domain or perhaps emerged because choice behavior to losses was better estimated with the different RL variants. For example, rewarded outcomes across the different choices may not scale linearly (e.g., 1, 2, 3, 4) so including a model in which Rtr is scaled by a free parameter might improve the fit for win choices.

      We agree that asymmetries in model flexibility could, in principle, contribute to the observed effects. While we are somewhat limited in our ability to develop and validate further models due to the small size of the datasets compared to the high degree of choice variability between rats, we have explored the possibility as far as the data allow by fitting a model that includes a scaling parameter for rewards in addition to punishments:

      “While we restricted our model selection to those previously validated on larger datasets, the specificity of the main finding to the punishment learning rate may be due to the greater flexibility afforded to loss scaling, rather than a true asymmetry in learning. To test this hypothesis, we fit a model featuring a scaling parameter for rewards, in addition to scaled costs:

      where mRew is a linear scaling parameter for reward size. A separate scaling parameter was used for timeout penalty duration (i.e., same as scaled cost model). Group-level parameter estimates (Figure S3) reflected similar differences in the punishment learning rate and reward learning rate as the scaled cost model (Figure S4). Furthermore, all 95% HDIs for the mRew scaling parameter included 1, indicating that at least at the group level, scaling of reward size across the P1-P4 options closely follows the actual number of earned sucrose pellets. Thus, we find no evidence that our results can be simply attributed to the increased parameterization of losing outcomes.”

      Additionally, I would like to see evidence that these alternative models provide a better fit compared to a standard delta-rule updating for unrewarded choices.

      Each model is now compared directly to a standard delta-rule update model in the WAIC figure to demonstrate that the current models are a better fit for the data.
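
For orientation, a generic delta-rule update of the kind used as the comparison baseline can be sketched as follows. This is a textbook form rather than the authors' fitted model: the softmax choice rule, the parameter names, and the example values are all assumptions made for illustration.

```python
import math


def delta_rule_update(q_values, choice, outcome, learning_rate):
    """Standard delta rule: move Q(choice) toward the outcome by a fraction of the error."""
    updated = list(q_values)
    updated[choice] += learning_rate * (outcome - updated[choice])
    return updated


def softmax_probabilities(q_values, inverse_temperature):
    """Convert Q-values into choice probabilities (assumed choice rule)."""
    peak = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp(inverse_temperature * (q - peak)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


q = [0.0, 0.0, 0.0, 0.0]  # one value per option, P1 to P4
q = delta_rule_update(q, choice=1, outcome=2.0, learning_rate=0.5)
probs = softmax_probabilities(q, inverse_temperature=1.0)
```

The alternative models discussed in the response differ from this baseline in how losing outcomes are transformed before they enter the update, not in the update rule itself.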

      Could the authors provide some visualization of how variation in the r, m, or b parameters impact choices and/or patterns of choices?

      We have added a figure to the supplementary section to visualize how different values for the r, m, and b parameters could alter the size of updates to Q-values on each trial across the four different options, thereby impacting subsequent choice. 

      It was challenging to understand the impact of the reported effects and interpretation of the authors at various points in the manuscript. For example, the authors state that "only rats trained on tasks without win-paired cues exhibited shifts in risk preference following reinforcer devaluation". Figure 3 however seems to indicate that rats trained on the reverse-cued task show shifts in risk preference. 

      We agree the original wording did not fully capture the nuance apparent in the figure. While not significantly different from baseline, rats in the reverse-cued experiment could have indeed updated their choice patterns and we were underpowered to detect the effect. We have updated the results section to include this point, and to more specifically outline that win-paired cues that scale with reward size lead to insensitivity to reinforcer devaluation:

      “This indicates that pairing audiovisual cues with reward induces some degree of inflexibility in risk-preferring rats. Importantly, pairing cues with losses alone does not elicit rigidity in choice. Thus, in keeping with the observed effect on overall choice patterns, pairing cues with wins has a unique impact on sensitivity to reinforcer devaluation. Although not statistically significant, visual inspection of the reverse-cued task suggests that some choice flexibility may be present, and the study may be underpowered to detect this effect. Nonetheless, win-paired cues that scale with reward size reduce flexibility in choice patterns following reinforcer devaluation.”

      It was not clear to me why the authors did a devaluation test and what was expected. Adding details regarding the motivation for specific analyses and/or experiments would improve understanding of these exciting results.

      Further explanation has been added to the results section for the devaluation test to clarify the rationale and expected results:

      “We next tested whether pairing salient audiovisual cues with outcomes on the rGT impacts flexibility in decision making when outcome values are updated. Reinforcer devaluation, in which subjects are sated on the sugar pellet reinforcer prior to task performance (presumably devaluing the outcome), is a common test of flexibility of decision making (Adams & Dickinson, 1981). We have previously employed this method to demonstrate that rats trained on the standard-cued task are insensitive to reinforcer devaluation (i.e., choice patterns do not shift despite devaluation of the sugar pellet reward; Hathaway et al., 2021).”

      Some rats in the rGT become risk takers and some do not, but whether this is an innate phenomenon or emerges with training is not known. The authors report some correlations between the RL parameters and subsequent risk scores, but this may be an artifact because the risk scores and many of the parameters differ between the experimental groups. Restricting these analyses to the rats in the standard procedure (or even conducting it in other rats that have been run in the rGT standard task) would alleviate this concern. The authors should also expand upon this result in the discussion (if it holds up) and provide graphs of this relationship in the manuscript.

      In a previous paper on which these analyses were based (Langdon et al., 2019), analyses of the relationship between RL parameter estimates and final decision score were conducted separately for rats trained on either the uncued or standard cued task, as the reviewer has suggested here. Those analyses showed that parameters controlling the learning from negative outcomes were specifically related to final score in both tasks. While we don’t have the appropriate n per group to split the analyses by task variant in the current study, we have highlighted these previous findings in the results section to address this concern:

      “In Langdon et al. (2019), analyses were conducted to test whether parameters controlling sensitivity to punishment predicted final decision score at the end of training in the uncued and standard cued task variants. These analyses showed that across both task variants, there was evidence of reduced punishment sensitivity (i.e., lower m parameter or punishment learning rate) in risky versus optimal rats. We conducted similar analyses here to examine whether parameter estimates covary with decision score at end of training. To accomplish this, we fit simple linear regression models for each parameter and assessed whether the slopes were significantly different from zero.”
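
The per-parameter regression described above can be sketched as follows. This is a minimal illustration, assuming ordinary least squares of end-of-training decision score on a single parameter estimate; the function name `slope_test` and the example values are hypothetical and not taken from the manuscript. The returned t statistic would be compared against a t distribution with n - 2 degrees of freedom to judge whether the slope differs from zero.

```python
import math


def slope_test(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    x: one parameter estimate per rat (hypothetical input).
    y: decision score per rat at the end of training (hypothetical input).
    Returns (slope, intercept, t_stat, dof) for the test of slope == 0.
    """
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    dof = n - 2
    se = math.sqrt(sse / dof / sxx)
    if se == 0:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)
    else:
        t_stat = slope / se
    return slope, intercept, t_stat, dof
```

In practice one such fit would be run for each parameter in turn, with the same vector of decision scores as the outcome.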

      I don't see a b parameter in the nonlinear cost model, but it is presented in Figure 6 and also in the "Parameters predicting risk preference on the rGT" section. The authors either need to update the formula or clarify what the b parameter quantifies in the nonlinear model.

      We thank the reviewer for pointing out this oversight; the equation has been updated to include the b parameter.

      The risk score is very confusing as high numbers or % indicate less risk and lower (more negative numbers) indicate greater risk. I've had to reread the text multiple times to remind myself of this, so I anticipate the same will be true for other readers. Perhaps the authors can add a visual guide to their y-axis indicating more positive numbers are less risky choices.

      We acknowledge that this measure can be confusing – the calculation of this score is standard for the Iowa Gambling Task conducted in humans, on which the rGT is based, and was therefore adopted here. We’ve changed the name from “risk score” to “decision score”, along with including a visual guide to the y-axis in Figure 2, to address this point.
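
As a worked illustration of the direction of this measure, the sketch below assumes the score is computed as the percentage of choices of the two optimal options minus the percentage of choices of the two risky options, i.e. (P1 + P2) - (P3 + P4). That formula is an assumption on our part (the response above does not spell it out), and the function name `decision_score` is hypothetical. Under this convention, positive values indicate optimal (less risky) choice and negative values indicate risky choice.

```python
def decision_score(choices):
    """Return (%P1 + %P2) - (%P3 + %P4) for one session.

    choices: dict mapping 'P1'..'P4' to the number of times each option
    was chosen (hypothetical input format).
    """
    options = ("P1", "P2", "P3", "P4")
    total = sum(choices[k] for k in options)
    if total == 0:
        raise ValueError("no valid choices in this session")
    pct = {k: 100.0 * choices[k] / total for k in options}
    # Assumed convention: P1 and P2 are the optimal options,
    # P3 and P4 the risky ones.
    return (pct["P1"] + pct["P2"]) - (pct["P3"] + pct["P4"])
```

For example, a rat choosing P2 on most trials yields a positive score, whereas a rat favouring P3 and P4 yields a negative one.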

      Negative learning rate is confusing as it almost implies that the learning was a negative value, rather than being a learning rate for negative outcomes. Please revise in the figures and in the text.

      We have updated the text and figures where appropriate from “negative learning rate” to “punishment learning rate”. We have also changed the text from “positive learning rate” to “reward learning rate” to match this terminology.

      Reviewer 3 (Public review):

      There is a very problematic statistical stratagem that involves categorising individuals as either risky or optimal based on their choice probabilities. As a measurement or outcome, this is fine, as previously highlighted in the results, but this label is then used as a factor in different ANOVAs to analyse the very same choice probabilities, which then constitutes a circular argument (individuals categorised as risky because they make more risky choices, make more risky choices...).

      Risk status was included as a factor to test whether the effects of the cue paradigms differed between risky versus optimal rats (i.e., interaction effects), not as an independent predictor of choice preference. We focus on results showing a significant task x risk status interaction, and conducted follow-up analyses separately within each group, at which point risk status was no longer included as a factor. We do not interpret main effects or choice x status interactions, which would indeed be circular for the reason noted by the reviewer.

      A second experiment was done to study the effect of devaluation on risky choices in the different tasks. The results, which are not very clear to understand from Figure 3, would suggest that reward devaluation affects choices in tasks where the win-cue pairing is not present. The authors interpret this result by saying that pairing wins with cues makes the individuals insensitive to reward devaluation. Counter this, if an individual is prone to making risky choices in a given task, this points to an already distorted sense of value as the most rewarding strategy is to make optimal non-risky choices.

      We have included significance notations in Figure 3 and included further detail in the text to improve clarity of the findings for the devaluation test. The reviewer raises an interesting point that risk-preferring rats have a distorted sense of value, since they do not follow the optimal strategy. However, we believe that this is at least partially separable from insensitivity to devaluation, since risk-preferring rats trained on tasks that don’t feature win-paired cues still exhibit flexibility in choice. We have added the following point to the discussion to address this:

      “While risk-preferring rats exhibit some degree of distortion in reward valuation, as they do not follow the most rewarding strategy (i.e., selecting optimal options), we believe this to be at least partially separable from choice inflexibility, as risk-preferring rats on tasks that don’t feature win-paired cues remain sensitive to devaluation.”

      While the overall computational approach is excellent, I believe that the choice of computational models is poor. Loss trials come at a double cost, something the authors might want to elaborate more upon, firstly the lost opportunity of not having selected a winning option which is reflected in Q-learning by the fact that r=0, and secondly a waiting period which will affect the overall reward rate. The authors choose to combine these costs by attempting to convert the time penalty into "reward currency" using three different functions that make up the three different tested models. This is a bit of a wasted opportunity as the question when comparing models is not something like "are individuals in the paired win-cue tasks more sensitive to risk? or less sensitive to time? etc" but "what is the best way of converting time into Q-value currency to fit the data?" Instead, the authors could have contrasted other models that explicitly track time as a separate variable (see for example "Impulsivity and risk-seeking as Bayesian inference under dopaminergic control" (Mikhael & Gershman 2021)) or give actions an extra risk bonus (as in "Nicotinic receptors in the VTA promote uncertainty seeking" (Naude et al 2016)).

      We thank the reviewer for their thoughtful suggestions and agree that alternative modeling frameworks that explicitly track time or incorporate uncertainty bonuses would be highly informative for understanding the mechanisms underlying risky choice. However, the models employed here are drawn from previous work that required >100 rats per group for model development and validation. Due to the high degree of variability in decision making within the groups and the relatively small number of rats, this dataset is not well suited for substantial model innovation. Indeed, the most complex model from previous work had to be simplified to achieve model convergence. Testing models that greatly diverge from the previously validated RL models would make it difficult to determine whether poor model fit reflects a misspecified model or insufficient data.

      We’d also like to note that the driving question for this study is to investigate the impact of different cue variants on choice patterns – untangling the relationship between timing, uncertainty, and risky choice is an important and interesting question, but beyond the scope of the present work. 

      To address this limitation, we have expanded our justification of model choice in the results section to emphasize that we are applying previously developed models, with minor extensions:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions.”
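
To make the reviewer's point about converting time penalties into "reward currency" concrete, the following sketch shows one plausible delta-rule update with separate reward and punishment learning rates. It assumes a scaled + offset cost in which a time-out of duration T is valued as -(b + m * T); this functional form, the function name `q_update`, and all example values are assumptions for illustration and are not confirmed by the text above.

```python
def q_update(q, option, won, reward, timeout_s,
             eta_reward, eta_punish, m, b):
    """Return a copy of q after a single-trial delta-rule update.

    q: dict mapping option name to its current value estimate.
    won: True if the trial was rewarded, False if it ended in a time-out.
    reward: number of pellets delivered on a win.
    timeout_s: time-out duration (seconds) on a loss.
    eta_reward, eta_punish: reward and punishment learning rates.
    m, b: assumed scale and offset converting time-out duration to cost.
    """
    new_q = dict(q)
    if won:
        target, eta = reward, eta_reward
    else:
        # Assumed scaled + offset cost: the time-out is treated as a
        # negative outcome in the same units as reward.
        target, eta = -(b + m * timeout_s), eta_punish
    new_q[option] = q[option] + eta * (target - q[option])
    return new_q
```

Under this sketch, a lower m or a lower punishment learning rate both reduce how strongly time-outs drive value estimates downward, which is the sense in which these parameters index punishment sensitivity.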

      Another weakness of the computational section is the fact that, despite simulations having been made, Figure 5 only shows the simulated risk scores and not the different choice probabilities, which would be a much more interesting metric by which to judge model validity.

      We have expanded Figure 5 to show the simulated choice of each option.

      In the last section, the authors ask whether the parameter estimates (obtained from optimisation on the early sessions) could be used to predict risk preference. While this is an interesting question to address, the authors give very little explanation as to how they establish any predictive relationship. A figure and more detailed explanation would have been warranted to support their claims.

      We have expanded this section to provide clearer detail on the methods used to conduct this analysis and added a figure. To address a point raised by another reviewer, the statistical approach has been revised to more closely align with that used in Langdon et al. (2019), and the results have been updated appropriately:

      “We next tested whether any of the subject-level parameter estimates in the nonlinear or scaled + offset model could reliably predict risk preference scores at the end of training. In Langdon et al. (2019), analyses were conducted to test whether parameters controlling sensitivity to punishment predicted final decision score at the end of training in the uncued and standard cued task variants. These analyses showed that across both task variants, there was evidence of reduced punishment sensitivity (i.e., lower m parameter or punishment learning rate) in risky versus optimal rats. We conducted similar analyses here to examine whether parameter estimates covary with decision score at end of training. To accomplish this, we fit simple linear regression models for each parameter and assessed whether the slopes were significantly different from zero.”

      Why were the simulated risk scores calculated for sessions 18-20 and not 35-39 as in the experimental data, and why were the models optimised only on the first sessions?

      These points were addressed in response to reviewer #1:

      Based on our experience, choice patterns are well instantiated by session 20, and training only continues to 30+ sessions to achieve stability in other task variables (e.g., latencies, premature responding, etc.). That being said, the discrepancy between session numbers is confusing, so we’ve extended the simulations to match the same session numbers that were analyzed in the experimental data.

      The first five sessions were chosen based on a previously developed method used in Langdon et al. (2019). When choosing the number of sessions to include, there is a balance between including more data points to improve estimation of parameters while also targeting the timeframe of maximal learning. As training continues, the impact of outcomes on subsequent choice should decrease, and the learning rate would trend towards zero. This can be observed in the reduction in inter-session choice variability as training progresses, as demonstrated in the analyses above. Once learning has ceased, presumably other cognitive processes may dictate choice (for example, habitual stimulus-response associations), which would not be appropriately captured by reinforcement learning models. It would be a separate research question to determine the point at which parameters no longer become predictive, requiring a larger dataset to thoroughly assess. We acknowledge that we did not provide sufficient justification for the learning period used for the modeling. In conjunction with the analysis of early sessions outlined above, we have added the following to the text:

      “We investigated differences in the acquisition of each task variant by fitting several reinforcement learning (RL) models to early sessions. Our modeling approach closely follows methods outlined in Langdon et al. (2019), in which a much larger dataset (>100 rats per task) was used to develop the RL models applied here. Due to the comparatively small n per group in the current study, we limited our model selection to those previously validated in Langdon et al. (2019), with minor extensions. As in previous work, models were fit to valid choices from the first five sessions. As training continues, the impact of outcomes on subsequent choice should decline, and parameter values may evolve over time (e.g., decreasing learning rate). To target the period of learning during which outcomes have maximal influence over choice, and parameters likely have fixed values, we limited our analyses to the first five sessions.”

      Concerning the figures, could you consider replacing the bar plots with, or adding to them, the full distribution of individual dots or a violin plot, something to better capture the distribution of the data? This would be particularly beneficial for Figure 2B, the risk score, which, without a distribution, suggests all individuals are optimal, something which the text claims is not the case.

      Individual data points have been added to the relevant figures.

      Is this not a case of compositional data where ANOVA is definitely not an appropriate method (compositional data consist in reporting proportions of different elements in a whole, eg this rock is 60% silicate, 20% man-made cement, etc.) because of violation of normality and mostly dependence between measurements (the sum must be 100% as in your case where knowing the proportions of P1, P2 and P3, I automatically deduce P4). I leave to you the care of finding a potential alternative. In any case, I also had difficulties understanding the varying degrees of freedom of the different reported F statistics which worry me that this has not been done properly.

      This is a fair criticism, as choice proportions across P1-P4 are not fully independent. While alternative approaches do exist, there is no widely adopted or straightforward method that has been validated for this task. Accordingly, ANOVA remains the standard analytical approach for this task, as it facilitates comparison with previous work and is readily understood by readers. As mentioned in the methods, an arcsine transformation was applied to the proportional data to mitigate issues associated with bounded measures (i.e., summing to 100%). We thank the reviewer for drawing our attention to the discrepancies in the degrees of freedom – these have now been corrected.
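
For readers unfamiliar with the transformation mentioned above, a minimal sketch follows. It assumes the common form arcsin(sqrt(p)) applied to a proportion p between 0 and 1; some analyses use 2 * arcsin(sqrt(p)) instead, and the exact variant used in the manuscript is not stated here. The function name `arcsine_transform` is hypothetical.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion in [0, 1].

    Maps [0, 1] onto [0, pi/2] and stretches values near the bounds,
    which tends to stabilise the variance of proportional data.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))
```

Percent choice of each option would be divided by 100 before being passed to this function.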

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides a useful analysis of the changes in chromatin organization and gene expression that occur during the differentiation of two cell types (anterior endoderm and prechordal plate) from a common progenitor in zebrafish. Although the findings are consistent with previous work, the evidence presented in the study appears to be incomplete and would benefit from more rigorous interpretation of single-cell data, more in-depth lineage tracing, overexpression experiments with physiological levels of Ripply, and a clearer justification for using an explant system. With these modifications, this paper will be of interest to zebrafish developmental biologists investigating mechanisms underlying differentiation.

      We sincerely thank the editor and the reviewers for their valuable time and efforts. Their insightful comments were greatly appreciated and have been largely addressed in the revised manuscript. We are confident that these revisions have enhanced the overall quality and clarity of our paper.

      Reviewer #1 (Public review):

      Summary:

      During vertebrate gastrulation, mesendoderm cells are initially specified by morphogens (e.g. Nodal) and segregate into endoderm and mesoderm in part based on Nodal concentrations. Using zebrafish genetics, live imaging, and single-cell multi-omics, the manuscript by Cheng et al presents evidence to support a claim that anterior endoderm progenitors derive primarily from prechordal plate progenitors, with transcriptional regulators goosecoid (Gsc) and ripply1 playing key roles in this cell fate determination. Such a finding would represent a significant advance in our understanding of how anterior endoderm is specified in vertebrate embryos.

      We would like to thank Reviewer #1 for their comments and positive feedback on our manuscript.

      Strengths:

      Live imaging-based tracking of PP and endo reporters (Figure 2) is well executed and convincing, though a larger number of individual cell tracks will be needed. Currently, only a single cell track (n=1) is provided.

      We thank the reviewer for the positive comments and the valuable suggestion. As the reviewer suggested, we repeated the live imaging analyses on Tg(gsc:EGFP;sox17:DsRed) embryos. We tracked dozens of cells during their transition from gsc-positive to sox17-positive. Furthermore, we quantified the RFP/GFP signal intensity ratio in these cells over the course of development (Please see the revised Figure 2D and Movie S4).

      Weaknesses:

      (1) The central claim of the paper - that the anterior endoderm progenitors arise directly from prechordal plate progenitors - is not adequately supported by the evidence presented. This is a claim about cell lineage, which the authors are attempting to support with data from single-cell profiling and genetic manipulations in embryos and explants. The construction of gene expression (pseudo-time) trajectories, while a modern and powerful approach for hypothesis generation, should not be used as a substitute for bona fide lineage tracing methods. If the authors' central hypothesis is correct, a CRE-based lineage tracing experiment (e.g. driving CRE using a PP marker such as Gsc) should be able to label PP progenitor cells that ultimately contribute to anterior endoderm-derived tissues. Such an experiment would also allow the authors to quantify the relative contribution of PP (vs non-PP) cells to the anterior endoderm, which is not possible to estimate from the indirect data currently provided. Note: while the present version of the manuscript does describe a sox17:CRE lineage tracing experiment, this actually goes in the opposite direction to the one that would be informative (sox17:CRE-marked descendants will be a mixture of PP-derived and non-PP-derived cells, and the Gsc-based reporter does not allow for long-term tracking of the fates of these cells).

      We sincerely thank the reviewer for the professional comments and the constructive suggestions. As the reviewer indicated, using single-cell transcriptomic trajectory analyses of zebrafish embryos and the Nodal-injected explant system, along with live imaging analyses of Tg(gsc:EGFP;sox17:DsRed) embryos, we revealed that anterior endoderm progenitors arise from prechordal plate progenitors. To further verify this observation, we conducted two sets of lineage-tracing assays. Initial evidence came from co-injecting sox17:Cre and gsc:loxp-STOP-loxp-mcherry plasmids. We observed RFP-positive cells at 8 hpf, demonstrating the presence of cells that had expressed both genes. To explicitly follow the proposed lineage, we then implemented a reciprocal strategy, as suggested by the reviewer, in which we constructed and co-injected sox17:loxp-STOP-loxp-mcherry and gsc:Cre plasmids. The appearance of RFP-positive cells in the anterior dorsal region at 8 hpf provides direct evidence for a transition from gsc-positive to sox17-positive identity. These results are now included in the revised manuscript (Please see Author response image 1 and Figure S4E). However, in accordance with the reviewer's caution, we acknowledge that this does not prove this is the sole origin of anterior endoderm. Consequently, we have revised the text to clarify that our findings demonstrate that anterior endoderm can be specified from prechordal plate progenitors, without claiming that it is the only source.

      Author response image 1.

      Characterization of anterior endoderm lineage by Cre-Lox recombination system.

      (2) The authors' descriptions of gene expression patterns in the single-cell trajectory analyses do not always match the data. For example, it is stated that goosecoid expression marks progenitor cells that exist prior to a PP vs endo fate bifurcation (e.g. lines 124-130). Yet, in Figure 1C it appears that in fact goosecoid expression largely does not precede (but actually follows) the split and is predominantly expressed in cells that have already been specified into the PP branch. Likewise, most of the cells in the endo branch (or prior) appear to never express Gsc. While these trends do indeed appear to be more muddled in the explant data (Figure 1H), it still seems quite far-fetched to claim that Gsc expression is a hallmark of endoderm-PP progenitors.

      We thank the reviewer for pointing out this issue. Our initial analysis proposed that the precursors of the prechordal plate (PP) and anterior endoderm (endo) more closely resemble a PP cell fate, as their progenitor populations highly express PP marker genes, such as gsc. The gsc gene is widely recognized as a PP marker[1]. The reviewer pointed out that in our analysis, these precursor cells do not initially exhibit high gsc expression; rather, gsc expression gradually increases as PP fate is specified.

      The reason for this observation is as follows: First, for the in vivo data, we used the URD algorithm to trace back all possible progenitor cells for both the PP and anterior endo trajectory. As mentioned in the manuscript, the PP and anterior endo are relatively distant in the trajectory tree of the zebrafish embryonic data. Consequently, this approach likely included other, confounding progenitor cells that do not express gsc (like ventral epiblast, Author response image 2). However, we further investigated the expression of gsc and sox17 along these two trajectories. The conclusion remains that gsc expression is indeed higher than sox17 in the progenitor cells common to both trajectories (Author response image 2). Combined with the live imaging analysis presented in this study, which shows that gsc expression increases progressively in the PP, this supports the notion that the progenitor cells for both PP and anterior endoderm initially bias towards a PP cell fate.

      On the other hand, in our previously published work using the Nodal-injected explant system, which specifically induces anterior endo and PP, the cellular trajectory analysis also revealed that the specifications of PP and anterior endo follow very similar paths. Therefore, we proceeded to analyze the Nodal explant data. Similarly, when using URD to trace the differentiation trajectories of PP and anterior endo cells, a small number of other progenitor cells were also captured. This explains why a minority of cells do not express gsc—these are likely ventral epiblast cells (Author response image 2). However, based on the Nodal explant data, gsc is specifically highly expressed in the progenitor cells of the PP and anterior endo. Its expression remains high in the PP trajectory but gradually decreases in the endoderm trajectory (Figure 1H).

      Author response image 2.

      (A) The expression of ventral epiblast markers in PP and anterior Endo URD trajectory. (B) The expression of gsc, sox32 and sox17 in the progenitors of PP and anterior endo in embryos and Nodal explants.

      (3) The study seems to refer to "endoderm" and "anterior endoderm" somewhat interchangeably, and this is potentially problematic. Most single-cell-based analyses appearing in the study rely on global endoderm markers (sox17, sox32) which are expressed in endodermal precursors along the entire ventrolateral margin. Some of these cells are adjacent to the prechordal plate on the dorsal side of the gastrula, but many (most in fact) are quite some distance away. The microscopy-based evidence presented in Figure 2 and elsewhere, however, focuses on a small number of sox17-expressing cells that are directly adjacent to, or intermingled with, the prechordal plate. It, therefore, seems problematic for the authors to generalize potential overlaps with the PP lineage to the entire endoderm, which includes cells in ventral locations. It would be helpful if the authors could search for additional markers that might stratify and/or mark the anterior endoderm and perform their trajectory analysis specifically on these cells.

      We thank the reviewer for these comments and suggestions. We fully agree with the reviewer's point that the expression of sox32 and sox17 cannot be used to distinguish dorsal endoderm from ventral-lateral endoderm cells. However, during the gastrulation stage, all endodermal cells express sox32 and sox17, and there are currently no specific marker genes available to distinguish between them.

      After gastrulation ends, the dorsal endoderm (i.e., the anterior endoderm) begins to express pharyngeal endoderm marker genes, such as pax1b. Therefore, in the analysis of embryonic data in vivo, when studying the segregation of the anterior endoderm and PP trajectory, we specifically used the pharyngeal endoderm as the subject to trace its developmental trajectory.

      In the case of Nodal explants, Nodal specifically induces the fate of the dorsal mesendoderm, which includes both the PP and pharyngeal endoderm (anterior endoderm). Precisely for this reason, we consider the Nodal explant system as a highly suitable model for investigating the mechanisms underlying the cell fate separation between anterior endoderm and PP. Thus, in the Nodal explant data, we included all endodermal cells for downstream analysis.

      To avoid any potential confusion for readers, we have revised the term "endoderm" in the manuscript to "anterior endoderm" as suggested by the reviewer.

      (4) It is not clear that the use of the nodal explant system is allowing for rigorous assessment of endoderm specification. Why are the numbers of endoderm cells so vanishingly few in the nodal explant experiments (Figure 1H, 3H), especially when compared to the embryo itself (e.g. Figures 1C-D)? It seems difficult to perform a rigorous analysis of endoderm specification using this particular model which seems inherently more biased towards PP vs. endoderm than the embryo itself. Why not simply perform nodal pathway manipulations in embryos?

      We sincerely thank the reviewer for raising this important question. In our study of the fate separation between the PP and anterior endoderm, we initially analyzed zebrafish embryonic data. However, when reconstructing the transcriptional lineage tree using URD, we observed that these two cell trajectories were positioned relatively far apart on the tree. Yet, existing studies have shown that the anterior endoderm and PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells[2-4], and they share transcriptional similarities[5]. Therefore, as the reviewer pointed out, when tracing all progenitor cells of these two trajectories using the URD algorithm, it is easy to include other cell types, such as ventral epiblast cells (Author response image 2). For this reason, we concluded that directly using embryonic data to dissect the mechanism of fate separation between PP and anterior endoderm might not yield highly accurate results.

      In contrast, our group’s previous work, published in Cell Reports, demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including anterior endoderm, PP, and notochord[5]. Thus, we considered the Nodal explant system to be a highly suitable model for investigating the mechanism of fate separation between PP and anterior endoderm. Ultimately, by analyzing both in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells—a conclusion further validated by live imaging experiments.

      Regarding the reviewer’s concern about the relatively low number of endodermal cells in the Nodal explant system, we speculate that this is because the explants predominantly induce anterior endoderm. Since endodermal cells constitute only a small proportion of cells during gastrulation, and anterior endoderm represents an even smaller subset, the absolute number is naturally limited. Nevertheless, the anterior endodermal cells captured in our Nodal explants were sufficient to support our analysis of the fate separation mechanism between anterior endoderm and PP. Finally, to further strengthen the findings from scRNA-seq analyses, we subsequently performed live imaging validation experiments using both zebrafish embryos and the explant system.

      (5) The authors should not claim that proximity in UMAP space is an indication of transcriptional similarity (lines 207-208), especially for well-separated clusters. This is a serious misrepresentation of the proper usage of the UMAP algorithm. The authors make a similar claim later on (lines 272-274).

      We would like to extend our gratitude to the reviewer for their insightful comments. We have revised the descriptions regarding UMAP throughout the manuscript as suggested (Please see the main text in revised manuscript).

      Reviewer # 1 (Recommendations For The Authors):

      - Pseudotime trajectories constructed from single-cell snapshots are not true "lineage" measurements. Authors should refrain from referring to such data as lineage data (e.g. lines 99, 100, 103, 109, 112, 127, etc). Such models should be referred to as "trajectories", "hypothetical lineages", or something else.

      We are grateful to the reviewer for this comment. Following their recommendation, we have revised the terminology from "transcriptional lineage tree" to "trajectory" across the entire manuscript (Please see main text in revised manuscript).

      - The live imaging data presented in Figure 2 (and supplemental figures) are compelling and do seem to show that some cells can switch between PP and endo states. However, the number of cells reported is still too low to be able to ascertain whether or not this is just a rare/edge-case phenomenon. Tracks for just a single cell are reported in Figure 2C-D. This is insufficient. Tracks for many more cells should be collected and reported alongside this current sole (n=1) example. The choice of time window for these live imaging experiments should also be better explained. These live imaging experiments are being performed at or after 6hpf, but authors claim in the text that "... the segregation between PP and Endo has already occurred by 6hpf." (lines 126-127). Why not perform these live imaging experiments earlier, when the initial fate decision between PP and endo is supposedly occurring?

      We sincerely appreciate the reviewer’s insightful questions and constructive feedback. In response, we have made several important revisions. First, the reviewer noted that our original manuscript tracked only a single cell and suggested increasing the number of tracked cells. Following this recommendation, we repeated the live-imaging experiments and expanded the number of tracked endodermal cells (Please see the revised Movie S4 and Figure 2D). The experimental conditions were kept identical to the previous setup, and these cells consistently exhibited a gradual transition from a gsc+ fate to a sox17+ endodermal fate. In addition, the reviewer recommended performing live imaging at an earlier time point. Accordingly, we conducted additional experiments initiating live imaging at around 5.7 hours and observed the onset of sox17 expression in gsc+ cells at approximately 6 hpf (Movie S5), which is consistent with our single-cell transcriptomic analysis.

      - The sections devoted to lengthy descriptions of GO terms (lines 131-146, 239-254) and receptor-ligand predictions (lines 170-185) are largely speculative. Consider streamlining.

      Thanks for the reviewer's comment. We have streamlined the content related to the GO analysis as suggested (Please see Lines 128-132, 157-167, 221-225).

      - The use of a "Nodal Activity Score" (lines 212-226) is clever but might actually be less informative than showing contributions from individual nodal target genes. The combining of counts data from 29 predicted nodal targets means that the contribution (or lack of contribution) from each gene becomes masked. The authors should include supplementary dot plots that break down the score across all 29 genes, allowing the reader to assess overall contributions and/or sub-clusters of gene co-expression patterns, if present.

      Thank you very much for the reviewer's positive feedback on our use of the "Nodal Activity Score" and the valuable suggestions provided. Following the recommendation, we analyzed the expression of the 29 Nodal direct targets used in our study across the WT, ndr1 knockdown (kd), and lft1 knockout (ko) groups. We found that the known axial mesoderm genes, such as chrd, tbxta, noto, and gsc, contributed significantly to the Nodal score. The newly conducted analysis has been included in the Supplementary Information (Please see Figure S7L).
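
      The per-gene breakdown requested by the reviewer can be illustrated with a minimal sketch. It assumes the score is a simple average of normalized expression over the target genes, which may differ from the scoring method actually used in the manuscript. The function names and the toy expression values are hypothetical, and only four of the 29 targets are shown.

```python
from statistics import mean

def nodal_score(cell, targets):
    """Average expression of the target genes in one cell (missing genes count as 0)."""
    return mean(cell.get(g, 0.0) for g in targets)

def per_gene_share(cells, targets):
    """Fraction of the summed target-gene signal contributed by each gene."""
    totals = {g: sum(c.get(g, 0.0) for c in cells) for g in targets}
    grand = sum(totals.values())
    return {g: (t / grand if grand else 0.0) for g, t in totals.items()}

# Toy normalized expression values (hypothetical, not taken from the study).
targets = ["chrd", "tbxta", "noto", "gsc"]
cells = [
    {"chrd": 2.0, "tbxta": 1.0, "noto": 0.0, "gsc": 1.0},
    {"chrd": 4.0, "tbxta": 1.0, "noto": 2.0, "gsc": 1.0},
]

scores = [nodal_score(c, targets) for c in cells]  # one combined score per cell
shares = per_gene_share(cells, targets)            # what the combined score hides
```

      Reporting the per-gene shares next to the combined score shows whether a few genes dominate it, which is the masking effect the reviewer describes.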

      - The differential expression trends being reported for srcap (line 251) do not appear to be significant. Are details and P-values for these DEG tests reported somewhere in the manuscript?

      We thank the reviewer for raising this question. Based on the reviewer's comment, we performed statistical tests (Wilcoxon test) to compare the expression of srcap in PP and Endo. Our analysis revealed that while srcap expression is slightly higher in PP than in Endo, this difference is not statistically significant. The specific p-value and fold change have been indicated in the revised figure (Please see Figure 4J and S7H). Based on this analysis, we revised our description to state that srcap expression is slightly higher in the PP than in the anterior endoderm.
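
      As an illustration of this kind of comparison, the sketch below implements a two-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and tie correction, plus a log2 fold change of group means. It assumes the "Wilcoxon test" above refers to the unpaired rank-sum variant commonly used for single-cell data. The function names and the pseudocount default are hypothetical, and no continuity correction is applied.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test; returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - mu) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))

def log2_fold_change(x, y, pseudocount=1e-6):
    """log2 ratio of group means; the pseudocount guards against division by zero."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return math.log2((mx + pseudocount) / (my + pseudocount))
```

      A small fold change with a large p-value, as reported for srcap, supports describing the difference as a non-significant trend rather than as differential expression.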

      - Following the drug experiments with the drug AU15330 (lines 254-263), authors have only reported #s of endodermal cells, which seem to have increased, which the authors suggest indicates a fate switch from PP to endo. However, the authors have not reported whether the numbers of PP cells decreased or stayed the same in these embryos. This would be helpful information to include, as it is very difficult to discern quantitative trends from the images presented in Fig 4H and 4L.

      We thank the reviewer for their comments and suggestions. Following these suggestions, we performed Imaris analysis on the HCR staining results from the DMSO (control), 1 μM AU15330-treated, and 5 μM AU15330-treated groups. Our analysis focused on the number of frzb-positive cells (PP), and the comparison revealed that treatment with AU15330 significantly reduces the PP cell number. These findings have been incorporated into the revised manuscript and supplementary information (Please see Figures S7J and S7K).

      Reviewer #2 (Public review):

      Summary:

      During vertebrate gastrulation, the mesoderm and endoderm arise from a common population of precursor cells and are specified by similar signaling events, raising questions as to how these two germ layers are distinguished. Here, Cheng and colleagues use zebrafish gastrulation as a model for mesoderm and endoderm segregation. By reanalyzing published single-cell sequencing data, they identify a common progenitor population for the anterior endoderm and the mesodermal prechordal plate (PP). They find that expression levels of PP genes Gsc and ripply are among the earliest differences between these populations and that their increased expression suppresses the expression of endoderm markers. Further analysis of chromatin accessibility and Ripply cut-and-tag is consistent with direct repression of endoderm by this PP marker. This study demonstrates the roles of Gsc and Ripply in suppressing anterior endoderm fate, but this role for Gsc was already known and the effect of Ripply is limited to a small population of anterior endoderm. The manuscript also focuses extensively on the function of Nodal in specifying and patterning the mesoderm and endoderm, a role that is already well known and to which the current analysis adds little new insight.

      We would like to thank Reviewer #2 for the constructive comments and positive feedback regarding our manuscript.

      Strengths:

      Integrated single-cell ATAC- and RNA-seq convincingly demonstrate changes in chromatin accessibility that may underlie the segregation of mesoderm and endoderm lineages, including Gsc and ripply. Identification of Ripply-occupied genomic regions augments this analysis. The genetic mutants for both genes provide strong evidence for their function in anterior mesendoderm development, although these phenotypes are subtle.

      We thank the reviewer for recognizing our work, and we greatly appreciate the constructive suggestions from the reviewer.

      Weaknesses:

      The use of zebrafish embryonic explants for cell fate trajectory analysis (rather than intact embryos) is not justified. In both transcriptomic comparisons between the two fate trajectories of interest and Ripply cut-and-tag analysis, the authors rely too heavily on gene ontology which adds little to our functional understanding. Much of the work is focused on the role of Nodal in the mesoderm/endoderm fate decision, but the results largely confirm previous studies and again provide few new insights. Some experiments were designed to test the relationship between the mesoderm and endoderm lineages and the role of epigenetic regulators therein, but these experiments were not properly controlled and therefore difficult to interpret.

      We sincerely thank the reviewer for the comments. As we previously answered, in our study of the fate differentiation between the PP and the anterior endoderm, we initially analyzed zebrafish embryonic data. However, when we used URD to reconstruct the transcriptional trajectory tree, we found that these two cell trajectories were distantly located on the tree. Existing studies have shown that the anterior endoderm and the PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells and share transcriptional similarities[2-4]. Therefore, when tracing all progenitor cells of these two trajectories using the URD algorithm, it is easy to include other cell types, such as ventral mesendodermal cells (Please see Author response image 2A). Based on this, we believe that directly using embryonic data to decipher the mechanism of fate differentiation between the PP and the anterior endoderm may not yield sufficiently precise results. In contrast, our group’s previous study published in Cell Reports demonstrated that the Nodal-induced explant system can specifically enrich dorsal mesendodermal cells, including the anterior endoderm, PP, and notochord[5]. Thus, we consider the Nodal explant system as an ideal model for studying the fate differentiation mechanism between the PP and the anterior endoderm. Ultimately, through comprehensive analysis of in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells—a conclusion further validated by live imaging experiments.

      Regarding the GO analysis, we have streamlined it as suggested by the reviewers. In the revised manuscript, we analyzed the expression of specific genes contributing to key GO functions. Additionally, in the revised version, we conducted more live imaging experiments and quantitative cell assays. We designed gRNA targeting srcap and used the CRISPR/Cas13 system to knock down srcap, which produced results consistent with the morpholino knockdown. We also tested the drug AU15330 to confirm that it is effective in zebrafish. We hope these additional experiments adequately address the reviewers' concerns.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the introduction, the authors state that mesendoderm segregates into mesoderm and endoderm in a Nodal-concentration dependent manner. While it is true that higher Nodal signaling levels are required for endoderm specification, A) this is also true for some mesoderm populations, and B) work from Caroline Hill's lab has shown that Nodal activity alone is not determinative of endoderm fate. Although the authors cite this work, its conclusions are not reflected in this over-simplified explanation of mesendoderm development. The authors also state that it is not clear when PP and endoderm can be distinguished transcriptionally, but this was also addressed in Economou et al, 2022, which found that they can be distinguished at 60% epiboly but not 50% epiboly.

      We sincerely thank the reviewer for raising this question and reminding us of the conclusions drawn from that excellent study. As the reviewer pointed out, Economou et al. demonstrated that Nodal signaling alone is insufficient to determine the cell fate segregation of mesendoderm[6]. However, their study primarily focused on the fate segregation of the ventral-lateral mesendoderm lineage. In contrast, we believe that the mechanisms underlying dorsal mesendoderm specification may differ.

      First, it is well established that in zebrafish embryos, the most dorsal mesendoderm is initially specified by the activity of the dorsal organizer. Notably, the Nodal signaling ligands ndr1 and ndr2 begin to be expressed in the dorsal organizer as early as the sphere stage[7]. In our study, through single-cell transcriptomic trajectory analysis and live imaging analysis, we observed that the cell fate segregation of the dorsal mesendoderm can be traced back to the shield stage.

      Second, the regulatory mechanisms governing dorsal mesendoderm fate differentiation may differ from those of the ventral-lateral mesendoderm. For instance, the gsc gene is exclusively expressed in the dorsal mesendoderm and is absent in the ventral-lateral mesendoderm. Because gsc is a critical master gene, its overexpression on the ventral side can induce a complete secondary body axis. Similarly, ripply1, identified in our study, is also expressed early and specifically in the dorsal mesendoderm. Overexpression of ripply1 on the ventral side similarly induces a secondary body axis, albeit one lacking the forebrain[5]. In this study, we found that gsc and ripply1, both acting as repressors, together inhibit the specification of dorsal (anterior) endoderm from PP progenitors.

      In summary, our study focuses on the regulatory mechanisms of fate segregation in the dorsal (anterior) mesendoderm, which differs from the mechanisms of ventral-lateral mesendoderm lineage segregation reported by Economou et al. We believe that this distinction represents a key novelty of our work.

      (2) As noted in the manuscript, Warga and Nusslein-Volhard determined long ago that PP and anterior endoderm share a common precursor. It is surprising that this close relationship is not apparent from the lineage trees in whole embryos but is apparent in lineage trees from explants. The authors speculate that the resolution of the whole embryo dataset is insufficient to detect this branch point and propose explants as the solution, but it is not clear why the explant dataset is higher resolution and/or more appropriate to address this question.

      We sincerely thank the reviewer for their thoughtful comments. As we mentioned previously, our investigation of fate differentiation between the PP and the anterior endoderm initially involved the analysis of zebrafish embryonic data. However, when we used URD to reconstruct the transcriptional trajectory tree, we observed that these two cell trajectories were located far apart. Previous elegant studies, as the reviewer mentioned, have shown that the anterior endoderm and the PP are not only spatially adjacent but also both originate from mesendodermal progenitor cells and share transcriptional similarities[2,3,8]. Consequently, when tracing all progenitor cells of these two trajectories using the URD algorithm, other cell types—such as ventral mesendodermal cells—are easily included. Based on this, we believe that directly using embryonic data to elucidate the mechanism of fate differentiation between the PP and the anterior endoderm may lack sufficient precision.

      In contrast, our group’s previous study published in Cell Reports demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including the anterior endoderm, PP, and notochord[5]. Therefore, we consider the Nodal explant system as an ideal model for studying the mechanism underlying fate differentiation between the PP and the anterior endoderm. Through comprehensive analyses of both in vivo embryonic and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitor cells—a conclusion further supported by live imaging experiments.

      (3) Much of the analysis of DEGs between the lineages of interest is focused on GO term enrichment. But this logic is circular. The endoderm lineage is defined as such because it expresses endoderm-enriched genes, therefore the finding that the endoderm lineage is enriched for endoderm-related GO terms adds no new insights.

      We thank the reviewer for these comments. As the reviewers suggested, in the revised manuscript, we indicated specific genes associated with key GO terms (Please see Figure 4B). Additionally, we have streamlined the content related to the GO analysis as suggested.

      (4) The authors describe the experiment in Figure S4 as key evidence that Gsc+ cells can give rise to endoderm, but no controls are presented. Only a few cells are shown that express mCherry upon injection of sox17:cre constructs. Is mCherry also expressed in the occasional cell injected with Gsc:lox-stop-lox-mCherry in the absence of cre? Although they report 3 independent replicates, it appears that only 2 individual embryos express mCherry. This very small number is not convincing, especially in the absence of appropriate controls.

      We thank the reviewer for raising this question. Following the reviewer's suggestion, we injected gsc:loxp-stop-loxp-mCherry into zebrafish embryos at the 1-cell stage as a control. After performing at least three independent replicates and analyzing no fewer than 100 embryos, we did not observe any mCherry-positive cells. Additionally, we co-injected gsc:loxp-stop-loxp-mCherry with sox17:cre and increased the sample size. Furthermore, we constructed plasmids of sox17:loxp-stop-loxp-mCherry and gsc:cre, and upon injection at the 1-cell stage, we observed RFP-positive cells at 8 hpf (Please see Author response image 1 and Figure S4E). Together with our live imaging data, these experiments collectively demonstrate that anterior endodermal cells can originate from PP progenitors.

      (5) The authors spend a lot of effort demonstrating that PP and anterior endoderm are Nodal dependent. First, these data (especially Figures 3E and 3I) are not very convincing, as the differences shown are very small or not apparent. Second, this is already well-known and adds nothing to our understanding of mesoderm-endoderm segregation.

      We sincerely thank the reviewer for their insightful questions. First, the reviewer mentioned that in the initial version of our manuscript, the effects of ndr1 knockdown and lefty1 knockout on Nodal signaling and cell fate—particularly prechordal plate (PP) and anterior endoderm (endo)—in Nodal-induced explants were not very pronounced. We recognize that the negative feedback mechanism between Nodal and Lefty signaling may explain why Nodal acts as a morphogen, regulating pattern formation through a Turing-like model[9]. Therefore, knocking down a Nodal ligand gene, such as ndr1 in this study, or knocking out a Nodal inhibitor, such as lft1, may only have a subtle impact on Nodal signaling[10].

      Accordingly, in this study, we performed extensive pSmad2 immunofluorescence analysis and observed that although the overall intensity of Nodal activity did not change dramatically, there was a statistically significant difference. Importantly, this subtle variation in Nodal signaling strength is precisely what we intended to capture, since PP and anterior endoderm are highly sensitive to Nodal signaling[11], and even minor differences may bias their fate segregation.

      This leads directly to the reviewer’s second concern. While numerous studies suggest that the strength of Nodal signaling influences mesendodermal fate—with high Nodal promoting endoderm and lower concentrations inducing mesoderm—most of these studies focus on ventral-lateral mesendoderm development[4,6,10]. In contrast, the mechanisms underlying dorsal mesendoderm fate specification differ, which is a key innovation of our study.

      Previous work by Bernard Thisse and colleagues demonstrated that even a slight reduction in Nodal signaling, achieved by overexpressing a Nodal inhibitor, is sufficient to cause defects in the specification of PP and endoderm[11]. This indicates that PP and endoderm require the highest levels of Nodal signaling for proper specification. Moreover, the most dorsal mesendoderm, PP and anterior endoderm are not only spatially adjacent but also share similar transcriptional states, making the regulation of their fate separation particularly challenging to study.

      Dr. C.P. Heisenberg's lab made important contributions to this issue, showing that the duration of Nodal exposure is critical for segregating PP and anterior endoderm fates: prolonged Nodal signaling promotes expression of the transcriptional repressor Gsc, which directly suppresses the key endodermal transcription factor Sox17, thereby inhibiting anterior endoderm specification[3]. They also found that tight junctions among PP cells facilitate Nodal signal propagation[8]. However, their studies revealed that Gsc mutants do not exhibit endodermal phenotypes, suggesting that additional factors or mechanisms regulate PP versus anterior endoderm fate separation[3].

      In our study, we first observed that subtle differences in Nodal concentration may bias the fate choice between PP and anterior endoderm. Given that ndr1 knockdown and lft1 knockout mildly reduce or enhance Nodal signaling, respectively, we reasoned that using these two perturbations in a Nodal-induced explant system combined with single-cell RNA sequencing could generate transcriptomic profiles under slightly reduced and enhanced Nodal signaling. This approach may help identify key decision points and transcriptional differences during PP and anterior endoderm segregation, ultimately uncovering the molecular mechanisms downstream of Nodal that govern their fate separation.

      (6) The authors claim that srcap expression differs between the 2 lineages of interest, but this is not apparent from Figure 4J-K. Experiments testing the role of SWI/SNF and srcap also require additional controls. Can srcap MO phenotypes be rescued by srcap RNA? Is there validation that SWI/SNF components are degraded upon treatment with AU15330?

      We are very grateful for the reviewers' questions. Using single-cell data from zebrafish embryos and Nodal explants, we compared the expression of srcap in the PP and anterior Endo cell populations. We found that srcap expression showed a slight increase in PP compared to anterior Endo, but the difference was not statistically significant (Please see Figure 4J and S7H). Therefore, we modified our description in the revised manuscript. However, we speculate that this slight difference might influence the distinct cell fate specification between PP and anterior endo. In the original version of the manuscript, we reported that either treatment with AU15330, an inhibitor of the SWI/SNF complex, or injection of a morpholino targeting srcap, a key component of the SWI/SNF complex, enhanced anterior endo fate while reducing PP cell specification. During this round of revision, we initially attempted to follow the reviewer’s suggestion to co-inject srcap mRNA along with the srcap morpholino to rescue the phenotype. However, the srcap mRNA exceeds 10,000 bp in length, and despite multiple attempts, we were unable to obtain it. We therefore could not perform the rescue experiment and instead adopted an alternative approach to validate the function of srcap: we used another knockdown approach (the CRISPR/Cas system) to determine whether a phenotype similar to that observed with morpholino knockdown could be achieved. Using the CRISPR/Cas13 system, we designed gRNA targeting srcap, knocked down srcap, and examined the cell specification of PP and anterior endo. We found that, consistent with our previous results, knocking down srcap clearly reduced PP cell fate while increasing anterior endo cell fate (Author response image 3).

      Additionally, the reviewer raised the question of whether the SWI/SNF complex is degraded after AU15330 treatment. Following the reviewer’s suggestion, we attempted to perform Western blot analysis on BRG1, one of the components of the SWI/SNF complex. However, despite multiple attempts, we were unable to detect the BRG1 protein with the antibody in zebrafish. Several studies have reported that knockdown or knockout of brg1 leads to defects in neural crest cell specification in zebrafish[12,13]. As an alternative, we therefore treated zebrafish embryos at the one-cell stage with 0 μM (DMSO), 1 μM, and 5 μM AU15330, and examined the expression of sox10 and pigment development at around 48 hpf. We found that treatment with 1 μM AU15330 reduced sox10 expression and pigment production, though not significantly, whereas treatment with 5 μM AU15330 significantly disrupted neural crest cell development. Thus, this experiment demonstrates that AU15330 is functional in zebrafish (Author response image 3).

      Author response image 3.

      (A) Characterization of anterior endoderm and PP cells following CRISPR-Cas13d-mediated srcap knockdown. (B) Validation of srcap mRNA expression by RT‑qPCR following CRISPR‑Cas13d knockdown. (C) RT‑qPCR shows the expression of sox10 after treatment with increasing concentrations of AU15330. (D) Morphology of zebrafish embryos at 48 hpf after treatment with increasing concentrations of AU15330.

      (7) The authors conclude from their chromatin accessibility analysis that variations in Nodal signaling are responsible for expression levels of PP and endoderm genes, but they do not consider the alternative explanation that FGF signaling is playing this role. Such a function for FGF was established by Caroline Hill's lab, and the authors also show in Figure S5G that FGF signaling is enriched between these cell populations.

      Thank you very much for raising this issue. As the reviewer pointed out, Caroline Hill's lab has conducted elegant work demonstrating that FGF signaling plays a crucial role in the separation of ventral-lateral mesendoderm cell fates[4,6]. In contrast, our study primarily focuses on the mechanisms underlying the separation of dorsal mesendoderm cell fates. However, our research also reveals that FGF signaling significantly regulates the fate separation of the dorsal mesendoderm, as inhibiting FGF signaling suppresses PP cell specification while promoting anterior Endo fate. In our previously published work, we found that Nodal signaling can directly activate the expression of FGF ligand genes[5]. Therefore, we hypothesize that Nodal signaling, acting as a master regulator, activates various downstream target genes, including FGF. How FGF signaling regulates the cell fate separation of the dorsal mesendoderm warrants investigation in future studies.

      (8) When interpreting the results of their Ripply cut-and-run experiment, the authors again rely heavily on GO term analysis and claim that this supports a role for Ripply as a transcriptional repressor. GO term enrichment does not equal functional analysis. It would be more convincing to intersect DEGs between WT and ripply-/- embryos with Ripply-enriched loci.

      Thanks for raising this important issue and the constructive suggestion. In response to the reviewer's valid concern regarding the GO term analyses from our CUT&Tag data, we implemented a more stringent filtering strategy. We identified peaks enriched in the treatment group and applied differential analysis, selecting genes with a log<sub>2</sub>FoldChange > 3, padj < 0.05, and baseMean > 30 as high-confidence Ripply1 binding targets. A GO enrichment analysis of these genes revealed significant terms related to muscle development, consistent with Ripply1's established role in somite development, thereby validating our approach. We supplemented the related gene list in the revised manuscript. Moreover, within this refined analysis, we found that sox32 met our binding threshold, while sox17 did not. Furthermore, as suggested, we examined mespbb—a known Ripply1-repressed gene—which was present, and gsc, a Nodal target used as a negative control, which was absent. This confirms the specificity of our analysis (Figure 6 and Figure S11). Consequently, our revised analyses support a model in which Ripply1 directly binds the sox32 promoter. Given that Sox32 is a known upstream regulator of sox17, this binding provides a plausible direct mechanism for the observed regulation of sox17 expression. We have updated the figures and text accordingly. We attempted to generate ripply1<sup>-/-</sup> mutants but found that homozygous loss results in embryonic lethality.
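
      The filtering step described above can be sketched as follows. This is only an illustration of the three quoted thresholds: the record layout, the function name, and the gene labels are hypothetical, the numeric values are invented placeholders, and the tool that produced the differential analysis is not stated here.

```python
def high_confidence_targets(rows, lfc=3.0, padj=0.05, base_mean=30.0):
    """Return genes whose peaks pass all three thresholds quoted in the response."""
    kept = []
    for r in rows:
        if r["padj"] is None:  # untested or filtered peaks carry no adjusted p-value
            continue
        if r["log2FoldChange"] > lfc and r["padj"] < padj and r["baseMean"] > base_mean:
            kept.append(r["gene"])
    return kept

# Invented placeholder records, not values from the study.
rows = [
    {"gene": "geneA", "log2FoldChange": 4.2, "padj": 0.001, "baseMean": 120.0},
    {"gene": "geneB", "log2FoldChange": 2.1, "padj": 0.001, "baseMean": 300.0},  # fold change too low
    {"gene": "geneC", "log2FoldChange": 5.0, "padj": 0.200, "baseMean": 80.0},   # not significant
    {"gene": "geneD", "log2FoldChange": 3.5, "padj": 0.010, "baseMean": 12.0},   # signal too weak
    {"gene": "geneE", "log2FoldChange": 6.0, "padj": None, "baseMean": 50.0},    # no adjusted p-value
]

targets = high_confidence_targets(rows)
```

      Requiring all three conditions at once is what makes the list stringent: a peak with a large fold change is still excluded if its signal is weak or its adjusted p-value is high.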

      (9) The way N's are reported is unconventional. N= number of embryos used in the experiment, n= number of embryos imaged. If an embryo was not imaged or analyzed in any way, it cannot be considered among the embryos in an experiment. If only 4 embryos were imaged, the N for that experiment is 4 regardless of how many embryos were stained. Authors should also report not only the number of embryos examined but also the number of independent trials performed for all experiments.

      Thank you very much for the reviewer's suggestion. As suggested, we have revised the description regarding the number of embryos and experimental replicates in the figure legends.

      (10) The authors should avoid the use of red-green color schemes in figures to ensure accessibility for color-blind readers.

      Thanks for the suggestions. We have updated the figures in our revised manuscript and adjusted the color schemes to avoid red-green combinations.

      Reviewer #3 (Public Review):

      Summary:

      Cheng, Liu, Dong, et al. demonstrate that anterior endoderm cells can arise from prechordal plate progenitors, which is suggested by pseudo time reanalysis of published scRNAseq data, pseudo time analysis of new scRNAseq data generated from Nodal-stimulated explants, live imaging from sox17:DsRed and Gsc:eGFP transgenics, fluorescent in situ hybridization, and a Cre/Lox system. Early fate mapping studies already suggested that progenitors at the dorsal margin give rise to both of these cell types (Warga) and live imaging from the Heisenberg lab (Sako 2016, Barone 2017) also pretty convincingly showed this. The data presented for this point are nevertheless very nice, and the additional experiments in this manuscript further cement this result. Though better demonstrated by previous work (Alexander 1999, Gritsman 1999, Gritsman 2000, Sako 2016, Rogers 2017, others), the manuscript suggests that high Nodal signaling is required for both cell types, and shows preliminary data that suggests that FGF signaling may also be important in their segregation. The manuscript also presents new single-cell RNAseq data from Nodal-stimulated explants with increased (lft1 KO) or decreased (ndr1 KD) Nodal signaling and multi-omic ATAC+scRNAseq data from wild-type 6 hpf embryos but draws relatively few conclusions from these data. Lastly, the manuscript presents data that SWI/SNF remodelers and Ripply1 may be involved in the anterior endoderm - prechordal plate decision, but these data are less convincing. The SWI/SNF remodeler experiments are unconvincing because the demonstration that these factors are differentially expressed or active between the two cell types is weak. The Ripply1 gain-of-function experiments are unconvincing because they are based on incredibly high overexpression of ripply1 (500 pg or 1000 pg) that generates a phenotype that is not in line with previously demonstrated overexpression studies (with phenotypes from 10-20x lower expression). Similarly, the cut-and-tag data seems low quality and like it doesn't support direct binding of ripply1 to these loci.

      In the end, this study provides new details that are likely important in the cell fate decision between the prechordal plate and anterior endoderm; however, it is unclear how Nodal signaling, FGF signaling, and elements of the gene regulatory network (including Gsc, possibly ripply1, and other factors) interact to make the decision. I suggest that this manuscript is of most interest to Nodal signaling or zebrafish germ layer patterning aficionados. While it provides new datasets and observations, it does not weave these into a convincing story to provide a major advance in our understanding of the specification of these cell types.

      We sincerely thank the reviewer for their thorough and thoughtful assessment of our work. The reviewer acknowledged several strengths of our study, such as the use of multiple technical approaches to demonstrate that anterior endoderm differentiates from PP progenitor cells, and recognized the value of the newly added single-cell omics data. The reviewer also raised some concerns regarding the initial version of our work, including the SWI/SNF remodeler experiments and the Ripply1 gain-of-function experiment. In the revised manuscript, we have supplemented these parts with additional control experiments to better support our conclusions. We hope that our updated manuscript adequately addresses the points raised by the reviewer.

      Major issues:

      (1) UMAPs: There are several instances in the manuscript where UMAPs are used incorrectly as support for statements about how transcriptionally similar two populations are. UMAP is a stochastic, non-linear projection for visualization - distances in UMAP cannot be used to determine how transcriptionally similar or dissimilar two groups are. Drawing conclusions about how transcriptionally similar two populations are requires performing calculations either in the gene expression space, or in a linear dimensional reduction space (e.g. PCA, keeping in mind that this will only consider the subset of genes used as input into the PCA). Please correct or remove these instances, which include (but are not limited to):

      p.4 107-110

      p.4 112

      p.8 207-208

      p.10 273-275

      We would like to thank the reviewer for raising this question. The descriptions of UMAP have been revised throughout the manuscript in accordance with the reviewer's suggestion (Please see the main text in the revised manuscript).
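
      To illustrate the reviewer's point, similarity between populations can be measured directly in gene expression space rather than read off a UMAP. The sketch below compares cluster centroids by Euclidean distance and Pearson correlation. It is a generic example, not the analysis used in the manuscript: the function names are hypothetical and the three-gene expression values are invented.

```python
import math

def centroid(cells):
    """Mean expression vector of a group of cells (one list of gene values per cell)."""
    n = len(cells)
    return [sum(col) / n for col in zip(*cells)]

def pearson(a, b):
    """Pearson correlation between two expression vectors of equal length."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0

# Invented expression values for three genes in three populations.
pp_cells = [[5.0, 1.0, 0.0], [7.0, 1.0, 0.0]]
endo_cells = [[6.0, 1.0, 1.0], [6.0, 1.0, 1.0]]
other_cells = [[0.0, 4.0, 6.0], [0.0, 6.0, 6.0]]

c_pp, c_endo, c_other = centroid(pp_cells), centroid(endo_cells), centroid(other_cells)
d_pp_endo = math.dist(c_pp, c_endo)    # distance in expression space
d_pp_other = math.dist(c_pp, c_other)
r_pp_endo = pearson(c_pp, c_endo)      # correlation in expression space
```

      Unlike positions in a UMAP, these quantities are deterministic and are computed on the expression values themselves, so they can support statements about transcriptional similarity.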

      (2) Nodal and lefty manipulations: The section "Nodal-Lefty regulatory loop is needed for PP and anterior Endo fate specification" and Figure 3 do not draw any significant conclusions. This section presents a LIANA analysis to determine the signals that might be important between prechordal plate and endoderm, but despite the fact that it suggests that BMP, Nodal, FGF, and Wnt signaling might be important, the manuscript just concludes that Nodal signaling is important. Perhaps this is because the conclusion that Nodal signaling is required for the specification of these cell types has been demonstrated in zebrafish in several other studies with more convincing experiments (Alexander 1999, Gritsman 1999, Gritsman 2000, Rogers 2017, Sako 2016). While FGF has recently been demonstrated to be a key player in the stochastic decision to adopt endodermal fate in lateral endoderm (Economou 2022), the idea that FGF signaling may be a key player in the differentiation of these two cell types has strangely been relegated to the discussion and supplement. Lastly, the manuscript does not make clear the advantage of performing experiments to explore the PP-Endo decision in Nodal-stimulated explants compared to data from intact embryos. What would be learned from this and not from an embryo? Since Nodal signaling stimulates the expression of Wnts and FGFs, these data do not test Nodal signaling independent of the other pathways. It is unclear why this artificial system that has some disadvantages is used since the manuscript does not make clear any advantages that it might have had.

      We sincerely thank the reviewers for their valuable comments. As mentioned in our manuscript, a substantial number of studies have reported on the mechanisms governing the segregation of mesendoderm fate in zebrafish embryos, including the work from Dr. Hill’s laboratory cited by the reviewers, which demonstrated the involvement of FGF signaling in ventral mesendoderm fate specification. Research on the regulatory mechanisms underlying anterior mesendoderm differentiation nevertheless remains relatively limited. This is largely due to the challenges posed by the close physical proximity and similar transcriptional states of anterior mesendoderm cells, as well as their shared dependence on high levels of Nodal signaling for specification.

      Several studies from Dr. C.P. Heisenberg’s laboratory have attempted to elucidate the fate segregation between anterior mesendoderm cells, namely the prechordal plate (PP) and anterior endoderm (endo) cells. They found that PP cells are tightly connected, facilitating the propagation of Nodal signaling[8]. Prolonged exposure to Nodal activates the expression of Gsc, which acts as a transcriptional repressor to inhibit sox17 expression, thereby suppressing endodermal fate[3]. However, they also noted that Gsc mutants do not exhibit endoderm developmental defects, suggesting the involvement of additional factors in this process.

      The reviewer inquired about our rationale for using the Nodal-injected explant system. In our investigation of the fate separation between the PP and the anterior endo, we initially analyzed zebrafish embryonic data. Using URD to reconstruct the transcriptional lineage tree, we found that these two cell types were positioned distantly from each other. However, existing literature indicates that the anterior endoderm and PP are not only spatially adjacent but also derive from common mesendodermal progenitors and exhibit transcriptional similarities[2,8]. As the reviewer noted, when tracing all progenitor cells of these two lineages using URD, it is easy to inadvertently include other cell types—such as ventral epiblast cells—which may compromise the accuracy of the analysis. We therefore concluded that directly using embryonic data to dissect the mechanism of fate separation between PP and anterior endoderm might not yield highly precise results.

      By contrast, our group’s earlier study published in Cell Reports demonstrated that the Nodal-induced explant system specifically enriches dorsal mesendodermal cells, including anterior endo, PP, and notochord[5]. This makes the Nodal explant system a highly suitable model for studying the fate separation between PP and anterior endo. Ultimately, by analysing in vivo embryonic data and Nodal explant data, we consistently found that the anterior endoderm likely originates from PP progenitors—a conclusion further supported by live imaging experiments.

      As we answered above, we first used the analyses of single-cell RNA sequencing and live imaging to demonstrate that anterior endoderm can originate from PP progenitor cells. Understanding the mechanism underlying the fate segregation between these two cell populations became a key focus of our research. We began by applying cell communication analysis to our single-cell data to identify signaling pathways that may be involved. This analysis specifically highlighted the Nodal-Lefty signaling pathway. Since Lefty acts as an inhibitor of Nodal signaling, we hypothesized that differences in Nodal signaling strength might regulate the fate of these two cell populations. By overexpressing different concentrations of Nodal mRNA and examining the fates of PP and anterior Endo cells, we confirmed this hypothesis.

      Thus, we propose that even subtle differences in Nodal signaling levels may influence anterior mesendoderm fate decisions. To test this, we generated systems with slightly reduced Nodal signaling (via ndr1 knockdown) and slightly elevated Nodal signaling (via lft1 knockout). Using these models, we precisely captured the critical stage of fate segregation between PP and anterior endo cells and identified a novel transcriptional repressor, Ripply1, which works in concert with Gsc to suppress anterior endoderm differentiation.

      (3) ripply1 mRNA injection phenotype inconsistent with previous literature: The phenotype presented in this manuscript from overexpressing ripply1 mRNA (Fig S11) is inconsistent with previous observations. This study shows a much more dramatic phenotype, suggesting that the overexpression may be to a non-physiological level that makes it difficult to interpret the gain-of-function experiments. For instance, Kawamura et al 2005 perform this experiment but do not trigger loss of head and eye structures or loss of tail structures. Similarly, Kawamura et al 2008 repeat the experiment, triggering a mildly more dramatic shortening of the tail and complete removal of the notochord, but again no disturbance of head structures as displayed here. These previous studies injected 25 - 100 pg of ripply1 mRNA with dramatic phenotypes, whereas this study uses 500 - 1000 pg. The phenotype is so much more dramatic than previously presented that it suggests that the level of ripply1 overexpression is sufficiently high that it may no longer be regulating only its endogenous targets, making the results drawn from ripply1 overexpression difficult to trust.

      We sincerely thank the reviewer for raising this question. First, we apologize for not providing a detailed description of the amount of HA-ripply1 mRNA injected in our previous manuscript. We injected 500 pg of HA-ripply1 mRNA at the 1-cell stage and allowed the embryos to develop until 6 hpf for the CUT&Tag experiment. In the supplementary materials, we included a bright-field image of an 18 hpf embryo injected with HA-ripply1 mRNA, which morphologically exhibited severe developmental abnormalities. The reviewer pointed out that the amount of ripply1 mRNA we injected might be excessive, potentially leading to non-specific gain-of-function effects. The injection dose of 500 pg was determined based on conclusions from our previous study. In that study, injecting 24 pg of ripply1 mRNA into one cell of zebrafish embryos at the 16–32 cell stage was sufficient to induce a secondary axis lacking the forebrain[5]. From this, we estimated that an injection dose of approximately 500–1000 pg would be appropriate at the 1-cell stage, so that after several rounds of cell division each cell would carry 20-30 pg of mRNA at the 32-cell stage. Additionally, we conducted supplementary experiments injecting 100 pg, 250 pg, and 500 pg of ripply1 mRNA, and observed that 500 pg of ripply1 mRNA led to a dramatic suppression of endoderm formation (Author response image 4).
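
      The dose estimate above can be checked with simple arithmetic. The sketch below assumes that injected mRNA is partitioned equally among daughter cells and is not degraded; both are simplifying assumptions made here for illustration, not statements from the manuscript, and the function name per_cell_dose is hypothetical.

```python
def per_cell_dose(total_pg, n_cells):
    """mRNA per cell, assuming equal partitioning and no degradation."""
    return total_pg / n_cells

# Doses injected at the 1-cell stage (pg) and the cell stages discussed in the text.
for total_pg in (500, 1000):
    for n_cells in (16, 32):
        dose = per_cell_dose(total_pg, n_cells)
        print(f"{total_pg} pg injected, {n_cells}-cell stage: {dose:.1f} pg per cell")
```

      Under these assumptions, 500 to 1000 pg injected at the 1-cell stage corresponds to roughly 16 to 31 pg per cell at the 32-cell stage, which is of the same order as the 20-30 pg quoted above.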

      Finally, our study focuses on the mechanism of cell fate segregation in the anterior mesendoderm, primarily during gastrulation. The embryos injected with ripply1 mRNA underwent normal gastrulation, and our CUT&Tag experiment was performed at 6 hpf. Therefore, we believe that the amount of ripply1 mRNA injected in this study is appropriate for addressing our research question.

      Author response image 4.

      Different concentrations of ripply1 mRNA were injected into zebrafish embryos at the one-cell stage, with RFP fluorescence labeling sox17-positive cells.

      (4) Ripply1 binding to sox17 and sox32 regulatory regions not convincing: The Cut and Tag data presented in Fig 6J-K does not seem to be high quality and does not seem to provide strong support that Ripply 1 binds to the regulatory regions of these genes. The signal-to-noise ratio is very poor, and the 'binding' near sox17 that is identified seems to be even coverage over a 14 kb region, which is not consistent with site-specific recruitment of this factor, and the 'peaks' highlighted with yellow boxes do not appear to be peaks at all. To me, it seems this probably represents either: (1) overtagmentation of these samples or (2) an overexpression artifact from injection of too high concentration of ripply1-HA mRNA. In general, Cut and Tag is only recommended for histone modifications, and Cut and Run would be recommended for transcriptional regulators like these (see Epicypher's literature). Given this and the previous point about Ripply1 overexpression, I am not convinced that Ripply1 regulates endodermal genes. The existing data could be made somewhat more convincing by showing the tracks for other genes as positive and negative controls, given that Ripply1 has known muscle targets (how does its binding look at those targets in comparison) and there should be a number of Nodal target genes that Ripply1 does not bind to that could be used as negative controls. Overall this experiment doesn't seem to be of high enough quality to drive the conclusion that Ripply1 directly binds near sox17 and sox32 and from the data presented in the manuscript looks as if it failed technically.

      We sincerely thank the reviewer for raising this question. We apologize that the binding regions of sox17 marked in our previous analysis were incorrect, and we have made the corresponding revisions in the latest version of the manuscript.

      The reviewer noted that our CUT&Tag data contain considerable noise. To address this, we further refined our data processing: we annotated all peaks enriched in the treatment group and performed differential analysis, selecting genes with log2FoldChange > 3, padj < 0.5, and baseMean > 30 as candidate targets of Ripply1 binding. Subsequent GO enrichment analysis of these genes revealed significant enrichment of muscle development-related GO terms, which is consistent with previously reported roles of Ripply1 in regulating somite development. Therefore, we believe our filtering method effectively removes a large number of noise peaks and their associated genes.
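
      The filtering step described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the thresholds are the ones stated in the text, while the table layout, the gene names (geneA to geneE), the numbers, and the function name candidate_targets are hypothetical.

```python
# Thresholds as stated in the response text.
LFC_MIN = 3.0
PADJ_MAX = 0.5
BASEMEAN_MIN = 30.0

def candidate_targets(rows):
    """Return genes passing all three thresholds; rows lacking a padj value are skipped."""
    return [
        r["gene"]
        for r in rows
        if r["padj"] is not None
        and r["log2FoldChange"] > LFC_MIN
        and r["padj"] < PADJ_MAX
        and r["baseMean"] > BASEMEAN_MIN
    ]

# Hypothetical differential-analysis results, one row per peak-associated gene.
rows = [
    {"gene": "geneA", "log2FoldChange": 4.2, "padj": 0.01, "baseMean": 120.0},
    {"gene": "geneB", "log2FoldChange": 2.5, "padj": 0.001, "baseMean": 300.0},  # fold change too low
    {"gene": "geneC", "log2FoldChange": 5.0, "padj": 0.8, "baseMean": 50.0},     # padj too high
    {"gene": "geneD", "log2FoldChange": 3.5, "padj": 0.02, "baseMean": 12.0},    # baseMean too low
    {"gene": "geneE", "log2FoldChange": 3.8, "padj": None, "baseMean": 45.0},    # no adjusted p-value
]

print(candidate_targets(rows))
```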

      Under these screening criteria, we found that sox32 meets the threshold, while sox17 does not. In addition, following the reviewer’s suggestion, we examined mespbb, a gene known to be repressed by Ripply1, as a positive control, and gsc, a Nodal target gene, as a negative control.

      Based on these new analyses, we have revised our figures and text accordingly. Our data now support the possibility that Ripply1 may directly bind to the promoter region of sox32. Since sox32 acts as a direct upstream regulator of sox17, this binding could influence sox17 expression (Figure 6 and Figure S11).

      Finally, we would like to note that studies have reported Ripply1 as a transcriptional repressor, which may function by recruiting other co-factors, such as Groucho, to form a complex[14,15]. This might explain why our CUT&Tag data detected Ripply1 binding to a broad set of genes.

      (5) "Cooperatively Gsc and ripply1 regulate": I suggest avoiding the term "cooperative" when describing the relationship between Ripply1 and Gsc regulation of PP and anterior endoderm. It evokes the concept of cooperative gene regulation, which implies that these factors interact with each other biochemically in order to bind to the DNA. This is not supported by the data in this manuscript, and is especially confusing since Ripply1 is thought to require cooperative binding with a T-box family transcription factor to direct its binding to the DNA.

      We sincerely thank the reviewer for raising this important issue. The reviewer pointed out that the term "Cooperatively" may not be entirely appropriate in the context of our study. In accordance with the reviewer's suggestion, we have replaced "Cooperatively" with "Collectively" in the relevant sections.

      (6) SWI/SNF: The differential expression of srcap doesn't seem very remarkable. The dot plots in the supplement S7H don't help - they seem to show no expression at all in the endoderm, which is clearly a distortion of the data, since from the violin plots it's obviously expressed and the dot-size scale only ranges from ~30-38%. Please add to the figure information about fold-change and p-value for the differential expression. Publicly available scRNAseq databases show srcap is expressed throughout the entire early embryo, suggesting that it would be surprising for it to have differential activity in these two cell types and thereby contribute to their separate specification during development. It seems equally possible that this just mildly influences the level of Nodal or FGF signaling, which would create this effect.

      We thank the Reviewer for this question. As suggested, we performed Wilcoxon tests to compare srcap expression between PP and Endo populations. The analysis shows that while srcap expression is moderately elevated in PP compared with Endo, this difference is not statistically significant. The corresponding p-value and fold change have now been included in the revised figure (Please see Figure 4J and S7H). Although the transcriptional level of srcap shows no significant difference between PP and anterior endoderm, our subsequent experiments, using AU15330 (an inhibitor of the SWI/SNF complex) and injecting a morpholino targeting srcap, a key component of the SWI/SNF complex, demonstrated that its inhibition indeed promotes anterior endoderm fate while reducing PP cell specification. Therefore, we propose that subtle differences in the SWI/SNF complex may regulate the fate specification of PP and anterior endoderm through two mechanisms. First, as mentioned in our study, these chromatin remodelers modulate the expression of master regulators such as Gsc and Ripply1, thereby influencing cell fate decisions. Second, as noted by the reviewer, these chromatin remodelers may affect the interpretation of Nodal signaling, ultimately contributing to the divergence between PP and anterior endoderm fates.
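
      For readers unfamiliar with the comparison described above, the sketch below shows one way a two-group rank-based test and a fold change can be computed. It is an illustration only: the software the authors used is not stated here, the test is implemented as a two-sided Wilcoxon rank-sum test with a normal approximation and no tie or continuity correction, and the expression values and function names are hypothetical.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Two-sided rank-sum test; returns (U statistic for x, approximate p-value)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p

def log2_fold_change(x, y, pseudocount=1e-9):
    """log2 ratio of group means; the pseudocount guards against division by zero."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return math.log2((mean_x + pseudocount) / (mean_y + pseudocount))

# Hypothetical normalized expression values for one gene in two cell populations.
group_pp = [1.2, 0.9, 1.5, 1.1, 1.3, 0.8]
group_endo = [1.0, 1.1, 0.7, 1.2, 0.9, 1.4]

u, p = rank_sum_test(group_pp, group_endo)
print(f"U={u:.1f}, p={p:.3f}, log2FC={log2_fold_change(group_pp, group_endo):.3f}")
```

      For real single-cell data with many tied zero counts, an implementation with tie correction from an established statistics library would be preferable to this approximation.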

      The multiome data seems like a valuable data set for researchers interested in this stage of zebrafish development. However, the presentation of the data doesn't make many conclusions, aside from identifying an element adjacent to ripply1 whose chromatin is open in prechordal plate cells and not endodermal cells and showing that there are a number of loci with differential accessibility between these cell types. That seems fairly expected since both cell types have several differentially expressed transcriptional regulators (for instance, ripply1 has previously been demonstrated in multiple studies to be specific to the prechordal plate during blastula stages). The manuscript implies that SWI/SNF remodeling by Srcap is responsible for the chromatin accessibility differences between these cell types, but that has not actually been tested. It seems more likely that the differences in chromatin accessibility observed are a result of transcription factors binding downstream of Nodal signaling.

      We thank the reviewer for recognizing the value of our newly generated data. Through integrative analysis of single-cell data from wild-type, ndr1 kd, and lft1 ko groups of Nodal-injected explants at 6 hours post-fertilization (hpf), we identified a critical branching point in the fate segregation of the prechordal plate (PP) and anterior endoderm (Endo), where chromatin remodelers may play a significant role. Based on this finding, we performed single-cell RNA and ATAC sequencing on zebrafish embryos at 6 hpf. Analysis of this multi-omics dataset revealed that transcriptional repressors such as Gsc, Ripply1, and Osr1 exhibit differences in both transcriptional and chromatin accessibility levels between the PP and anterior Endo. Subsequent overexpression and loss-of-function experiments further demonstrated that Gsc and Ripply1 collaboratively suppress endodermal gene expression, thereby inhibiting endodermal cell fate. Previous studies have reported that for the activation of certain Nodal downstream target genes, the pSMAD2 protein of the Nodal signaling pathway recruits chromatin remodelers to facilitate chromatin opening and promote further transcription of target genes[16]. Therefore, our data provide chromatin accessibility profiles for Gsc and Ripply1, offering a valuable resource for future investigations into their pSMAD2 binding sites.

      Minor issues:

      Figure 2 E-F: It's not clear which cells from E are quantitated in F. For instance, the dorsal forerunner cells are likely to behave very differently from other endodermal progenitors in this assay. It would be helpful to indicate which cells are analyzed in Fig F with an outline or other indicator of some kind. Or - if both DFCs and endodermal cells are included in F, to perhaps use different colors for their points to help indicate if their fluorescence changes differently.

      We thank the reviewer for this suggestion. In the revised version of the figure, we have outlined the regions of the analyzed cells.

      Fig 3 J: Should the reference be Dubrulle et al 2015, rather than Julien et al?

      Thank you; we have corrected this.

      References:

      Alexander, J. & Stainier, D. Y. A molecular pathway leading to endoderm formation in zebrafish. Current biology : CB 9, 1147-1157 (1999).

      Barone, V. et al. An Effective Feedback Loop between Cell-Cell Contact Duration and Morphogen Signaling Determines Cell Fate. Dev. Cell 43, 198-211.e12 (2017).

      Economou, A. D., Guglielmi, L., East, P. & Hill, C. S. Nodal signaling establishes a competency window for stochastic cell fate switching. Dev. Cell 57, 2604-2622.e5 (2022).

      Gritsman, K. et al. The EGF-CFC protein one-eyed pinhead is essential for nodal signaling. Cell 97, 121-132 (1999).

      Gritsman, K., Talbot, W. S. & Schier, A. F. Nodal signaling patterns the organizer. Development (Cambridge, England) 127, 921-932 (2000).

      Kawamura, A. et al. Groucho-associated transcriptional repressor ripply1 is required for proper transition from the presomitic mesoderm to somites. Developmental cell 9, 735-744 (2005).

      Kawamura, A., Koshida, S. & Takada, S. Activator-to-repressor conversion of T-box transcription factors by the Ripply family of Groucho/TLE-associated mediators. Molecular and cellular biology 28, 3236-3244 (2008).

      Sako, K. et al. Optogenetic Control of Nodal Signaling Reveals a Temporal Pattern of Nodal Signaling Regulating Cell Fate Specification during Gastrulation. Cell Rep. 16, 866-877 (2016).

      Rogers, K. W. et al. Nodal patterning without Lefty inhibitory feedback is functional but fragile. eLife 6, e28785 (2017).

      Warga, R. M. & Nüsslein-Volhard, C. Origin and development of the zebrafish endoderm. Development 126, 827-838 (1999).

      References:

      (1) Steinbeisser, H., and De Robertis, E.M. (1993). Xenopus goosecoid: a gene expressed in the prechordal plate that has dorsalizing activity. C R Acad Sci III 316, 959-971.

      (2) Warga, R.M., and Nusslein-Volhard, C. (1999). Origin and development of the zebrafish endoderm. Development (Cambridge, England) 126, 827-838. 10.1242/dev.126.4.827.

      (3) Sako, K., Pradhan, S.J., Barone, V., Inglés-Prieto, Á., Müller, P., Ruprecht, V., Čapek, D., Galande, S., Janovjak, H., and Heisenberg, C.P. (2016). Optogenetic Control of Nodal Signaling Reveals a Temporal Pattern of Nodal Signaling Regulating Cell Fate Specification during Gastrulation. Cell reports 16, 866-877. 10.1016/j.celrep.2016.06.036.

      (4) van Boxtel, A.L., Economou, A.D., Heliot, C., and Hill, C.S. (2018). Long-Range Signaling Activation and Local Inhibition Separate the Mesoderm and Endoderm Lineages. Developmental cell 44, 179-191.e175. 10.1016/j.devcel.2017.11.021.

      (5) Cheng, T., Xing, Y.Y., Liu, C., Li, Y.F., Huang, Y., Liu, X., Zhang, Y.J., Zhao, G.Q., Dong, Y., Fu, X.X., et al. (2023). Nodal coordinates the anterior-posterior patterning of germ layers and induces head formation in zebrafish explants. Cell reports 42, 112351. 10.1016/j.celrep.2023.112351.

      (6) Economou, A.D., Guglielmi, L., East, P., and Hill, C.S. (2022). Nodal signaling establishes a competency window for stochastic cell fate switching. Developmental cell 57, 2604-2622.e5. 10.1016/j.devcel.2022.11.008.

      (7) Schier, A.F., and Talbot, W.S. (2005). Molecular genetics of axis formation in zebrafish. Annual review of genetics 39, 561-613. 10.1146/annurev.genet.37.110801.143752.

      (8) Barone, V., Lang, M., Krens, S.F.G., Pradhan, S.J., Shamipour, S., Sako, K., Sikora, M., Guet, C.C., and Heisenberg, C.P. (2017). An Effective Feedback Loop between Cell-Cell Contact Duration and Morphogen Signaling Determines Cell Fate. Developmental cell 43, 198-211.e12. 10.1016/j.devcel.2017.09.014.

      (9) Muller, P., Rogers, K.W., Jordan, B.M., Lee, J.S., Robson, D., Ramanathan, S., and Schier, A.F. (2012). Differential diffusivity of Nodal and Lefty underlies a reaction-diffusion patterning system. Science (New York, N.Y.) 336, 721-724. 10.1126/science.1221920.

      (10) Rogers, K.W., Lord, N.D., Gagnon, J.A., Pauli, A., Zimmerman, S., Aksel, D.C., Reyon, D., Tsai, S.Q., Joung, J.K., and Schier, A.F. (2017). Nodal patterning without Lefty inhibitory feedback is functional but fragile. eLife 6. 10.7554/eLife.28785.

      (11) Thisse, B., Wright, C.V., and Thisse, C. (2000). Activin- and Nodal-related factors control antero-posterior patterning of the zebrafish embryo. Nature 403, 425-428. 10.1038/35000200.

      (12) Eroglu, B., Wang, G., Tu, N., Sun, X., and Mivechi, N.F. (2006). Critical role of Brg1 member of the SWI/SNF chromatin remodeling complex during neurogenesis and neural crest induction in zebrafish. Developmental dynamics : an official publication of the American Association of Anatomists 235, 2722-2735. 10.1002/dvdy.20911.

      (13) Hensley, M.R., Emran, F., Bonilla, S., Zhang, L., Zhong, W., Grosu, P., Dowling, J.E., and Leung, Y.F. (2011). Cellular expression of Smarca4 (Brg1)-regulated genes in zebrafish retinas. BMC developmental biology 11, 45. 10.1186/1471-213X-11-45.

      (14) Kawamura, A., Koshida, S., Hijikata, H., Ohbayashi, A., Kondoh, H., and Takada, S. (2005). Groucho-associated transcriptional repressor ripply1 is required for proper transition from the presomitic mesoderm to somites. Developmental cell 9, 735-744. 10.1016/j.devcel.2005.09.021.

      (15) Kawamura, A., Koshida, S., and Takada, S. (2008). Activator-to-repressor conversion of T-box transcription factors by the Ripply family of Groucho/TLE-associated mediators. Mol Cell Biol 28, 3236-3244. 10.1128/MCB.01754-07.

      (16) Ross, S., Cheung, E., Petrakis, T.G., Howell, M., Kraus, W.L., and Hill, C.S. (2006). Smads orchestrate specific histone modifications and chromatin remodeling to activate transcription. EMBO J 25, 4490-4502. 10.1038/sj.emboj.7601332.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to elucidate the recruitment order and assembly of the Cdv proteins during Sulfolobus acidocaldarius archaeal cell division using a bottom-up reconstitution approach. They employed liposome-binding assays, EM, and fluorescence microscopy with in vitro reconstitution in dumbbell-shaped liposomes to explore how CdvA, CdvB, and the homologues of ESCRT-III proteins (CdvB, CdvB1, and CdvB2) interact to form membrane remodeling complexes.

      The study sought to reconstitute the Cdv machinery by first analyzing their assembly as two subcomplexes: CdvA:CdvB and CdvB1:CdvB2ΔC. The authors report that CdvA binds lipid membranes only in the presence of CdvB and localizes preferentially to membrane necks. Similarly, the findings on CdvB1:CdvB2ΔC indicate that truncation of CdvB2 facilitates filament formation and enhances curvature sensitivity in interaction with CdvB1. Finally, while the authors reconstitute a quaternary CdvA:CdvB:CdvB1:CdvB2 complex and demonstrate its enrichment at membrane necks, the mechanistic details of how these complexes drive membrane remodeling through removal of subcomplexes by the proteasome and/or CdvC remain speculative.

      Although the work highlights intriguing similarities with eukaryotic ESCRT-III systems and explores unique archaeal adaptations, the conclusions drawn would benefit from stronger experimental validation and a more comprehensive mechanistic framework.

      Strengths:

      The study of machinery assembly and its involvement in membrane remodeling, particularly using bottom-up reconstituted in vitro systems, presents significant challenges. This is particularly true for systems like the ESCRT-III complex, which localizes uniquely at the lumen of membrane necks prior to scission. The use of dumbbell-shaped liposomes in this study provides a promising experimental model to investigate ESCRT-III and ESCRT-III-like protein activity at membrane necks.

      The authors present intriguing evidence regarding the sequential recruitment of ESCRT-III proteins in crenarchaea, a close relative of eukaryotes. This finding suggests that the hierarchical recruitment characteristic of eukaryotic systems may predate eukaryogenesis, which is a significant and exciting contribution. However, the broader implications of these findings for membrane remodeling mechanisms remain speculative, and the study would benefit from stronger experimental validation and expanded contextualization within the field.

      We thank the Referee for his/her appreciation of our work.

      Weaknesses:

      This manuscript presents several methodological inconsistencies and lacks key controls to validate its claims. Additionally, there is insufficient information about the number of experimental repetitions, statistical analyses, and a broader discussion of the major findings in the context of open questions in the field.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #2 (Public review):

      Summary:

      The Crenarchaeal Cdv division system represents a reduced form of the universal and ubiquitous ESCRT membrane reverse-topology scission machinery, and therefore a prime candidate for synthetic and reconstitution studies. The work here represents a solid extension of previous work in the field, clarifying the order of recruitment of Cdv proteins to curved membranes.

      Strengths:

      The use of a recently developed approach to produce dumbbell-shaped liposomes (De Franceschi et al. 2022), which allowed the authors to assess recruitment of various Cdv assemblies to curved membranes or membrane necks; reconstitution of a quaternary Cdv complex at a membrane neck.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      The manuscript is a bit light on quantitative detail, across the various figures, and several key controls are missing (CdvA, B alone to better interpret the co-polymerisation phenotypes and establish the true order of recruitment, for example) - addressing this would make the paper much stronger. The authors could also include in the discussion a short paragraph on implications for our understanding of ESCRT function in other contexts and/or in archaeal evolution, as well as a brief exploration of the possible reasons for the discrepancy between the foci observed in their liposome assays and the large rings observed in cells - to better serve the interests of a broad audience.

      We have now added more controls, information about repetitions, and discussion.

      Reviewer #3 (Public review):

      Summary:

      In this report, De Franceschi et al. purify components of the Cdv machinery in the archaeon M. sedula and probe their interactions with membrane and with one another in vitro using two main assays - liposome flotation and fluorescent imaging of encapsulated proteins. This has the potential to add to the field by showing how the order of protein recruitment seen in cells is related to the differential capacity of individual proteins to bind membranes when alone or when combined.

      Strengths:

      Using the flotation assay, they demonstrate that CdvA and CdvB bind liposomes when combined. While CdvB1 also binds liposomes under these conditions, in the flotation assay, CdvB2 lacking its C-terminus is not efficiently recruited to membranes unless CdvAB or CdvB1 are present. The authors then employ a clever liposome assay that generates chained spherical liposomes connected by thin membrane necks, which allows them to accurately control the buffer composition inside and outside of the liposome. With this, they show that all four proteins accumulate in necks of dumbbell-shaped liposomes that mimic the shape of constricting necks in cell division. Taken altogether, these data lead them to propose that Cdv proteins are sequentially recruited to the membrane as has also been suggested by in vivo studies of ESCRT-III dependent cell division in crenarchaea.

      We thank the Referee for his/her appreciation of the work.

      Weaknesses:

      These experiments provide a good starting point for the in vitro study of the interaction of Cdv system components with the membrane and their consecutive recruitment. However, several experimental controls are missing, which complicates the authors' ability to draw strong conclusions. Moreover, some results are inconsistent across the two main assays, which makes the findings difficult to interpret:

      (1) Missing controls.

      Various protein mixtures are assessed for their membrane-binding properties in different ways. However, it is difficult to interpret the effect of any specific protein combination, when the same experiment is not presented in a way that includes separate tests for all individual components. In this sense, the paper lacks important controls. For example, Fig 1C is missing the CdvB-only control. The authors remark that CdvB did not polymerise (data not shown) but do not comment on whether it binds membrane in their assays. In the introduction, Samson et al., 2011 is cited as a reference to show that CdvB does not bind membrane. However, here the authors are working with protein from a different organism in a different buffer, using a different membrane composition and a different assay. Given that so many variables are changing, it would be good to present how M. sedula CdvB behaves under these conditions.

      We thank the referee for raising this point. We have now added these data in Figure 1C. Indeed it turns out that CdvB from M. sedula exhibits clear membrane binding on its own in a flotation assay.

      Similarly, there is no data showing how CdvB alone or CdvA alone behave in the dumbbell liposome assay.

      Without these controls, it's impossible to say whether CdvA recruits CdvB or the other way around. The manuscript would be much stronger if such data could be added.

      We have now added these data in Figure 1E, 1F and 1G. Overall, we can confirm that CdvA binds the membrane better in the presence of CdvB (although both proteins can bind the membrane on their own). Both proteins appear to recognize the curved region of the membrane neck.

      (2) Some of the discrepancies in the data generated using different assays are not discussed.

      The authors show that CdvB2∆C binds membrane and localizes to membrane necks in the dumbbell liposome assay, but no membrane binding is detected in the flotation assay. The discrepancy between these results further highlights the need for CdvB-only and CdvA-only controls.

      We have now added these controls in Figure 1. In addition, we would like to clarify that the flotation assay and the SMS dumbbell assay serve different purposes and are not directly comparable in quantitative terms. In the flotation assay, all the protein present as input is eventually recovered and visualized. Thus, quantitative information on the fraction of the total protein bound to lipids can be inferred from this assay. The SMS assay, in contrast, provides a very different kind of information. Because of the particular protocol required to generate dumbbells (De Franceschi, 2022), the total amount of protein in the inner buffer of the dumbbells is not accurately defined: protein that is not correctly reconstituted (e.g. protein that aggregates while still in the droplet phase) interferes with vesicle generation, so dumbbells containing such aggregates generally do not form in the first place. This makes it impossible to draw any quantitative conclusions about the proportion of the sample bound to lipids. The SMS assay is therefore not directly comparable to the flotation assay but rather complementary to it. Indeed, the purpose of the SMS assay is to provide information about the curvature selectivity of the protein.

      (3) Validation of the liposome assay.

      The experimental setup to create dumbbell-shaped liposomes seems great and is a clever novel approach pioneered by the team. Not only can the authors manipulate liposome shape, they also state that this allows them to accurately control the species present on the inside and outside of the liposome. Interpreting the results of the liposome assay, however, depends on the geometry being correct. To make this clearer, it would seem important to include controls to prove that all the protein imaged at membrane necks lies on the inside of liposomes. In the images in SFig3 there appears to be protein outside of the liposome. It would also be helpful to present data to test whether the necks are open, as suggested in the paper, by using FRAP or some other related technique.

      We thank the Referee for his/her appreciation. The proteins are encapsulated inside the liposomes, not outside of them. While Figure S3 might give the appearance that there is some protein outside, this is actually just an imaging artifact. Author response image 1 (below) explains this: When the membrane and protein channel are shown separately, it is clear that the protein cluster that appeared to be ‘outside’ actually colocalizes with an extra small dumbbell lobe (yellow arrowhead). The protein appeared to be outside of it because (1) the protein fluorescent signal is stronger than the signal from the membrane, and (2) there is a certain time delay in the acquisition of the two channels (0.5-1 second), thus the membrane may have slightly shifted out of focus when the fluorescence was being acquired. We are confident that the protein is inside in these dumbbells because the procedure for preparing the dumbbells requires extensive emulsification by pipetting, which requires ≈ 1 minute. This time is more than sufficient for proteins with high affinity for the membrane, like ESCRT and Cdv, to bind the membrane. For an example of how fast binding under confinement can be, please see movie 2 from this paper: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.

      Moreover, in many instances, we could confirm that the protein is inside because, upon increasing the gain in the images post-acquisition, a clear protein signal appears in the lumen (see Author response image 2).

      Author response image 1.

      Separate channels showing colocalization of protein and lipids (adapted from Figure S3). The zoom-in shows separate channels, highlighting that the CdvB2 cluster that seems to be ‘outside the dumbbell’ actually colocalizes with the small terminal lobe of the dumbbell, indicating that the protein is encapsulated within that lobe.

      Author response image 2.

      Residual protein present inside lumen of dumbbells as visualized by increasing the brightness post-acquisition.

      We are not sure what the referee means by “test whether the necks are open, as suggested in the paper”. We are confident that the lobes of dumbbells originated from a single floppy vesicle, and were therefore mutually connected with an open neck (at least at the onset of the experiment). We have performed extensive FRAP assays on dumbbells in previous papers (De Franceschi et al., ACS Nano 2022 and De Franceschi et al., Nature Nanotech 2024), which unequivocally proved that these chains of dumbbells are connected with open necks. We have now also performed a few FRAP assays with reconstituted Cdv proteins, which confirmed this point. We have added a movie of such an experiment to the manuscript (Movie 1).

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (4) Quantification of results from the liposome assay.

      The paper would be strengthened by the inclusion of more quantitative data relating to the liposome assay. Firstly, only a single field of view is shown for each condition. Because of this, the reader cannot know whether this is a representative image, or an outlier? Can the authors do some quantification of the data to demonstrate this? The line scan profiles in the supplemental figures would be an example of this, but again in these Figures only a single image is analyzed.

      The images that we showed are indeed representative. The dumbbells that are generated by the SMS approach contain an “internal control”: in each dumbbell, the protein has the option of localizing at the neck or localizing elsewhere in the region of flat membrane. We see consistently that Cdv proteins have a strong preference for localizing at the neck.

      We would recommend that the authors present quantitative data to show the extent of co-localization at the necks in each case. They also need a metric to report instances in which protein is not seen at the neck, e.g. CdvB2 but not CdvB1 in Fig2I, which rules out a simple curvature preference for CdvB2 as stated in line 182.

      While the request for better quantitation is reasonable, this would require carrying out very significant new experiments at the microscope, which is rendered near-impossible since both first authors have left the lab for new positions.

      Secondly, the authors state that they see CdvB2∆C recruited to the membrane by CdvB1 (lines 184-187, Fig 2I). However, this simple conclusion is not borne out in the data. Inspecting the CdvB2∆C panels of Fig 2I, Fig3C, and Fig3D, CdvB2∆C signal can be seen at positions which don't colocalize with other proteins. The authors also observe CdvB2∆C localizing to membrane necks by itself (Fig 2E). Therefore, while CdvB1 and CdvB2∆C colocalize in the flotation assay, there is no strong evidence for CdvB2∆C recruitment by CdvB1 in dumbbells. This is further underscored by the observation that in the presented data, all Cdv proteins always appear to localize at dumbbell necks, irrespective of what other components are present inside the liposome. Although one nice control is presented (ZipA), this suggests that more work is required to be sure that the proteins are behaving properly in this assay. For example, if membrane binding surfaces of Cdv proteins are mutated, does this lead to the accumulation of proteins in the bulk of the liposome as expected?

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (we indicated them in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that do contain both proteins (yellow arrows) are localized at necks, while those that only contain CdvB2ΔC (red arrows) are not localized at necks. This is no coincidence. The clusters indicated by the red arrow do contain CdvB1. However, these clusters rapidly diffuse on the membrane plane because they are not fixed at the neck: therefore, they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have an affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We believe that the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize somewhere else.

      Author response image 3.

      Current Figure 2H, where clusters that are double-positive for both CdvB1 and CdvB2ΔC are indicated by yellow arrowheads, while clusters that apparently only contain CdvB2ΔC are indicated by red arrowheads. It is observed that all the double-positive clusters are localized at necks.

      (5) Rings.

      The authors should comment on why they never observe large Cdv rings in their experiments. In crenarchaeal cell division, CdvA and CdvB have been observed to form large rings in the middle of the 1 micron cell, before constriction. Only in the later stages of division are the ESCRTs localized to the constricting neck, at a time when CdvA is no longer present in the ring. Therefore, if the in vitro assay used by the authors really recapitulated the biology, one would expect to see large CdvAB rings in Figs 1EF. This is ignored in the model. In the proposed model of ring assembly (line 252), CdvAB ring formation is mentioned, but authors do not discuss the fact that they do not observe CdvAB rings - only foci at membrane necks. The discussion section would benefit from the authors commenting on this.

      The referee is correct: it is intriguing that we don’t see micron-sized rings for CdvA and CdvB. We do note that our EM data (Fig. S1) show that CdvA on its own can form rings of about 100-200 nm diameter, well below the diffraction limit, which could well correspond to the foci that we observe optically in Figure 1. We have now added a brief comment on this to the manuscript on lines 256-264.

      (6) Stoichiometry

      It is not clear why 100% of the visible CdvA and 100% of the visible CdvB are shifted to the lipid fraction in 1C. Perhaps this is a matter of quantification. Can the authors comment on the stoichiometry here?

      We agree that this was unclear. Since that particular gel was stained with Coomassie, the quantitative signals might be unreliable, and hence we have repeated this experiment using fluorescently labelled proteins, which indeed show a less extreme distribution. This was also done to make the data more uniform, as requested by the referees.

      (7) Significance of quantification of MBP-tagged filaments.

      Authors use tagging and removal of MBP as a convenient, controllable system to trigger polymerisation of various Cdv proteins. However, it is unclear what is the value and significance of reporting the width and length of the short linear filaments that are formed by the MBP-tagged proteins. Presumably they are artefactual assemblies generated by the presence of the tag?

      Providing a measure of the changes induced by MBP removal, in fact, validates that this actually has an effect. But perhaps this places too much emphasis on the short filaments. We have now opted for a compromise, removing the quantification of the width and length of short filaments formed by MBP-tagged protein from the text, but keeping the supplementary figure showing their distribution as compared to the other filaments (Figure S2E, SF).

      Similarly, Figure 2C doesn't seem a useful addition to the paper.

      We removed panel 2C, and now merely report these values in the text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would suggest the authors perform a deeper discussion about their findings, such as what are the evolutionary implications, how they think lipids from these archaea may affect the recruitment process,...

      Because there is no exact homology between archaeal Cdv proteins and eukaryotic ESCRT-III proteins, we do not feel our work brings new evolutionary implications beyond what we already state in the manuscript. We also did not perform experiments using archaeal lipids, thus we would rather not speculate on how they may potentially affect the recruitment of Cdv proteins.

      In general, the manuscript lacks information regarding some scale bars, number of experimental repetitions (n or N), statistical analysis when needed, information about protein concentrations used in their assays.

      We have now added this information in the manuscript.

      Below, I provide a list of comments that I think the authors should address to improve the manuscript:

      (1) Line 113-114: The authors test protein-membrane interactions using flotation assays with positively curved SUV membranes but encapsulate proteins in dumbbell-shaped liposomes with negative curvature at the connecting necks. Might the use of membranes with opposite curvatures affect the recruitment process? Since the proteins are fluorescently labeled, I suggest testing recruitment using flat giant unilamellar vesicles or supported lipid bilayers (with zero curvature) to validate their findings.

      We thank the referee for this suggestion. Please do note that we are not claiming in our paper that Cdv proteins recognize negative curvature. We merely observe that they localize at necks. The neck of a dumbbell exhibits the so-called “catenoid” geometry, which is characterized by having both positive and negative curvature.

      Regarding the SUVs, we now realize there was a mistake in the methods section: in the flotation assay we in fact used multilamellar vesicles, not SUVs, precisely for the reason mentioned by the referee. We apologize for the oversight and have now corrected this in the methods. Multilamellar vesicles are not characterized by a strong positive curvature as SUVs are, but we do agree that they likely don’t have negative curvature either. Because of the heterogeneous nature of the multilamellar vesicles, they provide a binding assay that is rather independent of curvature. Complementary to the flotation assay, the SMS approach was employed to reveal the curvature preference of proteins.

      Finally, we performed the experiment on large GUVs suggested by the referee using CdvB as an example, but this turned out to be inconclusive because the protein forms clusters: these clusters may be creating local curvature at the nanometer scale, which cannot be resolved by optical microscopy (Author response image 4). This is quite typical for proteins that recognize curvature (cf. for instance: De Franceschi N, Alqabandi M, Miguet N, Caillat C, Mangenot S, Weissenhorn W, Bassereau P. The ESCRT protein CHMP2B acts as a diffusion barrier on reconstituted membrane necks. J Cell Sci. 2018 Aug 3;132(4):jcs217968.)

      Author response image 4.

      Fluorescently labelled CdvB bound to giant unilamellar vesicle. The protein was added in the outer buffer. CdvB forms distinct clusters, which may generate a local region of high membrane curvature.

      (2) Line 138-139: How is His-ZipA binding the membrane? Wouldn't Ni²⁺-NTA lipids be required? If not, how is the binding achieved?

      Indeed, NTA-lipids were present. This is now stated both in the legend and in the methods.

      (3) In the encapsulated protein assays, why does the luminal fluorescence intensity of the encapsulated protein sometimes appear similar to the bulk fluorescence signal? Since only a small fraction of the protein assembles at membrane necks, shouldn't the luminal pool of unbound protein show higher fluorescence intensity inside the liposomes?

      We thank the referee for raising this point and giving us the opportunity to explain this. The reason is that Cdv proteins have a very high affinity for the neck, and when they cluster at the neck the fluorescence intensity of the cluster is many times higher than the background fluorescence. Because we were interested in imaging the clusters and avoiding overexposing them, we adjusted the imaging conditions accordingly, with the result that the fluorescence from both the lumen and the bulk is at very low level.

      By choosing different imaging conditions, however, it can be actually seen that the signal inside the lumen is clearly higher than the bulk: this can be seen for instance in Author response image 2, where the brightness has been properly adjusted.

      (4) Line 184-185: In Fig. 2I, some CdvB2ΔC puncta seem independent of CdvB1 and are not localized at membrane necks. How many such puncta exist? For example, in the provided micrograph, 2 out of 5 clusters are independent of CdvB1. This proportion is significant. Could the authors quantify the prevalence of these structures and discuss why they form?

      We thank the referee for giving us the opportunity to explain this apparent discrepancy. We would like to stress the fact that CdvB2ΔC and CdvB1 form an obligate heterodimer: in all our experiments, without exception, we find that they form a strong complex when we mix the two proteins. This is true both in dumbbells and in flotation assays.

      In the particular example of Figure 2I, it indeed appears that there are some clusters of CdvB2ΔC that do not contain CdvB1 (we indicated them in Author response image 3 by red arrowheads), while the yellow arrowheads indicate clusters that contain both proteins. It can be clearly seen that the clusters that do contain both proteins (yellow arrows) are localized at necks, while those that only contain CdvB2ΔC (red arrows) are not localized at necks. This is no coincidence. The clusters indicated by the red arrow do contain CdvB1. However, these clusters rapidly diffuse on the membrane plane because they are not fixed at the neck: therefore, they constantly shift in and out of focus. Because there is a time delay in the acquisition of each channel (between 0.5 and 1 second), these clusters were in focus when the CdvB2ΔC signal was being acquired, but shifted out of focus when the CdvB1 signal was being acquired. This implies that the clusters indicated by the yellow arrowheads are stably localized at necks, which is precisely the point we wished to make with this experiment: because Cdv proteins have affinity for curved geometry, they preferentially and stably localize at necks. Why don’t all the clusters localize at necks then? We believe that the simple answer is that, in this particular case, there are more clusters than there are necks, so some of the clusters must necessarily localize somewhere else.

      (5) Figure 1E and 1F: Why do lipids accumulate and colocalize with the proteins? How can the authors confirm lumen connectivity between vesicles? Performing FRAP assays could validate protein localization and enrichment at the lumen of the membrane necks.

      At first sight, indeed some lipid enrichment seems to be observed at the neck between lobes of dumbbells.

      This is, however, an imaging artifact due to the fact that the neck is diffraction limited. As shown in the Author response image 5, we are acquiring the membrane signal from both lobes at the neck region, and therefore the signal is roughly double, hence the apparent lipid enrichment.

      Author response image 5.

      Schematic illustrating that the neck between two lobes is smaller than the diffraction limit of optical microscopy (the size of a typical pixel is indicated by the green square). Because of this technical limitation, the fluorescence intensity of the membrane at the neck is twice that of a single membrane.
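
      The doubling argument in this caption can be illustrated with a minimal sketch. The pixel size, membrane positions and unit signal below are hypothetical values chosen only for illustration, and `pixel_intensities` is a helper invented for this example, not part of any analysis in the manuscript. The sketch bins membrane crossings along a line scan into pixels: when the two membranes flanking a neck fall within one pixel, that pixel collects twice the signal of a single membrane.

```python
def pixel_intensities(membrane_positions_nm, pixel_size_nm, n_pixels,
                      signal_per_membrane=1.0):
    """Sum the signal of every membrane crossing that falls into each pixel."""
    pixels = [0.0] * n_pixels
    for x in membrane_positions_nm:
        idx = int(x // pixel_size_nm)  # pixel that contains this crossing
        if 0 <= idx < n_pixels:
            pixels[idx] += signal_per_membrane
    return pixels


# Hypothetical numbers: 100 nm pixels, 5 pixels along the line scan.
PIXEL_SIZE_NM = 100.0
N_PIXELS = 5

# Away from the neck: the scan crosses a single membrane.
single = pixel_intensities([250.0], PIXEL_SIZE_NM, N_PIXELS)

# At the neck: two membranes 40 nm apart (assumed), both inside one pixel.
neck = pixel_intensities([230.0, 270.0], PIXEL_SIZE_NM, N_PIXELS)

print(single)  # one unit of signal in the middle pixel
print(neck)    # two units of signal in the same pixel
```

      Under these assumptions the apparent enrichment at the neck follows from geometry alone, without any local increase in lipid density.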

      The referee is correct in pointing out that these images do not prove that the lobes are connected, and that FRAP assays are the only way to prove this point. However, in previous papers we have confirmed extensively that in chains of dumbbells the lobes are connected:

      - De Franceschi N, Pezeshkian W, Fragasso A, Bruininks BMH, Tsai S, Marrink SJ, Dekker C. Synthetic Membrane Shaper for Controlled Liposome Deformation. ACS Nano. 2022 Nov 28;17(2):966–78. doi: 10.1021/acsnano.2c06125.

      - De Franceschi N, Barth R, Meindlhumer S, Fragasso A, Dekker C. Dynamin A as a one-component division machinery for synthetic cells. Nat Nanotechnol. 2024 Jan;19(1):70-76. doi: 10.1038/s41565-023-01510-3.

      Moreover, random sticking of liposomes would generate clusters of vesicles, not linear chains. We now also provide a movie (Movie 1) supporting this point.

      Investigating whether the necks are open or closed after Cdv reconstitution is indeed a very relevant question, that could be rephrased as “verify whether Cdv proteins or their combination can induce membrane scission”. This is however beyond the scope of this manuscript, as the current work merely addressed the question of hierarchical recruitment of Cdv proteins at the membrane. We plan to examine this in future work.

      (6) Why didn't the authors use the same lipid composition, particularly the same proportion of negatively charged lipids, on the SUVs of the flotation assays and on the dumbbell-shaped liposomes?

      In flotation assays, it is typical to use a relatively large proportion of negatively charged lipids, to promote protein binding. This is because the aim is to maximize membrane coverage by the protein. The SMS procedure to generate dumbbell-shaped GUVs is completely different, however. Rather than covering the membrane with protein, the idea is to reduce the amount of protein to a minimum, so that any curvature preference can be best visualized. This is e.g. routinely done in tube pulling experiments, for the same reason (See for instance Prévost C, Zhao H, Manzi J, Lemichez E, Lappalainen P, Callan-Jones A, Bassereau P. IRSp53 senses negative membrane curvature and phase separates along membrane tubules. Nat Commun. 2015 Oct 15;6:8529. doi: 10.1038/ncomms9529).

      (7) Line 117-119: The suggestion that polymer formation between CdvA and CdvB facilitates membrane recruitment is intriguing. However, fluorescence microscopy experiments could better elucidate whether there is sequential recruitment of CdvB followed by CdvA, or if these proteins form a heteropolymer composite for membrane binding. Can CdvB bind membranes independently, or does this require synergy between CdvA and CdvB?

      We thank the referee for prompting us to perform this experiment. As we now show in Figure 1C, CdvB indeed is able to bind the membrane independently of CdvA. Whether this happens sequentially or simultaneously is an interesting question, but one that is impossible to address with either the SMS or the flotation assay, because in both cases we can only observe the endpoint of the recruitment.

      We would also like to clarify one specific experimental detail. Perhaps unsurprisingly, the results from the flotation assay are dependent on the way the assay is performed. In particular, we observed that the same protein can exhibit a different binding profile depending on whether it is being loaded either at the top or at the bottom of the gradient. This can be seen in Author response image 6. This is counterintuitive, since once the equilibrium is reached, the result should only depend on the density of the sample. We performed an overnight centrifugation (> 16 hours) on a short tube (< 3 cm tall), thus equilibrium is being reached (which is corroborated by the fact that CdvB1 and CdvB2 can float to the top of the gradient within this timespan, as shown in Figure 2C, 2E, 2G). We ascribe the difference between top and bottom loading to the fact that, when the sample is loaded at the bottom, it has to be mixed with a concentrated sucrose solution, while in the case of loading from the top, this is not done.

      In literature, both loading from top and from bottom have been used:

      - Lata S, Schoehn G, Jain A, Pires R, Piehler J, Gottlinger HG, Weissenhorn W. Helical structures of ESCRT-III are disassembled by VPS4. Science. 2008 Sep 5;321(5894):1354-7. doi: 10.1126/science.1161070.

      - Moriscot C, Gribaldo S, Jault JM, Krupovic M, Arnaud J, Jamin M, Schoehn G, Forterre P, Weissenhorn W, Renesto P. Crenarchaeal CdvA forms double-helical filaments containing DNA and interacts with ESCRT-III-like CdvB. PLoS One. 2011;6(7):e21921. doi: 10.1371/journal.pone.0021921.

      - Senju Y, Lappalainen P, Zhao H. Liposome Co-sedimentation and Co-flotation Assays to Study Lipid-Protein Interactions. Methods Mol Biol. 2021;2251:195-204. doi: 10.1007/978-1-0716-1142-5_14.

      In performing the flotation assay for CdvB1 and CdvB2ΔC, or when using all 4 proteins together, we loaded the sample at the bottom, and we could detect reproducible binding to liposomes (Figures 2D, 2F, 2H, 3A). However, CdvB does not bind the membrane when loaded at the bottom. Thus, for the experiments shown in Figure 1C, we loaded the proteins at the top. This experimental setup allowed us to highlight that CdvB indeed induces a stronger interaction between CdvA and the membrane.

      Author response image 6.

      CdvB binding to multilamellar vesicles in a flotation assay. In the left panel, the sample was loaded at the top of the sucrose gradient; in the right panel it was loaded at the bottom.

      (8) Line 165-173: The authors claim that filament curvature differs between CdvB2ΔC alone and the CdvB1:CdvB2ΔC complex. Are these differences statistically significant? What is the sample size (N)? Furthermore, how do the authors confirm interactions between these proteins in the absence of membranes based solely on EM micrographs?

      We can confirm that the filaments are composed of both proteins, because the filaments have a different curvature when both proteins are present. However, as requested by Referee 3, point (7), we removed the quantification of curvature from panel 2C. We report the N number in the text.

      (9) Line 121-123: Are the authors referring to positive or negative membrane curvatures? The cited literature suggests ESCRT-III proteins either lack curvature preferences (e.g., Snf7, CHMP4B) or prefer high positive curvature (e.g., late ESCRT-III subunits). This is confusing since the authors later test recruitment to negatively curved necks.

      We do not claim that Cdv proteins prefer positive or negative curvature, because the necks present in dumbbells have a catenoid geometry, which includes both positive and negative curvature. We have now clarified this in the discussion.

      (10) Since the conclusions rely on the oligomeric state of the proteins, providing SEC-MALS spectra to show the protein oligomeric state right after the purification would strengthen the claims.

      While such SEC-MALS experiments may be interesting, practical implementation of this is not possible since both first authors have left the lab for new positions.

      (11) Line 157-160: Suppl. Fig. 2 shows only a single EM micrograph of a small filament. Could the authors provide lower magnification images showing more filaments?

      As requested by Referee 3, point (7), we have toned down the importance of these short filaments.

      Also, why are the sample sizes for filament length (N=161) and width (N=129) different?

      Protein filaments formed by Cdv tend to stick to each other side by side, so that for some filaments the width could not be accurately assessed, and accordingly those were removed from the analysis.

      (12) The introduction states that CdvA binds membranes while CdvB does not. However, the results suggest CdvB facilitates membrane binding, helping CdvA attach. This discrepancy needs further explanation.

      We thank the referee for raising this point. We have now performed additional experiments (both SMS assay and flotation assays) showing that indeed CdvB from M. sedula is (unlike CdvB from Sulfolobus) able to bind the membrane on its own (Figure 1C, 1F).

      Reviewer #2 (Recommendations for the authors):

      Best practice would be to show single fluorescence channels in grayscale or inverted grayscale, retaining pseudocolouring only for the merged multichannel image.

      We decided to retain and standardize the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. We believe this improves readability, and this was also a request from Referee 3. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      It would be great to include a quantification of liposome curvature vs focal intensity of the various Cdv components - across figures.

      Quantification of liposome curvature at the neck can be done (De Franceschi et al., Nature Nanotech. 2024). However, in practice, this requires transferring of the sample post-preparation into a new chamber in order to increase the signal-to-noise ratio of the encapsulated dye, a procedure that drastically reduces the yield of dumbbells. The very sizeable amount of work required to obtain reliable measurements, especially considering all the proteins and protein combinations used in this study, indicates that this represents a project in itself, which goes well beyond the scope of this manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) We would encourage the authors to consider including the length of the scale bar next to the scale bar in each image and not in the figure description. This would greatly aid in clarity and interpretation of figures.

      We have now written the length of the scale bar in the figures.

      (2) In a similar vein, could the authors consider labeling panels throughout the manuscript, writing what sample is being presented? This goes mainly for the negative stain and the dumbbell fluorescence images, as having to continuously consult the figure legend hinders clarity.

      We have now labelled the EM images as requested by the referee.

      (3) Lines 254-256: would the statement hold not only for CdvB2∆C, but for all imaged proteins? They all seem to localize to membrane necks, presumably favoring membrane binding to a specific membrane topology.

      We agree with the referee, and changed the phrasing accordingly.

      (4) CdvB2∆C construct - presumably this was a truncation of helix 5 of the ESCRT-III domain? Figure 1A shows that the ESCRT-III domain spans residues 34-170 and therefore implies that all five ESCRT-III helices (which make up the ESCRT-III domain) are present in the C-terminal truncation. Could the authors clarify?

      Indeed, the truncation was done at residue 170.

      (5) Results of the liposome flotation assays are presented inconsistently across the three figures (Figs 1C, 2DFH, and 3A). This makes it more difficult than it needs to be to interpret and compare results. Could the authors consider presenting the three gels in a more similar, standardized way across the three figures?

      To improve readability, we now standardized the colors, both for gels and for microscopy images, in order to have the same color-code for each protein. Thus, throughout the manuscript, CdvA is in grayscale, CdvB in yellow, CdvB1 in green, CdvB2ΔC in cyan and the membrane in magenta.

      (6) From the data presented in Fig 1EF, it cannot be concluded whether CdvB and CdvA colocalize, as only one protein is labelled. Is there a technical reason for this?

      We have now repeated the same experiment by having both proteins labelled, confirming that there is co-localization at the neck (Figure 1G).

      (7) Fig 2C: is the difference between the two samples significant?

      As requested by Referee 3, we have removed Figure 2C.

      (8) Fig 2I is missing a 'merged' panel.

      We have now added the merged panel.

      (9) The fluorescence intensity plots in Supp Figs 1C and 3C would be easier to interpret if the lipid and protein signal would be plotted on the same plot (say, with normalized fluorescence intensity)

      It is not immediately obvious to us what the signal should be normalized to. What we wished to convey with these plots was that the intensity of proteins spikes at the neck region. In an attempt to improve clarity, we have now aligned the plots vertically, and highlighted the position of the neck.
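
      For illustration only, one common convention for overlaying two line scans is min-max normalization, in which each channel is rescaled independently to the range 0-1. The sketch below assumes this convention, which is not necessarily the one the referee had in mind, and uses made-up intensity values rather than data from the manuscript; `min_max_normalize` is a hypothetical helper name.

```python
def min_max_normalize(profile):
    """Rescale a 1-D intensity profile so its minimum is 0 and its maximum is 1."""
    lo, hi = min(profile), max(profile)
    if hi == lo:
        # Flat profile: avoid division by zero and return all zeros.
        return [0.0 for _ in profile]
    return [(v - lo) / (hi - lo) for v in profile]


# Made-up line scans across a neck (arbitrary units), one value per pixel.
lipid = [100, 102, 98, 101, 150, 205, 148, 99, 101, 100, 97]
protein = [20, 22, 19, 25, 400, 1500, 380, 24, 21, 20, 18]

lipid_norm = min_max_normalize(lipid)
protein_norm = min_max_normalize(protein)

# Position of the protein peak along the scan, i.e. the presumed neck.
neck_index = protein_norm.index(max(protein_norm))
print(neck_index)
```

      After this rescaling both channels share a common axis, so the two peaks can be compared on a single plot, at the cost of discarding the absolute intensity ratio between lipid and protein signal.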

      (10) CdvA should have a capital "A" in Figure 3A, panel 3.

      We have now corrected this.

      (11) The discussion doesn't comment on the need to truncate CdvB2.

      This is explained in the Results section.

    1. Reviewer #2 (Public review):

      Summary:

      This study systematically characterizes the activity patterns of a lateral supramammillary nucleus (SuM)-medial septum (MS)-hippocampus circuit across sleep-wake cycles and its role in memory consolidation. The authors demonstrate that the lateral SuM-MS projection is specifically active during REM sleep, and that REM-selective inhibition of this circuit, and of its downstream MS-CA2 pathway, impairs the consolidation of social memory. The work is well-designed, and the data are robust in supporting clear and significant conclusions. It provides important new insights into how distinct memory modalities could be processed by parallel, sleep-active subcortical-hippocampal circuits. The manuscript is of high quality overall, with some points to address as detailed below.

      Strengths:

      (1) Novel finding: The identification of a REM-specialized subpopulation within the lateral SuM-MS pathway and its specific role in social memory consolidation via the lateral SuM-MS-CA2 projection is a significant advance. It effectively complements the previously described direct SuM-CA2 pathway and supports a model of the SuM as a "REM-hub" routing information through dedicated downstream targets.

      (2) Technical rigor: The combination of retrograde tracing, in vivo calcium imaging, single-unit identification via optrode recording, and temporally precise (REM-sleep-specific) optogenetic manipulation provides strong correlative and causal evidence.

      (3) Appropriate controls: Behavioral experiments include crucial controls for optogenetic inhibition (GtACR1 group, NREM/Wake inhibition control, mCherry control), effectively ruling out nonspecific effects of light or timing.

      Weaknesses:

      (1) Figure titles/descriptions: For clarity, the authors should consider specifying the recording method in the figure titles or legends. For instance, Figure 2: "Bulk Ca2+ activity (fiber photometry) of lateral SuM-MS projecting neurons..." and Figure 3: "Single-unit activity patterns (optrode recordings) of lateral SuM-MS projecting neurons...".

      (2) Statistical details: The use of "LSD post-hoc comparison" following ANOVA is noted. LSD is sensitive but can increase Type I error risk with multiple comparisons. Please justify its use or consider employing a more conservative post-hoc test (e.g., Tukey's or Bonferroni) for key comparisons like the social preference index in Figure 4h to bolster robustness.

      (3) Data presentation: When reporting statistical results in figure legends (e.g., Figures 2d, 3i-k), please provide the specific test statistic values (e.g., F, χ²) and exact P values where possible, rather than only significance asterisks.

      (4) Deepening mechanistic insight: The study excellently demonstrates "what" the circuit does. The discussion could be strengthened by further exploring "how" it might work. The finding that SuM-MS inhibition does not affect CA1 theta power is interesting and distinguishes it from other MS/hippocampal pathways. The suggestion of a theta-independent mechanism is plausible. Could the authors hypothesize more specifically? For example, might this circuit modulate reactivation events in the local CA2 network, neurochemical milieu (e.g., acetylcholine), or synapse-specific plasticity during REM sleep to facilitate social memory consolidation?

      (5) Implications of regional heterogeneity: The functional divergence between lateral (90% REM-active) and medial SuM-MS neurons is intriguing. A brief discussion on the potential anatomical basis (differential inputs/outputs) and functional significance (e.g., integration of specific affective or arousal signals) of this subdivision would be valuable.
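
      The concern raised in point (2) about unadjusted pairwise comparisons can be illustrated with a short calculation. The sketch below is not taken from the manuscript under review. It assumes independent tests at a nominal alpha of 0.05 (pairwise post-hoc comparisons are correlated in practice, so the numbers are only indicative), and the function names are hypothetical.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-comparison threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05  # assumed nominal significance level
    for n_groups in (3, 4, 5):
        n_pairs = n_groups * (n_groups - 1) // 2  # number of pairwise comparisons
        fwer = familywise_error_rate(alpha, n_pairs)
        threshold = bonferroni_threshold(alpha, n_pairs)
        print(f"{n_groups} groups, {n_pairs} pairs: "
              f"unadjusted FWER ~ {fwer:.3f}, Bonferroni threshold = {threshold:.4f}")
```

      Under these assumptions the unadjusted family-wise error rate grows with the number of groups, which is why a more conservative correction is commonly requested for key comparisons.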

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kroeg et al. describe a novel method for 2D culture of human induced pluripotent stem cells (hiPSCs) to form cortical tissue in a multiwell format. The method claims to offer a significant advancement over existing developmental models. Their approach allows them to generate cultures with precise, reproducible dimensions and structure, with a single rosette and consistent geometry, incorporating multiple neuronal and glial cell types (cellular diversity) and avoiding the necrotic core (often seen in free-floating models due to limited nutrient and oxygen diffusion). The researchers demonstrate the method's capacity for long-term culture, exceeding ten months, and show the formation of mature dendritic spines and considerable neuronal activity. The method aims to tackle multiple key problems of in vitro neural cultures: reproducibility, diversity, topological consistency, and electrophysiological activity. The authors suggest their potential in high-throughput screening and neurotoxicological studies.

      Strengths:

      The main advances in the paper seem to be: The culture developed by the authors appears to have optimal conditions for neural differentiation, lineage diversification, and long-term culture beyond 300 days. These seem to me a major strength of the paper and an important contribution to the field. The authors present solid evidence about the high cell type diversity present in their cultures. It is a major point and therefore it could be better compared to the state of the art. I commend the authors for using three different iPSC lines; this is a very important part of their proof. The staining and imaging in the manuscript are of excellent quality.

      We thank the reviewer for the positive comments on the potential of our novel platform to address key problems of in vitro neural culture, highlighting the longevity and reproducibility of the method across multiple cell lines.

      Weaknesses:

      (1) The title is misleading: The presented cultures appear not to be organoids, but 2D neural cultures, with an insufficiently described intermediate EB stage. For nomenclature, see: doi: 10.1038/s41586-022-05219-6. Should the tissue develop considerable 3D depth, it would suffer from the same limited nutrient supply as 3D models - as the authors point out in their introduction.

      We appreciate the opportunity to clarify this point. We respectfully disagree that the cultures do not meet the consensus definition of an organoid. In fact, a direct quote from the seminal nomenclature paper referenced by the reviewer states: “We define organoids as in vitro-generated cellular systems that emerge by self-organization, include multiple cell types, and exhibit some cytoarchitectural and functional features reminiscent of an organ or organ region. Organoids can be generated as 3D cultures or by a combination of 3D and 2D approaches (also known as 2.5D) that can develop and mature over long periods of time (months to years).” (Pasca et al, 2022, doi: 10.1038/s41586-022-05219-6). Therefore, while many organoid types indeed have a more spherical or globular 3D shape, the term organoid also applies to semi-3D or nonglobular adherent organoids, such as renal (Czerniecki et al 2018, doi.org/10.1016/j.stem.2018.04.022) and gastrointestinal organoids (Kakni et al 2022, doi.org/10.1016/j.tibtech.2022.01.006). Accordingly, the adherent cortical organoids described in the manuscript exhibit self-organization into single radial structures consisting of multiple cell layers in the z-axis, reaching ~200 µm thickness (therefore remaining within the limits for sufficient nutrient supply), with consistent cytoarchitectural topology and electrophysiological activity, and therefore meet the consensus definition of an organoid.

      (2) The method therefore should be compared to state-of-the-art (well-based or not) 2D cultures, which seems to be somewhat overlooked in the paper, therefore making it hard to assess what the advance is that is presented by this work.

      It was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods. Compared to state-of-the-art 2D neural network cultures, adherent cortical organoids provide distinct advantages in:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture for at least 1 year, whereas 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks), and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures include:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole cell patch clamping is not easily feasible in adherent cortical organoids because of the restrictive geometry of 384-well plates.

      (3) Reproducibility is prominently claimed throughout the manuscript. However, it is challenging to assess this claim based on the data presented, which mostly contain single frames of unquantified, high-resolution images. There are almost no systematic quantifications presented. The ones present (Figure S1D, Figure 4) show very large variability. However, the authors show sets of images across wells (Figure S1B, Figure S3) which hint that in some important aspects, the culture seems reproducible and robust.

      We made considerable efforts to establish quantitative metrics to assess reproducibility. We applied a quantitative scoring system of single radial structures at different time points for multiple batches of all three lines as indicated in Figure S1C. This figure represents a comprehensive dataset in which each dot represents the average of a different batch of organoids containing 10-40 organoids per batch. To emphasize this, we have adapted the graph to better reflect the breadth of the dataset. Additional quantifications are given in Figure S2 for progenitor and layer markers for Line 1 and in Figure 2 for interneurons across all three lines, showing relatively low variability. That being said, we acknowledge the reviewer’s concerns and have modified the text to reduce the emphasis of this point, pending more extensive data addressing reproducibility across an even broader range of parameters.

      (4) What is in the middle? All images show markers in cells present around the center. The center however seems to be a dense lump of cells based on DAPI staining. What is the identity of these cells? Do these cells persist throughout the protocol? Do they divide? Until when? Addressing this prominent cell population is currently lacking.

      A more comprehensive characterization of the cells in the center remains a significant challenge due to the high cell density hindering antibody penetration. However, dye-based staining methods such as DAPI and the LIVE/DEAD panel confirm a predominance of intact nuclei with very minimal cell death. The limited available data suggest that a substantial proportion of the cells in the center are proliferative neural progenitors, indicated by immunolabeling for SOX2 (Figure 2A,D; Figure S4C). Furthermore, we are currently optimizing the conditions to perform single cell / nuclear RNA sequencing to further characterize the cellular composition of the organoids.

      (5) This manuscript proposes a new method of 2D neural culture. However, the description and representation of the method are currently insufficient. (a) The results section would benefit from a clear and concise, but step-by-step overview of the protocol. The current description refers to an earlier paper and appears to skip over some key steps. This section would benefit from being completely rewritten. This is not a replacement for a clear methods section, but a section that allows readers to clearly interpret results presented later.

      We have revised the manuscript to include a more detailed step-by-step overview of the protocol.

      (b) Along the same lines, the graphical abstract should be much more detailed. It should contain the time frames and the media used at the different stages of the protocol, seeding numbers, etc.

      As suggested, we have adapted the graphical abstract to include more detail.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, van der Kroeg et al have developed a method for creating 3D cortical organoids using iPSC-derived neural progenitor cells in 384-well plates, thus scaling down the neural organoids to adherent culture and a smaller format that is amenable to high throughput cultivation. These adherent cortical organoids, measuring 3 x 3 x 0.2 mm, self-organize over eight weeks and include multiple neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths:

      (1) The organoids can be cultured for up to 10 months, exhibiting mature dendritic spines, axonal myelination, and robust neuronal activity.

      (2) Unlike free-floating organoids, these do not develop necrotic cores, making them ideal for high-throughput drug discovery, neurotoxicological screening, and brain disorder studies.

      (3) The method addresses the technical challenge of achieving higher-order neural complexity with reduced heterogeneity and the issue of necrosis in larger organoids. The method presents a technical advance in organoid culture.

      (4) The method has been demonstrated with multiple cell lines which is a strength.

      (5) The manuscript provides high-quality immunostaining for multiple markers.

      We appreciate the reviewer’s acknowledgement of the strengths of this novel platform as a technical advance in organoid culture that reduces heterogeneity and shows potential for higher throughput experiments.

      Weaknesses:

      (1) Direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, ie what can be done with the new method that cannot be done with standard culture and vice versa, ie what are the aspects in which new method could be inferior to the standard.

      In our opinion, it would be extremely difficult to directly compare methods. Most notably, whole brain organoids grow to large and irregular globular shapes, while adherent cortical organoids have a more standardized shape confined by the geometry of a 384-well. Moreover, it was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods, as addressed in response to comment 2 of Reviewer 1 above.

      (2) It would be important to further benchmark the throughput, ie what is the success rate in filling and successfully growing the organoids in the entire 384 well plate?

      Figure S1 shows the success rate of organoid formation and stability of the organoid structures over time. In addition, we have added the number of wells that were filled per plate.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.

      Figure S1 provides the relationship between proliferation rate and seeding density, allowing estimation of seeding densities based on the proliferation rate of the NPCs. However, we appreciate the reviewers' feedback and have modified the methods to provide more detail.

      Reviewer #3 (Public review):

      Summary:

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths:

      Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems.

      We thank the reviewer for highlighting the strengths of our novel platform. We appreciate that all three reviewers agree that the adherent cortical organoids presented in this manuscript reliably demonstrate increased reproducibility and longevity. They also commend its potential for higher throughput drug discovery and neurotoxicological/phenotype screening purposes.

      Weaknesses:

      While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Mainly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      We appreciate the feedback and have added more detail on consistency and standardization of functional outputs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points

      (1) As the preprint is officially part of the eLife review, I have to remark that the preprint, which is made available on bioRxiv, suffers from some serious compatibility or format problem: one cannot highlight sentences as in a regular PDF, and when trying to copy-paste sentences from it, jumbled characters are copied to the clipboard.

      The updated version of the paper on bioRxiv should not suffer from these compatibility issues.

      (2) Since the paper is presenting a new method it should briefly describe how each step, including the hiPSC culture was done, the reference to an earlier publication in this case is not sufficient, and this practice is generally best to avoid for methods papers.

      Each step in the culturing process has now been described in the methods.

      (3) The EB stage is insufficiently described. The "2D - 3D - 2D" transitions should be clearly explained.

      The methods section has been rewritten and expanded to include these processes in more detail.

      (4) Is there one FACS sorting in the protocol, or multiple (additional at IPS culture)? What markers each? What is the motivation for sorting and purifying the neural progenitors? Was the culture impure? What was purity? What cell types are expected after sorting, and what is removed?

      Only one FACS sorting step is performed at the NPC stage. This was added as an improvement to our original neural network protocol (Günhanlar et al 2018) to ensure consistency over different hiPSC source cell lines that can yield variable amounts of frontal cortical patterned NPCs. Positive sorting for neural lineage markers CD184 and CD24, and negative sorting for the mesenchymal/neural crest marker CD271 and the glial progenitor marker CD44, according to Yuan et al 2011, ensures frontal-patterned cortical NPCs as confirmed for all batches by immunohistochemistry for SOX2, Nestin and FOXG1. We have added new text to the Methods section to clarify this more explicitly.

      (5) Seeding protocol and parameters are insufficiently described, and from what I read they are poorly defined: "Specifically, the optimal seeding density was determined by visual inspection of the organoids between 28 to 42 days after seeding a range of cell densities in the 384-well plate wells." For a new method, precise, actionable instructions are needed. I may have overlooked those elsewhere, in this case, please clarify these sections.

      The Methods section was rewritten and expanded to describe the methodology in greater detail with more actionable instructions.

      (6) The timeline in Figure 1 is not clearly delineated; I found it hard to understand which figure corresponds to which stage (e.g. FACS sorting is not mentioned in the first part of the results but is part of Figure 1A; neural rosette formation can happen both before and after FACS sorting, so simply referring to rosettes is not clear). Later parts of the manuscript clearly introduce the terms sorting and seeding in the context of this method, and how ages (days) refer to these time points.

      Figure 1 was adapted to clarify the generation of Neural Progenitor Cells (NPCs) and subsequent seeding of NPCs to generate Adherent Cortical Organoids (ACOs).

      (7) The authors define: "cortical organized defined as a single radial structure." This is not a commonly used definition of organoids, for nomenclature, please see: doi: 10.1038/s41586-022-05219-6 (Pasca et al 2022).

      To clarify, the statement is not meant to reflect a definition of organoids in general, but rather the scoring of proper structure formation for Figure S1C. For discussion on nomenclature, see our response to point 1 of Reviewer 1 in the public review. We changed the wording to be more accurate.

      (8) In Figure S1d, the authors write: "the fraction of structurally intact cultures decreased to 50%", but looking at that graph there seems to be no notable decrease, only huge variability. The authors should quantify claims of decrease by linear regression and an R². Variation within and across cell lines seems to be large. Also, it is unclear if the dots correspond to the same wells/plates; in other words: is this a longitudinal experiment? What is the overall success rate? How is success determined? Are there clear criteria?

      We agree with the reviewer that the claim on fraction of intact cultures decreasing over time to 50% is an overinterpretation due to the large variability. We changed the wording in the manuscript to: “While some later batches show moderately reduced success rates compared with the earliest batches, properly formed single-structure organoids were still obtained at 40–90% success across all examined time points (Figure S1C), indicating that long-term culture is feasible albeit with variable efficiency.” The data are not longitudinal as each dot represents an endpoint of a different batch of organoids, totaling 18 independent batches across the three lines. We have clarified this in the figure legend. Success was defined at the well level as the presence of a single, continuous radial structure occupying the well, without obvious fragmentation or fusion events, as assessed by LIVE/DEAD that also confirmed viability. Wells were scored as successful only when the radial structure showed predominantly live signal with no large necrotic areas. Wells containing multiple radial structures, fused aggregates, or predominantly dead tissue were scored as unsuccessful.

      (9) Figure s1c: the numbering to this panel should be swapped, because it is referenced after other panels in the text. The reference is confusing: "Plotting the interaction between proliferation and the amount of NPCs required to be seeded for the successful generation of adherent cortical organoids" - success is not present in this graph at all? How is that measured?

      Figures S1C and S1D have been adapted to clarify the measure of ‘successful organoid formation’.

      (a) The description of this plot is confusing: "The doubling time of the NPCs explains more than half the variation (r2 = 0.67) of the required seeding density." What else is there? I thought that this was the formula the authors suggested to determine seeding density, but it seems not. Or is "manual inspection" the determinant, and that seems to correlate with this metric?

      Even though the rate of proliferation, measured as doubling time, is the main determinant of the seeding density, it is not the only determinant of the seeding density. For instance, intrinsic differences in differentiation potential could also play a role. Therefore, NPC lines with similar doubling times might still have slightly different optimal seeding densities. We have added clarification of this conclusion to the Results section.

      (b) Seeding density is a key parameter in many in vitro differentiation and culture protocols. This importance however does not mean that this density is attributable to differences in cell proliferation rate. Alternatively, the amount of cells determines the amount of secreted molecules and cell-to-cell contacts.

      Here, when we refer to the cell density, we specifically refer to the cell density needed to generate the ACO. We show that the most important contributor to the variation in ACO formation is the proliferation, measured here as the doubling time. We agree that there are other factors involved such as the secreted molecules, cell-to-cell contacts as well as the ability of a given NPC line to differentiate into a post-mitotic cell.

      (c) Is it mentioned which cell line this experiment corresponds to?

      The data in Figure S1D is from the 3 reported cell lines, as well as 2 clones from a fourth IPS cell line. This is detailed in the Methods section of the proliferation assay.

      (d) Without a more detailed explanation, seeding density and doubling time could be independent variables.

      These two variables are highly correlated as shown in Figure S1D, but it is true that there can be other variables that account for the observed variance, as discussed above in Point 9b.

      (e) In this figure the success rate is not visible at all, so I have no idea how the authors arrive at a conclusion about success rate.

      We have adapted the figure legend to reflect which cell lines the dots in Fig. S1D represent. NPC lines can have substantial variation in proliferation rates. The figure shows data from NPCs of 5 clones from 4 different hiPSC lines (as indicated in the Methods) with different proliferation rates. The ACO success rate (operationally defined in the same way as for the data shown in Fig. S1C) has also been included.
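
      For readers following the exchange in point 9 about r² = 0.67, a minimal sketch of how a coefficient of determination is obtained from an ordinary least-squares line is given here. It is not the authors' analysis code. The function name is hypothetical and the doubling-time and seeding-density numbers are invented placeholders, not data from the manuscript.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, returning R^2 as well."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation in y left unexplained by the line.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / syy
    return slope, intercept, r2


if __name__ == "__main__":
    # Placeholder values for illustration only (hours, cells per well).
    doubling_time_h = [20, 24, 30, 36, 42]
    seeding_density = [3000, 4200, 4000, 6500, 6000]
    slope, intercept, r2 = linear_fit_r2(doubling_time_h, seeding_density)
    print(f"slope={slope:.1f}, intercept={intercept:.1f}, R^2={r2:.2f}")
```

      Read this way, an R² of 0.67 means the fitted line leaves roughly a third of the variance in seeding density unexplained, which is compatible with the authors' statement that doubling time is the main but not the only determinant.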

      (10) Figure 2: Clean spatial segregation seems to be a strength of the system and therefore I would recommend putting more of the relevant microscopy images to the main figure, which are now currently in Figure S4.

      We have adapted Figure 2 accordingly, and included additional representative cortical layering images in Figure S4.

      (11) The variability in interneuron content seems to be significant, as currently presented in the figure. However, this may be due to a spatial organization. I would first quantify in consecutive rings around the centers whether interneurons have a tendency to be enriched towards the center or the edge of the culture. Maybe this explains the variability that is currently present in Figure S5b.

      We agree that spatial organization of interneurons could, in principle, contribute to variability. In our analysis, however, images were acquired from positions selected by a random sampling grid across the entire culture, rather than from specific central or peripheral regions. Each field contained on average 130.6 ± 16.1 NeuN+ nuclei, which provided a relatively large sampling volume per position. If interneurons were strongly enriched at the center or edge, we would expect systematic differences in interneuron fraction between fields assigned to central versus peripheral grid positions. We did not observe such a pattern in our dataset, suggesting that spatial organization is not the main driver of the observed variability.

      (12) Because in previous figures it seems like there is considerable variability across individual cultures and images here are coming from separate cultures, please use different shapes of the points coming from different cultures/wells, to see if maybe there is a culture-to-culture difference that explains the variability present in the figure.

      We have added different symbols per organoid for the interneuron quantifications and moved this quantification to main Figure 2.

      (13) I believe it is currently the standard error of the mean which is displayed in the figure, which is not an appropriate representation for variability, or the reproducibility across individual data points. SEM quantifies the reproducibility of the mean, not the reproducibility of the individual data points, which matters here. Mean refers to the mean of this quantification experiment and therefore it's not a biological entity. A box plot showing the interquartile range besides the individual data points would be an accurate representation of the spread of the data.

      We agree and have adapted the data, now in Figure 5, accordingly.
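
      The distinction drawn in point 13 between the standard error of the mean and the spread of individual data points can be made concrete with a short sketch. It uses only Python's standard `statistics` module. The function name and the sample values are hypothetical and do not come from the manuscript.

```python
import statistics


def spread_summary(values):
    """Summarize a sample with SD, SEM, median and interquartile range."""
    n = len(values)
    sd = statistics.stdev(values)   # spread of individual data points
    sem = sd / n ** 0.5             # precision of the estimated mean
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": n,
        "mean": statistics.fmean(values),
        "sd": sd,
        "sem": sem,
        "median": median,
        "iqr": (q1, q3),
    }


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5]        # placeholder measurements
    small = spread_summary(sample)
    large = spread_summary(sample * 4)  # same distribution, four times as many points
    print("n=5 :", small["sem"], small["iqr"])
    print("n=20:", large["sem"], large["iqr"])
```

      In this toy example the SEM shrinks as more points are added while the interquartile range stays the same, which is why a boxplot with individual points conveys variability across cultures more faithfully than SEM error bars.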

      (14) Again, in general, the main figures should contain much more of the quantification, as opposed to just raw images.

      Quantifications have been added in Figure 2 for the GAD67/NeuN for all cell lines as well as a time course quantification of GAD67/NeuN for 1 of the cell lines. In Figure 4, we have added excitatory and inhibitory synaptic quantifications.

      (15) Figure 2F-I the location of the center of the rosette should be marked with a star so that the conclusion about the direction of processes can be established.

      The suggested addition of a marker at the center of each rosette was evaluated but not implemented, because it reduced rather than improved figure clarity.

      (16) Figure 3 b and c:

      High-magnification images of single cells cannot show changes in cell type morphology, and one cannot conclude that these cells are present in significant numbers across time. Zoomed-out images or quantification would be necessary for such a claim. The authors already have such images, as presented in the next panels, so quantification should be possible without new experiments. I am uncertain about the T3 supplement here - do these images correspond to the same conditions?

      (a) It is unclear to me why different markers are used in the different panels, namely why NG2 is not used in any of the other images.

      NG2 was used at early developmental time points to show the presence of Oligodendrocyte Precursor Cells (OPCs). At later time points, the focus switched to MBP staining to indicate more mature oligodendrocyte lineage cells. Although NG2 and MBP are not in the same panels, the staining was performed for both antibodies at the same developmental time point (Day 119) as seen in Figure 3C and 3D.

      (b) Color coding in Figure 3G is ambiguous; the use of two blues should be avoided, and the Sub-sub panels should be individually labeled for the color code.

      We agree, and have now used different colors.

      (c) It is unclear if the presence of the T3 molecule is part of the standard procedure or if it was a side experiment to enhance the survival of oligodendrocytes. Are there no oligodendrocytes without it? How does T3 affect other cell types, and the general health and differentiation of the cultures?

      Indeed, T3 is essential for oligodendrocyte formation. We did not observe obvious effects on the general health or differentiation potential of the cultures.

      (d) Is the 2 ng/ml T3 present from day one to the final day?

      Indeed, in the organoids cultured to study oligodendrocyte formation, T3 was added from Day 1. These details have now been clarified in the Methods and Results sections.

      (17) Figure 4:

      (a) Microscopy in this figure is high quality and very convincing about neural maturity.

      (b) The term "cluster" should be avoided. Unclear what it means here, but my best guess is "cells in a frame of view." Cluster is used with a different meaning in electrophysiology.

      This was adapted to ‘neurons in a field of view (FOV)’.

      (c) Panel J: I assume each row corresponds to a single cell? Could this be clarified? Are these selected cells from each frame, or all active cells are represented?

      Indeed, each row corresponds to a single cell, showing all active cells in the frame. This is now clarified in the legend.

      (d) How many wells do these data correspond to, and in which line were they measured?

      As reported in the legend for Figure 5, these data correspond to 2 wells at Day 61 to which we have now added calcium imaging data from 3 wells from a different batch at Day 100. We have included in the legend that these recordings were from Line 1.

      (e) Panels G to I, again, the use of standard error of the mean is inappropriate and misleading: looking at the error bar one must conclude that there is minimal variation, which is the exact opposite of the conclusions, when one would look at the variability of the raw data points.

      As suggested, the graphs have been adapted as boxplots with interquartile ranges to highlight the distribution of data points.

      (f) It is unclear how many neurons and how many total actively firing neurons are present in the videos analyzed

      All neurons that were active in the field of view and showed at least one calcium event during the ~10 minute recording were included in the analysis. Using this method, we cannot comment on the proportion of neurons that were active from the total amount of neurons present, since the AAV virus we used does not transduce all neurons.

      (g) This figure shows the strength of the method in achieving neural maturity and function. There seems to be considerable activity in the neuronal cultures analyzed. To conclude how reliably the method leads to such mature cultures, one would need to measure at least a dozen wells (even if with some simpler and low-resolution method). Concluding reproducibility from one or two hand-picked examples is not possible.

      We agree with the reviewer that the number of wells used for calcium imaging analysis was limited. We are currently working on more advanced methods to increase the throughput of this analysis. However, we’ve now added another timepoint to the calcium imaging data in Figure 5 from an independent batch of 3 adherent cortical organoids, which demonstrates continued robust activity at Day 100, as well as Day 61.

      Methods:

      (1) Stem cell culture. The authors described that line 3 is grown on MEFs. Is this true for the other two lines? Furthermore, were they cultured in identical conditions?

      Lines 2 and 3 were not grown on MEFs. We specifically chose different sources of NPCs to reflect the robust nature of the differentiation protocol. We have recently also adapted the protocol from Line 3 NPCs to confirm that it also works starting from hiPSCs grown in feeder-free conditions in StemFlex medium, by adapting NPC differentiation according to our recent publication in Frontiers in Cellular Neuroscience (Eigenhuis et al., 2023).

      (2) "NPCs were differentiated to adherent cortical organoids between passages 3 and 7 after sorting." Please clarify this sentence. I assume it refers to the first FACS sorting of the protocol, but the section is not sufficiently detailed.

      We have adapted the methods to clarify that the FACS purification step occurs at the NPC stage.

      (3) I didn't fully understand: It seems that there are two steps of FACS sorting involved, one after passage 3 and one after week 4. This should be represented in the graphical abstract of Figure 1.

      As outlined above, there is only 1 FACS sorting step at NPC stage. We have adapted this in the Methods and in the graphical abstract.

      (4) Neural differentiation: The authors write that optimal seeding density was determined by visual inspection of the organoids - this is.

      We have clarified the Methods section to better explain the process of optimizing the seeding density for each NPC line to generate the ACOs.

      (5) What does the following sentence mean: "Cells were refreshed every 2-3 days." Does it mean replacement of the complete media? How much media was added to the wells?

      This is a very good point that we have now clarified in the Methods, as full replenishment of media is neither feasible, nor desirable. From the total volume of 110 µl per well, 80 µl is taken out and replaced with 85 µl to compensate for evaporation.
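
      The volumes quoted above can be turned into a short calculation of how much medium is exchanged at each refresh. This is only an illustrative sketch: it assumes ideal mixing and that evaporation returns the well to 110 µl before every change, neither of which is stated in the response, and the function names are hypothetical.

```python
# Volumes taken from the response above (in microlitres).
TOTAL_UL = 110.0    # working volume per well
REMOVED_UL = 80.0   # volume withdrawn at each refresh
ADDED_UL = 85.0     # fresh medium added at each refresh


def fraction_exchanged(total_ul: float, removed_ul: float) -> float:
    """Fraction of the medium in the well that is withdrawn at one refresh."""
    return removed_ul / total_ul


def residual_old_fraction(total_ul: float, removed_ul: float, n_changes: int) -> float:
    """Fraction of the original medium left after n refreshes.

    Assumption (ours): ideal mixing, and the volume is back at total_ul
    before each refresh because the surplus has evaporated.
    """
    return ((total_ul - removed_ul) / total_ul) ** n_changes


# Surplus added per refresh, intended to offset evaporation.
net_added_ul = ADDED_UL - REMOVED_UL

if __name__ == "__main__":
    print(f"exchanged per refresh: {fraction_exchanged(TOTAL_UL, REMOVED_UL):.1%}")
    print(f"old medium left after 1 refresh: {residual_old_fraction(TOTAL_UL, REMOVED_UL, 1):.1%}")
    print(f"surplus per refresh: {net_added_ul} µl")
```

      Under these assumptions roughly 73% of the medium is exchanged per refresh and about 27% of the previous medium remains, with a 5 µl surplus per refresh to offset evaporation.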

      (6) Calcium imaging: can the authors explain the decision to move the cultures one day before imaging into BrainPhys neural differentiation medium? In 3D organoid protocols, BrainPhys is gradually introduced to avoid culture shock (very different composition), and used for multiple months to enhance neural differentiation. For recording electrophysiological activity, artificial CSF is the most common choice.

      Indeed, for whole cell recordings of 2D neural networks as performed in Günhanlar et al 2018, we used gradual transition to aCSF. For the current ACOs, we found that using BrainPhys from the start of organoid differentiation prevents structure formation, probably because of increased speed of maturation disrupting proliferation and organization of radial glia differentiation. However, by changing the media to BrainPhys just one day before recording (reflecting a gradual change as not all medium is fully replenished and easier than switching to aCSF during recording), we saw greatly improved neuronal activity.

      (7) Statistical analysis: As I pointed out before, the standard error of the mean is not an appropriate metric to represent the variability of the data. It is meant to represent the variability of the estimated average. The following thought experiment should make it clear: I measured the expression of a gene in my system. 50 times I measured 0 and 50 times I measured 100. The average is 50, but of course it is a very bad representation of the data because no data points exist with that value. Yet the standard error of the mean would be ±5.

      We have revised Figures 5C–5D to boxplots displaying the interquartile range with all individual data points overlaid, which more accurately represents the variability in the dataset.
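
      The reviewer's thought experiment above can be checked numerically. The following is a minimal sketch using only Python's standard library; the data are the hypothetical 50 zeros and 50 hundreds from the comment, not measurements from the study, and the function name `summarize` is ours.

```python
import math
import statistics


def summarize(values):
    """Return mean, SEM, and quartiles for a list of measurements."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)              # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(len(values))          # standard error of the mean
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"mean": mean, "sem": sem, "q1": q1, "median": median, "q3": q3}


# Hypothetical data from the thought experiment: 50 readings of 0, 50 of 100.
values = [0] * 50 + [100] * 50
summary = summarize(values)

if __name__ == "__main__":
    print(f"mean = {summary['mean']}, SEM = {summary['sem']:.2f}")
    print(f"interquartile range = {summary['q1']} to {summary['q3']}")
```

      For these values the SEM is about 5, while the interquartile range spans 0 to 100, which is why a boxplot with individual data points conveys the spread that an SEM error bar hides.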

      Discussion

      (1) The discussion focuses on human cortical development, however, the methods presented by the authors entail dissociation and replating through multiple stages not part of brain development. I see the approach as more valuable as a possibly reliable method that generates both diverse and mature neural cultures.

      We have revised the Discussion to avoid explicitly invoking an in vitro recapitulation of human cortical development. Nevertheless, given that the NPCs from which the organoids originate exhibit frontal cortical identity, coupled with the timely emergence of cortical neuronal markers and rudimentary cortical layering, we are increasingly confident that the development of these cultures most likely mirrors that of the frontal cortex. To further substantiate this hypothesis, single-cell RNA sequencing experiments will be conducted in the future to provide additional insights.

      (2) One of the major claims of the authors is that the method is very reproducible. However, there is almost no data on reproducibility throughout the paper. Mostly single, high magnification images are presented, which therefore represent a small region of a single well of a single batch of a single cell line. Based on the data presented it is not possible to evaluate the reproducibility of the method.

      We agree that the original version did not sufficiently document reproducibility. To address this, we have refined and expanded our presentation of reproducibility data. The previous success-rate panel (original Figure S1D) has been moved and adapted as the new Figure S1C. In this updated version, each dot still represents the endpoint success rate of an independent batch, but dot size now scales with batch size (10–40 organoids), and the legend specifies the total numbers of organoids analyzed per line (line 1: n=248; line 2: n=70; line 3: n=70). Together with the distribution of success rates between ~40–90% across multiple time points and three iPSC lines, this more detailed representation allows readers to directly assess the robustness of line-to-line and batch-to-batch performance. In addition, new time course quantifications of interneuron proportion (Figure 2G,H), synaptic marker densities (Figure 4H, I), and late-stage calcium imaging (Figure 5C,D,E) further demonstrate that key structural and functional read-outs show overlapping ranges across lines and independent differentiations, reinforcing that the method yields reproducible core phenotypes despite some biological variability.

      (3) The data presented is very promising, and it suggests that the authors derived optimal conditions for neural differentiation and neural culture diversification. I am confident that the authors can show that reproducibility, at least in a practical sense (e.g. in wells that form a culture) is high.

      Overall, this is a very promising and exciting work, that I am looking forward to reading in a mature manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) A direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, i.e., what can be done with the new method that cannot be done with standard culture and vice versa, and in which aspects the new method could be inferior to the standard.

      We have now more clearly elaborated the differences with other methods. As addressed in our response to point 2 of Reviewer 1 in the public reviews, there are several limitations and advantages to the adherent cortical organoids model listed as follows:

      Advantages of adherent cortical organoids:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture for at least 1 year, whereas 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks), and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures include:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole cell patch clamping is not easily feasible in adherent cortical organoids because of the restrictive geometry of 384-well plates.

      (2) It would be important to further benchmark the throughput, ie what is the success rate in filling and successfully growing the organoids in the entire 384 well plate?

      We have addressed this question in the current version of Fig. S1C, in which multiple batches of organoids of all three lines were scored for their success rate. The graph reflects the proportion of properly formed organoids out of approximately 400 seeded wells scored at different timepoints, in which each timepoint is a different batch. As mentioned in the response to Reviewer 1, we have also added data on the number of organoids seeded per line in the figure legend.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs.

      As outlined in the response to Reviewer 1, we have clarified the Methods and Discussion sections on seeding density and proliferation rate.

      Reviewer #3 (Recommendations for the authors):

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells. Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems. While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Particularly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      (1) Considering the emergence of astrocyte markers (GFAP, S100b) and upper layer neuron marker (CUX1) around Day 60, the overall differentiation speed is significantly faster compared to other forebrain organoid protocols. Are these accelerated sequences of neurodevelopment consistent across different hiPSC lines?

      As shown in Fig. S5, astrocytes are present around Day 60 for all three lines. For comparison with other organoid protocols, an important consideration is that the timeline for these organoids starts at NPC plating, while for other protocols timing often starts from the hiPSC stage. We have clarified the timeline in the graphical abstract in Figure 1A and in the Methods.

      (2) The calcium imaging results in Figure 4G were recorded at a single time point, Day 61, a relatively early time window compared to other forebrain organoid protocols (more than 100 days, PMID: 31257131; PMID: 36120104). Are the neurons in adherent cortical organoids functionally mature enough around Day 61? How consistent is this functional activity across different cell lines and independent differentiation batches?

      As discussed above in Point 1, it is important to consider that the specified timeline starts from NPC plating. In analogy to 2D neural networks, robust neuronal activity can be observed after ~8 weeks in culture. In addition, we have now added calcium imaging data for an additional batch of organoids at Day 100 in Figure 5, which exhibit comparable levels of neuronal activity as observed on Day 61.

      (3) Along the same lines, various cell types, such as oligodendrocytes and astrocytes, are believed to influence neuronal maturation. Therefore, longitudinal studies until the late stage are necessary to observe changes in electrophysiological activity based on the degree of neuronal maturation (at least two more later time points, such as 100 days and 150 days).

      As described in the previous points, we have now included a Day 100 time point in the calcium imaging data, in addition to the recordings at Day 61 (Figure 5C-E).

      (4) The authors assert that heterogeneity among organoids has been diminished using the human adherent cortical organoids protocol. However, there is inadequate quantitative data to prove the consistency of neuronal activities between different wells. Therefore, experiments quantifying the degree of heterogeneity between organoids, such as through methods like calcium imaging, are necessary to determine if neuron activity occurs consistently across each organoid well.

      We agree with the reviewer and have added several quantitative experiments: a) we’ve added another timepoint to the calcium imaging data in Figure 5 from an independent batch of 3 adherent cortical organoids, which demonstrates continued robust activity at Day 100, as well as Day 61; b) we added synapse quantification in Figure 4; and c) we added interneuron quantification in Figure 2. We are currently also pursuing high-throughput measures of activity to assess the longitudinal activity of ACOs in a larger number of wells. This way we can more definitively quantify the time-dependent variance in organoid activity.

      (5) Is this platform applicable to other functional measurements for neuronal activity, such as the MEA system? When observing the morphology of neurons formed in organoids, they appear to extend axons and dendrites in a consistent direction, suggesting a radial structure that demonstrates high reproducibility across wells. A culture system where neurons are arranged with such consistency in directionality could be highly beneficial for experiments utilizing the MEA system to assess parameters such as the speed of electrical activity transmission and stimulus-response. Therefore, there seems to be a need for a more detailed explanation of the utility of the structural characteristics of the culture system.

      The ACO platform is indeed suitable for MEA recordings. We are in the process of engineering the required geometry using HD-MEA systems through specialized inserts to generate ACOs on MEA systems.

      (6) In Figure 2E-I, the authors suggest morphological diversity of GFAP+/S100b+ astrocytes, but the imaging data presented in Figure 2F-I is only based on GFAP immunoreactivity.

      Since GFAP is also expressed in radial glial cells at this stage (Figure 2I), many fibrous astrocytes and interlaminar astrocytes are likely radial glial neural progenitor cells instead of astrocytes. It appears necessary to perform additional staining using astrocyte markers such as S100B or outer radial glia markers such as HOPX to demonstrate that the figure depicts subtype-specific morphologies of astrocytes.

      In Figure 2M, we stained for GFAP and PAX6 to mark radial glia, which look different from the astrocyte morphologies we describe in Figure 2J-L. We see a large overlap in GFAP and S100B staining in Figure 2I, in which most GFAP+ cells are double positive for S100B (yellow), which is more consistent with astrocyte maturation than with radial glia. Furthermore, we have not seen PAX6 staining outside the dense edges of the center of the ACO.

      (7) In Figure 4D, the axon appears to exhibit directionality. Additional explanation regarding the organization of the axon is necessary. Further research utilizing sparse staining to examine the morphology of single neurons seems warranted.

      The polarized directionality of the axons is something we indeed have also noticed. We are looking into options to further investigate this intriguing property of the ACOs.

      (8) Figure 1E-F only showed cell viability in the early stages around Day 40-50. To demonstrate the superior long-term viability of ACO culture, it appears necessary to illustrate the ratio of dead cells to live cells over a time course.

      Figure S1B shows LIVE/DEAD staining for ACOs of all three lines, revealing minimal DEAD staining at Day 56. A longitudinal time course experiment was not performed; however, the line- and batch-specific quantifications over developmental timepoints in Figure S1C provide an indication of the robust long-term viability of the ACOs.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible temporal construals. For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and participants were instructed to consider the event from "an internal" or from "an external" perspective. The authors found distinct patterns of brain activity in the posterior parietal cortex (PPC) and anterior hippocampus for the internal and the external viewpoint. Specifically, activation in the posterior parietal cortex positively correlated with distance during the external-perspective task, but negatively during the internal-perspective task. The anterior hippocampus positively correlated with distance in both perspectives. The authors conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are supported by the parietal cortex.

      We thank the reviewer for the accurate summary of our study.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and the work tackles them from the perspective of construals theory.

      We appreciate the reviewer's positive and encouraging comments.

      Weaknesses:

      Although the work uses two distinct psychological tasks, the authors do not elaborate on the cognitive operationalization the tasks entail, nor the implication of the task design for the observed neural activation.

      We thank the reviewer for bringing this issue to our attention. In the revised manuscript, we have added a paragraph to the Discussion acknowledging this potential limitation of the study. Please see our response below.

      Reviewer #1 (Recommendations for the authors):

      Overall, I thank the authors for providing clear responses and much-needed detail on their original work, which enables a better understanding of their perspectives. I still have some detailed questions about the reported work, which I provide below. It could help clarify the work for a more general audience and its replicability by the community.

      We thank the reviewer for their positive evaluation of our previous revisions.

      Main general concern:

      I have one remaining core concern, which I distill as being a very different take on the usefulness of task design with neuroimaging. This concern follows from the authors' response to my original comment, which suggested possible confounds in fMRI data analysis and interpretation, as differences in task design and behavioral outcomes were not incorporated in the analytical approach.

      The authors confirmed that "there is a substantial difference between the two tasks" but argue that these differences are not relevant seeing that "the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component." However, the authors do perform such contrasts in their analysis (e.g., p. 10: "We first directly contrasted the activity level between external- and internal-perspective tasks in the time window of...") and build inferences on brain activation from them (e.g., p. 10: "Compared with the internal-perspective task, the external-perspective task specifically activated the...").

      To clarify, my original concern was not about comparing neural activity in response to the two tasks but about the brain activity generated by two distinct tasks, which aim to reveal fundamentally distinct neural processes. The authors' response raises several concerns about the theoretical, methodological and empirical foundation of the work that are beyond the scope of a single empirical study and too long to detail here. Cognitive neuroscience relies on tasks to infer neural processes; this is the fertile and essential ground for using behavior in neuroscience to get to a mechanistic understanding of brain functions (e.g., Krakauer et al., 2017). In short, task design is fundamental because it shapes what neural processes are being investigated. Any inferences about brain activity recorded while a participant performs a task result from manipulated variables that should be under the control of the experimenter. Acknowledging that two tasks are distinct is acknowledging that different (neural) processes may govern their resolution. My initial remark was meant to highlight that, from basic signal detection theory, a same/different task and a temporal order task may not yield the same kind of basic biases and decision-making processes; these are far below and more basic than the posited sophisticated representations herein (construals, perspective taking).

      In short, the general approach is far coarser than the level of interpretational granularity being pushed forward in the paper would suggest.

      We greatly appreciate the reviewer’s comments and agree that this is a very fair point. We acknowledge that the two tasks differ in their underlying decision-making processes. In the revised manuscript, we have added a paragraph at the end of the Discussion to explicitly acknowledge this limitation and to outline possible avenues for future research (Page 23).

      “One limitation of the present study is that the external- and internal-perspective tasks differed not only in the type of perspective-taking they were intended to elicit, but also in their underlying decision-making processes. The external-perspective task explicitly required participants to compare two events with respect to external temporal landmarks and judge whether they occurred in the same or different parts of the day (i.e., a same/different judgment), whereas the internal-perspective task explicitly required participants to project themselves into a reference event and judge whether the target event occurred in the future or the past relative to that reference (i.e., a temporal-order judgment). This task design ensured that participants adopted two distinct perspectives on the event series, but at the expense of coherence in the cognitive operations required to make the two types of judgments. One alternative approach would be to more closely align the response demands of the two tasks by drawing on McTaggart’s (1908) A-series and B-series distinction: in the external-perspective task, participants could judge whether the target event occurred before or after the reference event (i.e., a before/after judgment), whereas in the internal-perspective task they could judge whether the target event occurred in the past or future relative to the reference event (i.e., a past/future judgment). Although such a design would improve coherence in the underlying decision-making processes (i.e., both are temporal-order judgments), it would reduce experimental control over the perspective-taking manipulation. For example, before/after judgments could still be made from an internal perspective. Future studies are therefore needed to determine whether findings obtained from these two task designs converge.”

      Additional clarifications:

      Intro/theory

      In this revised MS, the authors provided some clarifications of their theoretical perspective in the introduction. From my standpoint, the motivation remains insufficiently precise for a scientific report. Some theoretical aspects, such as construals or perspective taking remain evasive in relation to ego and allocentric representations. A couple of paragraphs dedicated to explaining what the authors mean precisely when using these terms would greatly help to situate the validity of the working hypothesis. In the absence of clear definitions, it remains difficult to evaluate what is being tested. For instance, what do the authors mean by "time construal"? How is a time construal the same or not as a "temporal distance" or a "temporal sequence"? This would greatly help the readership.

      Additionally, some assertions are not clearly identified or fairly attributed. For instance, the assertion that EST provides a means to spatialize time is the authors' point of view or interpretation of this work, not an original proposition of the theory. Another example is McTaggart's metaphysics on time series (in the ontology of time in physics) "echoed" in linguistics; it has effectively been proposed and popularized by L. Boroditsky. The prospective and retrospective views of time should not be attributed to Tsao et al. but to Hicks or Block in the 1970s, who studied the psychology of time in humans.

      We sincerely thank the reviewer for this criticism, which prompted us to clarify the relevant concepts in our manuscript. In the revised version, we made the following three main changes to the Introduction.

      In the second paragraph of the Introduction (page 3), we clarify that event segmentation theory is independent of, but related to, the spatial construal of time hypothesis. We also clarify what we mean by time construals and explain that the two temporal components—duration and sequence—can be represented within such time construals, rather than constituting time construals themselves. These revisions were intended to prevent potential misunderstandings for the reader. In addition, we incorporated Boroditsky’s contributions relevant to this framework:

      “One solution, which might be unique to humans, is to conceptualize time in terms of space (i.e., the spatial construal of time; e.g., Clark, 1973; Traugott, 1978; Lakoff & Johnson, 1980). Within this framework, time is usually first segmented into events—the basic temporal entities that observers conceive as having a beginning and an end (Zacks & Tversky, 2001). These temporal entities are then ordered in space, such that events occurring at different times can be maintained in working memory, allowing them to be flexibly accessed from different perspectives and easily referenced during communication (e.g., Casasanto & Boroditsky, 2008; Núñez & Cooperrider, 2013; Bender & Beller, 2014; Abrahamse et al., 2014; Figure 1A). The two core temporal components—duration and sequence—can be readily represented in such time construals.”

      In the third paragraph of the Introduction (pages 3-4), we acknowledge the contributions of earlier behavioral studies on prospective and retrospective timing by citing the work suggested by the reviewer (Block & Zakay, 1997), which indicates that two distinct cognitive systems underlie timing processes. These behavioral findings converge with the conclusions of more recent neuroimaging studies:

      “Unlike prospective timing tracking the continuous passage of time, durations in time construals are event-based (Sinha & Gärdenfors, 2014): the interval boundaries are constituted by events, and the event durations reflect their span (Figure 1A). Accumulating evidence suggests that distinct cognitive systems underlie these two types of duration (e.g., Block & Zakay, 1997). The motor and attentional system—particularly the supplementary motor area—has been associated with prospective timing (e.g., Protopapa et al., 2019; Nani et al., 2019; De Kock et al., 2021; Robbe, 2023), whereas the episodic memory system—particularly the hippocampus—is considered to support the representation of duration embedded within an event sequence (e.g., Barnett et al., 2014; Thavabalasingam et al., 2018; see also the comprehensive review by Lee et al., 2020).”

      Block, R. A., & Zakay, D. (1997). Prospective and retrospective duration judgments: A meta-analytic review. Psychonomic Bulletin & Review, 4(2), 184-197.

      In the fifth paragraph of the Introduction (page 5), we added a sentence to clarify the relationship between allocentric and egocentric reference frames and perspective taking:

      “However, the neural mechanisms that enable the brain to generate distinct construals of an event sequence remain largely unknown. Valuable insights may be drawn from research in the spatial domain, which posits the existence of stable allocentric representations that are independent of viewpoint, from which variable egocentric representations corresponding to different perspectives can be generated.”

      Methods:

      While more detail is provided in the Methods, some additional detail would be helpful to enable the replication of this work. For instance,

      - The table reports a sequence of phrases with assigned durations. Are the event phrases actual sentences given to participants? If so, how were participants made aware of the duration of the events, seeing that these sentence parts do not provide time information?

      We apologize that we did not make this clear. The full text used during the reading phase of learning has already been provided in Figure 1—source data 1, which includes the information about event durations. In the revised manuscript, we now explicitly refer to this information in the Methods section (page 38): In the reading phase, participants read a narrative describing the whole ritual on a computer screen twice (Figure 1—source data 1).

      - One of my original questions was about the narrative. In the Methods section, the authors state that participants read a text. Providing the full text would be helpful, also as a sanity check for sequentiality.

      As clarified in the previous response, the texts are provided in Figure 1—source data 1, which illustrates the texts for both even- and odd-numbered participants.

      - In the imagination phase, the authors introduce proportionality between imagination and experience (p. 37). What scale was used? What motivated it?

      We thank the reviewer for bringing this issue to our attention. In this study, participants did not directly experience the events; instead, they learned the event information through narrative reading or imagination to ensure experimental control and efficiency. As clarified in the Methods section, the ratio between imagination duration and actual event duration was 30 seconds to 1 hour. In the revised manuscript, we have further explained our motivation for this design choice (page 39):

      Here, we let participants learn the event information through narrative reading or imagination. Compared to learning through actual experience, this approach prioritizes experimental control and efficiency. The timing of the events is compressed, akin to the process of retrospectively recalling our experiences, in which we mentally traverse events without requiring the actual time they originally took. However, future studies may be needed to investigate whether the encoding of events from first- and second-hand experience differs.

      Results:

      - p. 10: the interpretation of the data on chunking and boundary effects should be properly referenced to e.g. Davachi's published work.

      We thank the reviewer for highlighting Davachi’s important work on event boundaries. We have appropriately cited these studies in the revised manuscript (page 10), as reflected in the following passage: This pattern can be interpreted as a categorical effect: sequential distances within the same part of the day were perceived as shorter (i.e., a chunking effect), whereas distances spanning different parts of the day were perceived as longer (i.e., a boundary effect). Similar boundary- or chunking-related effects on event cognition have been reported in previous studies (e.g., Ezzyat & Davachi, 2011; DuBrow & Davachi, 2013; Radvansky & Zacks, 2017).

      Ezzyat, Y., & Davachi, L. (2011). What constitutes an episode in episodic memory?. Psychological Science, 22(2), 243-252.

      DuBrow, S., & Davachi, L. (2013). The influence of context boundaries on memory for the sequential order of events. Journal of Experimental Psychology: General, 142(4), 1277.

      Radvansky, G. A., & Zacks, J. M. (2017). Event boundaries in memory and cognition. Current Opinion in Behavioral Sciences, 17, 133-140.

      Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. 
      On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      We sincerely thank the reviewer for providing an accurate, comprehensive, and objective summary of our study.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses as well as neuroimaging analyses is also much appreciated.

      We thank the reviewer for the positive and encouraging comments.

      Suggestions:

      The authors have done a commendable job addressing my previous comments. In particular, the additional analyses elucidating the potential contribution of boundary effects to the behavioural data, the impact of incorporating RT into the fMRI GLMs, and the differential contributions of RT and sequential distance to neural activity (i.e., in PPC) are valuable and strengthen the authors' interpretation of their findings.

      My one remaining suggestion pertains to the potential contribution of boundary effects. While the new analyses suggest that the RT findings are driven by sequential distance and duration independent of a boundary effect (i.e., Same vs. Different factor), I'm wondering whether the same applies to the neural findings? In other words, have the authors run a GLM in which the Same vs. Different factor is incorporated alongside distance and duration?

      We thank the reviewer for their positive evaluation of our previous revisions and are pleased that the additional analyses adequately address the boundary effects in the behavioral data and the RT effects in the neural data.

      With respect to boundary effects in the neural data, we followed the reviewer’s suggestion and constructed a more complex GLM that incorporated the Same/Different part of the day as an additional regressor modulating the target events. Importantly, the same PPC region continued to show an interaction effect between Task Type and Sequential Distance. We have added this important control analysis to our revised manuscript (pages 13–14):

      “To further assess whether the observed PPC reactivation can be attributed to boundary or chunking effects introduced by the Parts of the Day, as well as other behavioral outputs, we performed an additional control analysis. Using a more complex first-level model, we included two extra regressors modulating the target events in both internal- and external-perspective tasks, alongside Sequential Distance and Duration: (1) Same/Different parts of the day (coded as 1/−1) and (2) Future/Past (coded as 1/−1). Even with these additional controls, the same PPC region remained the strongest area across the entire brain, showing an interaction effect between Task Type and Sequential Distance, although the cluster size was slightly reduced (voxel-level p < 0.001; cluster-level FWE-corrected p = 0.054).”
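      The regressor coding described in the passage above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the `Trial` fields, the `modulators` function, and the example trial values are all hypothetical, and only the 1/−1 coding of Same/Different and Future/Past is taken from the quoted text.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One target event in a task run (all field names are hypothetical)."""
    task: str               # "internal" or "external" perspective task
    seq_distance: int       # sequential distance between reference and target
    duration: float         # duration of the target event (arbitrary units)
    same_part: bool         # target in the same part of the day as the reference?
    target_in_future: bool  # target occurs after the reference?


def modulators(trial: Trial) -> dict:
    """Return the four parametric modulator values for one target event.

    Same/Different and Future/Past are coded as 1/-1, as in the quoted text;
    Sequential Distance and Duration are passed through unchanged.
    """
    return {
        "seq_distance": float(trial.seq_distance),
        "duration": float(trial.duration),
        "same_part": 1.0 if trial.same_part else -1.0,
        "future": 1.0 if trial.target_in_future else -1.0,
    }


# Hypothetical example trials, one per task type.
trials = [
    Trial("external", 3, 2.0, same_part=True, target_in_future=False),
    Trial("internal", 1, 0.5, same_part=False, target_in_future=True),
]
rows = [modulators(t) for t in trials]
```

      Each row would then modulate the target-event regressor of the corresponding task in a first-level model; how the modulators are centered or orthogonalized is not stated in the quoted text and is left out here.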

    1. Reviewer #2 (Public review):

      Summary:

      The manuscript by Selvaratnam et al. defines how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus. Using conditional HEB ablation driven by Vav Cre, flow cytometry, scRNA-seq, and reanalysis of ChIP-seq data, the authors provide evidence for a sequential model in which HEB and TCR-induced Egr2 cooperatively upregulate Id3, enabling gdT17 maturation and limiting diversion to the ab lineages. The work provides an important mechanistic insight into how the E/ID-protein axis coordinates gd T cell specification and effector maturation.

      Strengths include:

      (1) The proposed model that HEB primes, TCR induces, and Id3 stabilizes gdT17 cells in embryonal development is elegant and consistent with the findings.

      (2) The choice of animal models and the study of a precise developmental window.

      (3) The cross-validation of flow, scRNA-seq, and ChIP-seq reanalyses strengthens the conclusions.

      (4) The study clarifies the dual role of Id3, first as an HEB-dependent maturation factor for gdT17 cells, and as a suppressor of diversion to the ab lineages.

      Comments on revisions:

      In this revised version of their manuscript the authors have effectively addressed all of my previous concerns. In its current form the study represents a significant advancement in our understanding of how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus.

    2. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their enthusiasm and insightful suggestions. Our responses to specific concerns and questions are detailed below.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors use flow cytometry and scRNA-seq to identify and characterize the defect in gdT17 cell development in HEB f/f, Vav-icre (HEB cKO), and Id3 germline-deficient mice. HEB cKO mice showed defects in the gdT17 program at an early stage, and failed to properly upregulate expression of Id3 along with other genes downstream of TCR signaling. Id3KO mice showed a later defect in maturation. The results together indicate HEB and Id3 act sequentially during gdT17 development. The authors further showed that HEB and TCR signaling synergize to upregulate Id3 expression in the Scid-adh DN3-like T cell line. Analysis of previously published ChIP-seq data revealed binding of HEB (and Egr2) at overlapping regulatory regions near Id3 in DN3 cells.

      The study provides insight into mechanisms by which HEB and Id3 act to mediate gdT17 specification and maturation. The work is well performed and clearly presented. We only have minor comments.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Selvaratnam et al. defines how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus. Using conditional HEB ablation driven by Vav Cre, flow cytometry, scRNA-seq, and reanalysis of ChIP-seq data, the authors provide evidence for a sequential model in which HEB and TCR-induced Egr2 cooperatively upregulate Id3, enabling gdT17 maturation and limiting diversion to the ab lineages. The work provides an important mechanistic insight into how the E/ID-protein axis coordinates gd T cell specification and effector maturation.

      Strengths include:

      (1) The proposed model that HEB primes, TCR induces, and Id3 stabilizes gdT17 cells in embryonal development is elegant and consistent with the findings.

      (2) The choice of animal models and the study of a precise developmental window.

      (3) The cross-validation of flow, scRNA-seq, and ChIP-seq reanalyses strengthens the conclusions.

      (4) The study clarifies the dual role of Id3, first as an HEB-dependent maturation factor for gdT17 cells, and as a suppressor of diversion to the ab lineages.

      Weaknesses:

      (1) The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors.

      We agree that this is a valid concern and continue to work on confirming the mechanism from several other angles. Validating HEB/E2A and Egr2 co-binding in gdT17 cell progenitors by ChIP-qPCR would be a very precise and definitive experiment, but it will be very challenging to perform, in part due to the low numbers of gdT17 precursors in the fetal thymus (note the y-axis scales in Fig. 1F, J). As a complementary approach, we have analyzed additional ChIP-seq data for HEB/E2A binding in Rag2<sup>-/-</sup> DN3 cells retrovirally transduced with the KN6 gdTCR and cultured with stroma expressing the weak KN6 ligand T10 for 4 days. This analysis revealed that the binding of HEB/E2A on those sites persisted after weak gdTCR signaling, strengthening the likelihood that concurrent binding of HEB/E2A and Egr2 occurs during this developmental transition. We noted that HEB/E2A binding was slightly dampened in Rag2<sup>-/-</sup> DN3 + gdTCR cells relative to Rag2<sup>-/-</sup> DN3 cells, consistent with the induction of Id3 and subsequent Id3-mediated disruption of E protein binding. We also located HEB/E2A and Egr binding sites in close proximity in the two regions that shared peaks between HEB/E2A and Egr2 analyses (HE1 and HE2), in line with the potential participation of these two transcription factors in an enhanceosome binding complex.

      Furthermore, we examined the chromatin landscape of the Id3 locus by sorting WT DN3 and DN4 cells, as well as Rag2<sup>-/-</sup> DN3 cells to provide a genuine pre-selection context, and performing ATAC-seq (Figure 7–suppl 7A). Given the known ability of E2A and HEB to induce chromatin remodeling, we also examined accessibility in DN3 and DN4 cells from HEB cKO mice. Alignment of ATAC-seq and ChIP-seq peaks in the Id3 locus revealed accessibility of HE1 and HE2 in Rag2<sup>-/-</sup>, WT DN3, and WT DN4 cells. However, accessibility of HE1 and HE2 was dampened in HEB cKO cells, especially at the DN3 stage, suggesting that HEB may be involved in remodeling the Id3 locus, resulting in a poised state that enables TCR-dependent transcription factors to induce Id3 proportionally to TCR signal strength. These data are now presented as a new “Figure 7 – figure supplement 1” with corresponding Results, Discussion, and Methods updates.

      Our next story will be focused on a finer dissection of the Id3 cis-regulatory elements and their combinatorial regulation by HEB/E2A and other transcription factors, and how they relate to specific signaling pathways. For this study, we will modify the language regarding Egr2 to reflect the open questions that still remain to be addressed.

      (2) E2A expression is not affected in HEB-deficient cells, raising the question of partial compensation, a point that should be specifically discussed.

      This confounding factor is always an issue with E proteins. We have now added a section to the discussion that highlights previous literature and relates it to our findings.

      (3) All experiments are done at E18, when fetal gdT17 development predominates. The discussion could address whether these mechanisms extend to neonatal or adult gdT17 subsets.

      In our 2017 paper (PMID 29222418) we showed that HEB cKO mice have defects in the production of functional gdT17 cells in fetal and neonatal thymus and in the adult periphery (in lungs and spleen). While the adult thymus does not support the development of fully functional innate gd T cells, it does contain gdTCR+ cells that have activated the Sox-Maf-Rorc network (Yang 2023, PMID 37815917). It will be very interesting to assess the impact of HEB loss on these cells, and we are actively pursuing this goal. For now, we will add a paragraph to the discussion addressing what we know from previous work and what is yet to be learned.

      Reviewer #3 (Public review):

      Summary:

      The authors of this manuscript have addressed a key concept in T cell development: how early thymus gd T cell subsets are specified, and which elements govern the specification of gd T17 versus other gd T cell subsets or ab T cell subsets. They show that the transcriptional regulator HEB/Tcf12 plays a critical role in specifying the gd T17 lineage and, intriguingly, that it upregulates the inhibitor Id3, which is later required for further gd T17 maturation.

      Strengths:

      The conclusions drawn by the authors are amply supported by a detailed analysis of various stages of T cell maturation in WT and KO mouse strains at the single cell level, both phenotypically, by flow cytometry for various diagnostic surface markers, and transcriptionally, by single cell sequencing. Their conclusions are balanced and well supported by the data and citations of previous literature.

      Weaknesses:

      I actually found this work to be quite comprehensive. I have a few suggestions for additional analyses the authors could explore that are unrelated to the predominant conclusions of the manuscript, but I failed to find major flaws in the current work.

      I note that HEB is expressed in many hematopoietic lineages from the earliest progenitors and throughout T cell development. It is also noteworthy that abortive gamma and delta TCR rearrangements have been observed in early NK cells and ILCs, suggesting that, particularly in early thymic development, specification of these lineages may have lower fidelity. It might prove interesting to see whether their single-cell sequencing or flow data reveal changes in the frequency of these other T-cell-related lineages. Is it possible that HEB is playing a role not only in the fidelity of gdT17 cell specification, but also perhaps in the separation of T cells from NK cells and ILCs or the frequency of DN1, DN2, and DN3 cells? Perhaps their single-cell sequencing data or flow analyses could examine the frequency of these cells? That minor caveat aside, I find this to be an extremely exciting body of work.

      Excellent question, and the underlying answer is yes: loss of HEB renders the cells more open to divergence to non-T lineages, even at the DN3 stage. Although our datasets did not reveal those cells, we have examined this question previously. In our 2011 paper (Braunstein, 2011, PMID 21189289), we identified “DN1-like” cells arising from HEB-/- DN3 cells in OP9-DL1 co-cultures. These cells responded to IL-15 and IL-7 by differentiating into cytotoxic NK-like cells. We did not detect TCRb rearrangements but did not look for gdTCR rearrangements. Subsequently, multiple papers from other labs showed that ILC2 were greatly expanded in the thymus using Id-overexpression transgenic mice and HEB/E2A-double deficient mice (Miyazaki, 2023, PMID 28514688; Miyazaki, 2025, PMID 39904558; Berrett, 2019, PMID 31852728; Qian, 2019, PMID 30898894; Peng, 2020, PMID 32817168). The ILCs in these mice had TCRg rearrangements, consistent with a shared origin with WT thymic-derived ILCs. In unpublished data from our lab, we found an increase in the numbers of ILC2 but not ILC3 in HEB cKO fetal thymic organ cultures. We did not follow up on this work any further since the topic was being heavily pursued in other labs, but we remain very interested in this branchpoint and will mention the literature in the discussion.

      Joint recommendations for the authors:

      (1) Experimental validation (for mechanistic clarity)

      The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors to substantiate the proposed cooperative mechanism.

      See above; new experiments with ATAC-seq and additional ChIP-seq analysis.

      (2) Figures

      Potential inconsistencies in Figure 1H: In the legend to Figure 1H, Vg1-Vg5- cells are considered Vg6+ cells. Flow plots show a reduced Vg1-Vg5- population in HEB cKO mice, but the accompanying bar plot shows an increased frequency of Vg6+ cells.

      Vg6 cells are actually considered to be Vg4-Vg5-Vg1- cells (not Vg4- Vg1- cells, which is important in the fetal context). The flow plot shows the percentage of Vg6 cells out of the Vg1-Vg4- population, whereas the bar plot shows the percentage of Vg6 cells out of all gdTCR+ cells. The ratio of Vg6 to Vg5 cells decreases within the Vg1-Vg4- population, whereas the overall percentages and numbers of Vg6 cells among all gd T cells are increased in HEB cKO mice. We have now explained this more clearly in the text and the figure legend.
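      The point about denominators can be made concrete with a small worked example. The sketch below uses made-up cell counts chosen purely for illustration (they are not data from the study); it only shows that the Vg6 fraction within the Vg1-Vg4- gate can fall while the Vg6 fraction among all gdTCR+ cells rises. The function name `vg6_fractions` and the dictionary keys are likewise hypothetical.

```python
def vg6_fractions(counts):
    """Return (Vg6 / Vg1-Vg4- gate, Vg6 / all gdTCR+ cells).

    The Vg1-Vg4- gate is assumed to contain only Vg5 and Vg6 cells,
    following the definition of Vg6 cells as Vg4-Vg5-Vg1-.
    """
    gate = counts["Vg5"] + counts["Vg6"]  # Vg1-Vg4- population
    total = sum(counts.values())          # all gdTCR+ cells
    return counts["Vg6"] / gate, counts["Vg6"] / total


# Hypothetical counts, NOT measured values.
wt = {"Vg1": 300, "Vg4": 400, "Vg5": 200, "Vg6": 100}
cko = {"Vg1": 100, "Vg4": 50, "Vg5": 500, "Vg6": 150}

wt_gate, wt_all = vg6_fractions(wt)
cko_gate, cko_all = vg6_fractions(cko)

# Within the gate the Vg6 fraction drops, but among all gdTCR+ cells it rises.
print(f"WT:  {wt_gate:.1%} of gate, {wt_all:.1%} of all gdTCR+")
print(f"cKO: {cko_gate:.1%} of gate, {cko_all:.1%} of all gdTCR+")
```

      With these illustrative counts, the two readouts move in opposite directions simply because they are normalized to different parent populations, which is the distinction drawn between the flow plot and the bar plot.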

      Clarify which cells produce IL-17A in Figure 1L.

      This plot is gated on all gd T cells stimulated with PMA/ionomycin; this has been added to the results and figure legend.

      In Supplementary Figure 2, legend, do the authors mean that TRGV4 was depleted? The authors write TRDV4. Please check.

      Thank you for catching this mistake; we have corrected it.

      In Figure 7, the authors showed Id3 mRNA expression. Can the expression of Id2 be included?

      That is a really interesting question, and we will follow up on it in future studies.

      If Id1 or Id4 are relevant for any of these studies, can their expression be shown in Supplementary Figure 3A? If these are minimally expressed or not expressed, this could be mentioned.

      Id1 and Id4 were not detectable in our studies; this is now stated in the results section describing expression of E proteins and Id proteins.

      (3) Discussion

      Discuss possible redundancy between HEB and E2A, as E2A expression appears unaffected in HEB-deficient cells.

      See above

      Address whether the mechanisms identified at E18 (embryonic stage) also apply to neonatal or adult γδT17 subsets.

      See above

      Expand on how HEB function may relate to other hematopoietic or early lymphoid lineages (NK/ILC, DN1-DN3 stages), based on reviewer curiosity.

      See above

      (4) Methods and terminology

      Define the terms γδTe1 and γδTe2 (e.g., early effector subsets).

      This has been defined more clearly in several sections of the text.

      Add details to the scRNA-seq methods section (average number of cells analyzed and sequencing depth per cell).

      These details have been added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have now performed new experiments that were included in the manuscript. Our new results show that monocyte-derived dendritic cells primed in vivo during P. chabaudi infection, or in vitro with TNF, express high levels of GLUT-1 (Figures 4M, 5D, 6L). Furthermore, our new data show that mice treated with 2-DG (an inhibitor of glycolysis) are more susceptible to infection (Figures 6N, O). In addition, new results on glucose uptake by muscle and adipose tissues were added to the manuscript. Finally, figure legends were revised, densitometric analysis was performed, and other issues were addressed in the text.

      Please see below a point-by-point reply to the Reviewers’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Kely C. Matteucci et al. titled "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF-1α axis plays a key role in host resistance to Plasmodium infection" describes that TNF induces HIF-1α stabilization that increases GLUT1 expression as well as glycolytic metabolism in monocytic and splenic CD11b+ cells in P. chabaudi infected mice. Also, TNF signaling plays a crucial role in host energy metabolism, controlling parasitemia, and regulating the clinical symptoms in experimental malaria.

      This paper involves an incredible amount of work, and the authors have done an exciting study addressing the TNF-iNOS-HIF-1α axis as a critical role in host immune defense during Plasmodium infection.

      Reviewer #2 (Public Review):

      Summary:

      The premise of the manuscript by Matteucci et al. is interesting and elaborates on a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via a yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation, including cytokine secretion, and parasite control.

      Strengths:

      The authors provide elegant in vivo experiments to characterize metabolic consequences of Plasmodium infection, and isolate cell populations whose metabolic state is regulated downstream of TNFa. Furthermore, the authors tie together several interesting observations to propose an interesting model.

      Weaknesses:

      The main conclusion of this work - that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection" is unsubstantiated. The authors show that TNFa induces GLUT1 in monocytes, but never show a direct role for GLUT1 or glucose uptake in monocytes in host resistance to infection (nor the hypoglycemia phenotype they describe).

      We respectfully disagree with the Reviewer. There is a series of experiments showing that TNFR KO (Figures 1, 2, 4), HIF1a KO (Figure 5) and iNOS KO (Figure 6) mice have a partially impaired inflammatory response and control of parasitemia (Figures 1E, 5G and 6B).

      To further address the issue raised by the reviewer, we performed two sets of experiments. First, we show, in vitro, the impact of TNF stimulation on GLUT1 expression and glucose uptake (Figures 4M, 5D, 6L). Our results show that GLUT1 is increased after 18 hours of stimulation with TNF (100 ng/mL) in MODCs from WT mice but not from iNOS KO, HIF1a KO and TNFR KO mice. Similar results were obtained with monocytic cells derived from infected mice (Figures 4L, 5C, 6K). The results support the discussion by demonstrating that TNF stimulation influences GLUT1 expression in monocytic cells. This aligns with the proposed mechanism that TNF signaling regulates HIF-1α stabilization and glycolytic metabolism via RNI. The absence of GLUT1 upregulation and glucose uptake in TNFR KO, iNOS KO and HIF-1α KO mice further reinforces the role of RNI in promoting HIF-1α stabilization, as suggested in the discussion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      None of the figure legends is precise about whether the data express means ± standard errors of the means (SEM) or SD. Figure 1D shows no SD in the data from the uninfected group. We strongly suggest making all figure legends more precise and detailed, including an explanation of all symbols, non-standard abbreviations, error bars (standard deviation or standard error), experimental and biological replicates, the number of animals, and whether the data are representative of independent experiments.

      We apologize for the lack of detail in the Figure legends. As requested, we now indicate whether we used SEM or SD, the number of mice per group, and the number of replicate experiments. We also clarified the groups that are being compared, and the statistical significance indicated by the symbols. We also standardized the symbols to asterisks only, with the number of asterisks indicating the level of significance.

      Figure 1. The figure legend has no information about the organ for which TNF mRNA was measured (Figure 1D). Also, regarding the TNF data, Figures 1C and 1D show that the circulating levels of TNF and the expression of TNF mRNA in the liver peaked at the same time point, and after 6h, there is no difference between infected and uninfected mice. It would be expected that the TNF mRNA expression would be detected earlier than the protein, assuming that the primary source of TNF is the liver. Is there another organ that could be the main source of blood TNF? Did the authors have a chance to measure the blood TNF levels during infection (0-8 dpi), besides the measurement at different times only on day 8?

      We included in the legend of Figure 1D that mRNA was extracted from liver.

      The liver and spleen are the main reservoirs of infected erythrocytes and the main sources of cytokines during infection with the erythrocytic stage of malaria. The results presented in Figures 1C and 1D are from in vivo experiments, not a controlled cellular experiment in vitro. So, we cannot draw conclusions about the exact timing and synchrony of TNF mRNA and protein production. We have published earlier that during P. chabaudi infection, the peaks of TNF mRNA expression and the levels of circulating TNF protein occur between midnight and 6 am (Hirako et al., 2018). Hence, those results are consistent with the results described here. In addition, this earlier study also shows that the patterns of TNF at days 6 and 8 post-infection are similar. Furthermore, in other studies, we reported that the peak of TNF production occurs between days 6 and 10 post P. chabaudi infection (Franklin et al, PNAS, 2009; Franklin et al, Microbes and Infection, 2007). This is now clarified in the text (page 05, line 132):

      “As previously demonstrated, the circulating levels of TNF and expression of TNF mRNA in the liver peaked at 6 am (end of dark cycle) at 8 dpi (Figure 1C and 1D), and has been reported to peak between days 6 and 10 post-infection, with a consistent pattern observed on days 6 and 8.”

      Figure 2. "We observed that in naïve animals, all of these parameters were similar in TNFR<sup>-/-</sup> and C57BL/6 mice (Figures 2A-D, top panels, and Figures 2E-H)." Interestingly, the respiratory exchange rate of TNFR<sup>-/-</sup> uninfected mice seems higher in TNFR<sup>-/-</sup> uninfected mice than in naïve uninfected mice, and this pattern seems to be more pronounced in TNFR<sup>-/-</sup> uninfected mice. Is there any suggestion that could explain the change in respiratory exchange rate behavior without infection in those animals?

      At the moment, we have not investigated the basis of this difference between uninfected WT and TNFR KO mice, which goes beyond the scope of this research. This is indeed an interesting observation that should be pursued in the future by our group and elsewhere. We now mention this difference when describing the results (page 06, line 155):

      “We observed that in naïve animals, all of these parameters were similar in TNFR<sup>-/-</sup> and C57BL/6 mice (Figures 2A-D, top panels and Figures 2E-H), with a slightly higher respiratory exchange rate in uninfected TNFR<sup>-/-</sup> mice. In contrast, all the evaluated parameters were decreased in infected C57BL/6 mice compared to their naïve counterparts during the light and dark cycles. When we analyzed only infected mice, the alterations in all parameters were milder in TNFR<sup>-/-</sup> compared to C57BL/6 mice (Figures 2A-D bottom panels and 2E-H).”

      Figure 3. To give an idea of the main population of non-parenchymal cells, it will be helpful to clarify briefly how non-parenchymal cells from the liver of infected or uninfected mice were isolated.

      This is described in detail in the Materials and Methods (page 19, line 566).

      Figure 3, B, C, D, G and Figure 4K and Figure 5 A and B - Semi-quantitative data through the densitometric analysis of western blots should be included in all figures.

      Thank you for the suggestion. We have now included the densitometric analysis for all Western blot results in a Supplementary figure.

      Figure 4. The author describes, "We observed that except for Hexokinase-3, the expression of mRNAs of glycolytic enzymes (Hexokinase-1, PFKP, and PKM) was increased in C57BL/6 but not TNFR-/- 8dpi." Sometimes, it is hard to understand which groups have been compared to some data. Be precise in describing the statistical analysis between the groups. It seems that those genes were increased in "infected C57BL/6 in comparison to uninfected mice, but not TNFR-/- 8-dpi. Moreover, even though the authors include statistic symbols "ι, ιι, ιιι" in other legends, there is no explanation about statistic symbols in the legend of Figure 4.

      As mentioned above, we improved the descriptions of all figures in the legends and, where necessary, in the main text describing the results.

      Figure 5. The authors describe, "We found that GLUT1 protein and glycolysis (ECAR) was impaired, respectively, in monocytic cells and splenic CD11b+ cells from infected, as compared to uninfected HIF-1aΔLyz2 mice (Figures 5C-5E)." The GLUT-1 expression was inhibited in both cells compared to HIF-1afl/fl mice but not even close to impaired GLUT-1 expression. There is still a robust amount of GLUT-1 expression, and significantly higher when compared to cells from uninfected mice.

      We adjusted our statement to "partially impaired", indicating that other host or parasite components may also be influencing GLUT-1 expression. In fact, we have recently published that IFNγ also has an important role in regulating GLUT1 expression in MO-DCs, and this reference is mentioned in the text (page 10, line 291):

      “We found that glycolysis (ECAR) and GLUT1 expression were impaired, though partially, in monocytic and splenic CD11b+ cells from infected HIF-1aΔLyz2 mice (Figures 5C-5E) compared to infected WT mice. The level of GLUT1 expression that is still maintained is likely due to other host or parasite factors, such as IFN-γ (Ramalho 2024).”

      Figure 6. It is essential to have more information about the number of replicates in Figure 6A. However, there are just two dots replicates in the condition CD11b+ splenic cells from C57BL/6 stimulated with or without LPS (purple bars). It is essential to be precise regarding the number of experimental and biological replicates in each experiment and the statistical analysis that has been applied, including this group. Furthermore, the author concludes, "...these data demonstrated that RNI induces HIF-1α expression...." This conclusion needs a more careful description since no data supports that monocytic cells or splenic CD11b+ cells from iNOS-/- infected mice decrease stabilization of HIF-1αm using blotting, as shown in Figure 5 A.

      As mentioned above, the number of replicates for each experiment has been included in the figure legends.

      Minor Points.

      Figure 3. "Hepatocytes have an important role in glucose uptake from the circulation, and they do this primarily through GLUT2 (38), whose mRNA expression was downregulated (Figure 3A) and protein expression unchanged in response to Pc infection (Figure 4K)." I suggest moving the Figure 4K to Figure 3 to make it easy to follow the data description.

      We thank the reviewer for the suggestion. However, we chose to keep Figure 4K in Figure 4, as this panel includes data from TNF receptor deficient mice, and the analysis of TNF knockout models is first introduced and discussed in Figure 4. For clarity and consistency, we therefore maintained this panel within Figure 4.

      Line 433. Replace iNOS for iNOS-/- mice.

      iNOS has now been replaced with iNOS-/- mice.

      Reviewer #2 (Recommendations For The Authors):

      The premise of the manuscript by Matteucci et al. is interesting and elaborates on a mechanism via which TNFa regulates monocyte activation and metabolism to promote murine survival during Plasmodium infection. The authors show that TNF signaling (via an unknown mechanism) induces nitrite synthesis, which (via an as yet unknown mechanism) stabilizes the transcription factor HIF1a. Furthermore, HIF1a (via an unknown mechanism) increases GLUT1 expression and increases glycolysis in monocytes. The authors demonstrate that this metabolic rewiring towards increased glycolysis in a subset of monocytes is necessary for monocyte activation including cytokine secretion, and parasite control.

      The main goal of this work is to study the interplay of TNF/HIF1a/iNOS in pathogenesis in an experimental model of malaria. Dissecting the molecular mechanism by which TNF induces reactive nitrogen species and regulates HIF1a expression is beyond the scope of our research. Nevertheless, there is a vast literature addressing these issues. We now include in the discussion a paragraph describing the main conclusions of these previously published studies (page 12, line 363):

      “Previous studies have shown that TNF induces the production of RNI through the upregulation of iNOS via the NF-κB pathway (63, 64). TNF-mediated iNOS expression is critical for NO production, which in turn stabilizes HIF-1α by inhibiting prolyl hydroxylases (PHDs) even under normoxic conditions (58, 59). HIF-1α then upregulates the expression of glycolytic genes, including GLUT1 (22, 62).”

      Major comments

      Issues concerning novelty

      Some of the reported observations are not novel. TNFa and TNFa signaling have been demonstrated to contribute to the release of certain cytokines, and to contribute to the control of parasitemia (PMID: 10225939). TNFa has been shown to increase glucose uptake in tissues (PMID: 2589544). There is a textbook about the role of INOS during the pathogenesis of malaria, including its association with parasite control (https://link.springer.com/chapter/10.1007/0-306-46816-6_15). Furthermore, other mechanisms controlling glycemia during Plasmodium infection have been shown (PMID: 35841892). The authors should adequately discuss other papers which have reported some of their findings.

      Thanks for the comments on previously existing literature. We are well aware of some of this earlier literature. Some of these earlier findings are mentioned in our manuscript. We emphasized these fundamental findings in the discussion, as requested (page 12, line 368):

      “TNF has been described as a critical mediator in malaria, driving cytokine release and parasitemia control (PMID: 10225939). It also enhances glucose uptake in tissues, aligning with our findings of increased glycolysis in monocytes (PMID: 2589544). The role of iNOS in malaria is well documented. IFN-γ and TNF induced the production of NO, which inhibits parasite growth but can cause tissue damage and organ dysfunction, especially in severe malaria (Mordmüller et al., 2002). Recent studies also highlight the complexity of glycemia regulation during Plasmodium infection describing its role in modulating parasite virulence and transmission (PMID:35841892). These studies demonstrate the critical function of TNF and iNOS in immune responses against Plasmodium, aligning with our findings of this axis and metabolic rewiring that are essential for monocyte activation and outcome of Pc infection.”

      The authors claim that "Reprogramming of host energy metabolism mediated by the TNF-iNOS-HIF1a axis plays a key role in host resistance to Plasmodium infection," and contributes significantly to their effector functions (particularly parasite clearing), and the systemic drop in glycemia observed during Pc infection. Although the authors show that TNFa does result in altered metabolism and increased GLUT1 levels in a subpopulation of monocytes, the evidence that TNFa-induced glycolysis plays a key role in host resistance is correlative at best.

      This is an important question. We did show that TNFR KO mice have higher parasitemia. However, TNF is a pleiotropic cytokine with multiple roles in innate and acquired immunity. The experiment we performed that helps to address this issue is the in vivo treatment with 2DG. We found that treatment with this inhibitor of glycolysis results in an increase in parasitemia. These results are now included in Figure 6.

      When considering that the majority of monocytic populations are reduced in frequency and only a small subset (i.e., Monocyte-derived DCs) increase in frequency (Fig 3K) during Pc infection, this makes it very difficult to demonstrate that a cell population whose overall frequency reduces contributes significantly to the drop in glycemia during Pc infection. The authors should therefore include experiments that demonstrate that the inhibition of glycolysis induced by TNFa in monocytes is protective and/or contributes to a decrease in extracellular glucose. The authors could assess the impact of the loss of function of GLUT1 on activated monocytes and monocyte-derived DCs on glycemia upon TNFa stimulation.

      We agree. We focused on monocytes and the derived inflammatory monocytes and MO-DCs. In fact, the frequency of monocytes, considering the inflammatory monocytes and MO-DCs, is increased in both spleen and liver. One interesting result is that the HIF1a Lysm KO mice have impaired metabolism, attenuated hypoglycemia and increased parasitemia (Figure 5). Nevertheless, we agree that our current data do not prove that the drop in glycemia is due to the consumption of glucose by the activated monocytes, or that these are the only cells with increased glucose consumption. This is now added to the discussion (page 13, line 395):

      "Although the frequency of MO-DCs increases during infection, other cell populations may also contribute to glucose consumption. Further experiments, including the assessment of GLUT1 function in these populations, are needed to clarify their contribution to glucose consumption during infection."

      Furthermore, in the current state of the manuscript, it is unclear how activated monocyte populations uptake glucose. The authors claim that glucose uptake by activated monocytes is GLUT1-dependent, however, glucose transport via GLUT1 is insulin-dependent. Since Plasmodium infection is associated with insulin resistance, and almost unquantifiable levels of insulin (PMID: 35841892), and TNFa itself induces insulin resistance (PMCID: PMC43887), it is unclear how the activated monocyte population uptakes glucose. If the authors consider TNFa to be sufficient for GLUT1 induction, in vitro experiments (TNFa+monocytes) could bolster this claim (and support that GLUT1 is induced in an insulin-independent mechanism).

      There is significant evidence indicating that, in contrast to GLUT4, induction of GLUT1 in mice is independent of insulin (PMID: 9801136). In our case, it seems to be induced by the cytokines TNF and IFNγ (this study and Ramalho et al., 2024). We have now performed experiments exposing monocytes to TNF and evaluating GLUT1 expression. The results indicate that monocytes from WT mice exposed to TNF (100 ng/mL) for 18 hours exhibited a significant increase in GLUT1 expression. This increase was comparable to the increased-GLUT1 phenotype observed in infected animals. The results of this experiment were included in the manuscript.

      Text was added to the discussion to clarify the issue of insulin dependence of GLUT1 expression (page 13, line 388):

      “GLUT1 expression is recognized as independent of insulin, in contrast to GLUT4 (PMID: 9801136). In our model, this regulation appears to be driven by pro-inflammatory cytokines, particularly TNF. Supporting this, our results show that in vitro stimulation with TNF, significantly increases GLUT1 expression in monocytes, accordingly to the ex vivo phenotype observed in infected animals.”

      Alternative hypothesis which might explain their phenotypes

      Figure 2 A-H: The metabolic effects of the genetic manipulations including INOS KO, TNFR KO, and HIF-1α∆Lyz2 could be explained by lesser disease morbidity owed to a reduction of inflammatory response during infection. Under this condition, the development of anorexia will not be as profound in the knock-outs compared with wild-type littermate controls, since anorexia of infection is tightly linked to the magnitude of inflammatory response. Accordingly, infected knock-out animals can keep eating, which presumably impacts glycemia, maintenance of core body temperature, and overall energetics of infected mice. The authors should exclude this possibility.

      We considered this possibility, and the discussion now elaborates on this alternative hypothesis. We believe that these two mechanisms are not mutually exclusive (page 16, line 474):

      “Although restored physical activity, food consumption and energy expenditure in knockout mice may contribute to the observed systemic metabolic parameters by altering energy balance, these effects are not mutually exclusive with the TNF-driven, cell-intrinsic metabolic mechanisms described here.”

      Minor comments

      The authors showed increased parasitemia upon TNFR and HIF1a depletion in the LyZ2 compartment. The same was observed upon organismal INOS depletion. This raises the question of whether the TNF-HIF-INOS signaling axis is adaptive or maladaptive during Pcc infection. The authors should show host survival in mice lacking TNFR and HIF1a in the LyZ2 compartment, and in mice lacking INOS (presumably, they have these data).

      Despite the fact that the various knockout mice have increased parasitemia and signs of disease, they all survive the infection. This is now included in the figure legends.

      Are the higher tissue glucose levels specific to the liver and the spleen or this is a more general event? Have the authors looked at other organs?

      We have now added the results of glucose uptake in the muscle and adipose tissues to Figure 2. The fact that glucose uptake is not increased in muscle and adipose tissue further suggests that the increased glucose uptake in this model is insulin independent.

      Figure 1F: All core body temperatures are within the physiological range, i.e., >36 degrees C. This makes it unclear why the authors regarded this as hypothermia. The authors should present experiments demonstrating the development of hypothermia in Figure 1F, as they claim this.

      Temperature changes in mice kept in animal houses have been an issue of discussion in the field. It is clear, however, that early in the morning (end of the active period) mice enter torpor, with lower temperature and physical activity.

      In Figure 4, since the authors already suggested that extra-hepatic cells, and not the liver parenchyma, contribute to glucose uptake, the authors should clarify why they analyzed the whole liver in Figure 4, and not extra-hepatic cells. Furthermore, the authors should quantify the hepatic monocytic population in non-infected versus infected wild-type animals.

      The reason we used whole liver is that the number of non-parenchymal cells obtained from liver is too limited for Western blot analysis. We thought it was important to show that expression of GLUT1 was decreased in the liver of TNFR KO mice. Nevertheless, the level of TNFR expression in different cell types in the liver was shown by flow cytometry. In addition, we performed the WB with cells extracted from the spleen, where lymphoid and myeloid cells are more abundant.

      Line 87: Phagocytizing parasitized what?

      This has been corrected in the manuscript.

      Line 111 Define RNI before being used.

      Is there a gender disparity in the TNFR KO phenotype? If yes, the authors should comment about this in their discussion.

      This has been defined and addressed in the manuscript.

      Line 192: Did the authors mean 3B??

      In 3M, please plot monocytes from uninfected animals.

      The plot of uninfected animals is now included in Figure 3M.

      Line 390 Remove the extra dash in HIF1a.

      Extra dash has been removed.

      Line 397 Define RA

      RA is now defined.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study represents an important advance in our understanding of how certain inhibitors affect the behavior of voltage gated potassium channels. Robust molecular dynamics simulation and analysis methods lead to a new proposed inhibition mechanism with strength of support being mostly convincing, and incomplete in some aspects. This study has considerable significance for the fields of ion channel physiology and pharmacology and could aid in development of selective inhibitors for protein targets.

      We are encouraged by this favorable assessment and thank editors and reviewers for their constructive feedback and recommendations. We trust that the revisions made to the manuscript will clarify the aspects that had been perceived to be incomplete.

      Reviewer #1 (Public review):

      Summary: 

      This study seeks to identify a molecular mechanism whereby the small molecule RY785 selectively inhibits Kv2.1 channels. Specifically, it sought to explain some of the functional differences that RY785 exhibits in experimental electrophysiology experiments as compared to other Kv inhibitors, namely the charged and non-specific inhibitor tetraethylammonium (TEA). This study used a recently published cryo-EM Kv2.1 channel structure in the open activated state and performed a series of multi-microsecond-long all-atom molecular dynamics simulations to study Kv2.1 channel conduction under the applied membrane voltage with and without RY785 or TEA present. While TEA directly blocks K+ permeation by occluding ion permeation pathway, RY785 binds to multiple nonpolar residues near the hydrophobic gate of the channel driving it to a semi-closed non-conductive state. This mechanism was confirmed using an additional set of simulations and used to explain experimental electrophysiology data.

      Strengths:

      The total length of simulation time is impressive, totaling many tens of microseconds. The study develops forcefield parameters for the RY785 molecule based on extensive QM-based parameterization. The computed permeation rate of K+ ions through the channel observed under applied voltage conditions is in reasonable agreement with experimental estimates of the single-channel conductance. The study performed extensive simulations with the apo channel as well as both TEA and RY785. The simulations with TEA reasonably demonstrate that TEA directly blocks K+ permeation by binding in the center of the Kv2.1 channel cavity, preventing K+ ions from reaching the SCav site. The conclusion is that RY785 likely stabilizes a partially closed conformation of the Kv2.1 channel and thereby inhibits the K+ current. This conclusion is plausible given that RY785 makes stable contact with multiple hydrophobic residues in the S6 helix. This further provides a possible mechanism for the experimental observations that RY785 speeds up the deactivation kinetics of Kv2 channels from a previous experimental electrophysiology study.

      Weaknesses:

      The study, however, did not produce this semi-closed channel conformation and acknowledges that more direct simulation evidence would require extensive enhanced-sampling simulations. The study has not estimated the effect of RY785 binding on the protein-based hydrophobic pore constriction, which may further substantiate their proposed mechanism. And while the study quantified K+ permeation, it does not make any estimates of the ligand binding affinities or rates, which could have been potentially compared to the experiment and used to validate the models. 

      As stated in the original manuscript, we concur that the mechanism we propose remains hypothetical until further studies of the complete conformational cycle of the channel are conducted. The recently determined structure of a Kv2.1 channel in the closed state (Mandala and MacKinnon, PNAS 2025) presents an excellent opportunity to do so. Indeed, a cursory analysis of that structure shows that a Pro-Ile-Pro motif in helix S6 marks the position of the intracellular gate, where the pore domain constricts maximally (aside from the selectivity filter). As illustrated in Fig. 5, this motif is precisely where the benzimidazole and thiazole moieties of RY785 bind in our simulations. The mechanism we outline in Fig. 7 thus seems very plausible, in our view; that is, RY785 occludes the K<sup>+</sup> permeation pathway before the pore domain reaches the closed conformation, explaining the observed electrophysiological effects (see Discussion). The Discussion has been revised to note the recent discovery of the aforementioned structure, its implications for the mechanism we propose, and the opportunities for further research that are now open.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Zhang et al. investigate the conductivity and inhibition mechanisms of the Kv2.1 channel, focusing on the distinct effects of TEA and RY785 on Kv2 potassium channels. The study employs microsecond-scale molecular dynamics simulations to characterize K+ ion permeation and compound binding inhibition in the central pore. 

      Strengths:

      The findings reveal a unique inhibition mechanism for RY785, which binds to the channel walls in the open structure while allowing reduced K+ flow. The study also proposes a long-range allosteric coupling between RY785 binding in the central pore and its effects on voltage-sensing domain dynamics. Overall, this well-organized paper presents a high-quality study with robust simulation and analysis methods, offering novel insights into voltage-gated ion channel inhibition that could prove valuable for future drug design efforts.

      Weaknesses:

      (1) The study neglects to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, there is potential for allosteric binding sites in the voltage-sensing domain (VSD), as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019).

      As noted in the manuscript, we designed our simulations to explore the possibility that RY785 binds within the pore domain, because TEA and RY785 are competitive and TEA is known to bind within the pore. That RY785 did in fact spontaneously and reproducibly bind within the pore was however not a predetermined outcome; if the site of interaction for the inhibitor was elsewhere in the channel, the simulation would not have shown a stable associated state, which would have prompted us to examine other possible sites, including the voltage sensors. It was also not predetermined or foreseeable a priori that the mode of interaction we observed in simulation provides a straightforward rationale for the electrophysiological effects of RY785. Based on our results, therefore, we believe that RY785 binds within the pore of Kv2. As stated by the reviewer, other allosteric modulators are known to bind instead to the sensors; to our knowledge, however, there is no precedent of a small-molecule inhibitor that simultaneously acts on the sensors and the pore domain. We therefore believe that future studies should focus on corroborating or refuting the mechanism we propose, through additional experimental and computational work; if, contrary to our claim, RY785 is found not to bind to the pore domain, it would be logical to explore other possible sites of interaction, as the reviewer suggests. The Discussion has been modified to address this point.

      (2) The study describes RY785 as a selective inhibitor of Kv2 channels and characterizes its binding residues through MD simulations. However, it is not clear whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      To clarify this question, we have included a multiple sequence alignment as Supplementary Figure 1; the revised manuscript refers to this figure in the Discussion section. The alignment reveals that the cluster of residues forming contacts with RY785 (Val409, Pro406, Ile405, Ile401, and Val398) is indeed specific to Kv2.1. Among Kv channels, Kv3.1 and Kv4.1 exhibit the greatest similarity to Kv2.1 at these positions, but they differ in a crucial substitution: Ile405 in Kv2.1 is replaced by Val. This replacement shortens the sidechain, undoubtedly reducing the magnitude of the hydrophobic interaction between inhibitor and channel (Val is approximately 6 kcal/mol, i.e. 1,000 times, more hydrophilic than Ile). Kv5.1 differs from Kv2.1 at two positions: Pro406 is replaced by His, and Val409 by Ile. The introduction of His abolishes the hydrophobic interaction at that position, and the need for hydration likely perturbs all adjacent contacts with RY785. Lastly, Kv6-Kv10 and Cav channels feature entirely different residues at these positions. Consistent with these findings, a recent study by the Sack lab (https://elifesciences.org/articles/99410) has demonstrated that Kv5, Kv6, Kv8, and Kv9 pore subunits confer resistance to RY785, while a high-throughput electrophysiological study carried out by Merck (Herrington et al., 2011) reported that RY785 shows no significant activity against Cav channels. The sequence alignment offers a simple interpretation for these experimental observations, namely that RY785 is recognized by Kv2 channels through the abovementioned hydrophobic cluster within the pore domain.

      (3) The study does not clarify the details, rationale, and ramifications of a biasing potential to dihedral angles.

      We refer the reviewer to published work, for example Stix et al, 2023 and Tan et al, 2022. We provide additional comments below.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing, yet it was not revealed whether polar groups of RY785 always interact with K+ ions.

      We detected no persistent specific interactions between RY785 and the permeant K+ ions.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The manuscript describes atomistic molecular dynamics (MD) simulations of a voltage-gated potassium channel Kv2.1 using its cryo-EM structure in the open activated state and its inhibition by a classical non-specific cationic blocker tetraethylammonium (TEA) as well as a novel selective inhibitor RY785. Using multi-microsecond-long all-atom MD runs under the applied membrane voltage of 100 mV the authors were able to confirm that the channel structure represents an open conducting state with the computed single-channel conductance lower than experimental values, but still in the same order of magnitude range. They also determined that both TEA and RY785 bind in the channel pore between the cytoplasmic hydrophobic gate and narrow selectivity filter (SF) region near the extracellular side. However, while TEA directly blocks a knock-on K+ conduction by physically obstructing ion access to the SF, the mechanism of action of RY785 is different. It does not directly prevent K+ access to the SF but rather binds to multiple residues in the hydrophobic gate region, which effectively narrows a pore and drives the channel toward a semi-closed nonconductive conformation, which might be distinct from one with the deactivated voltage sensors and closed pore observed at hyperpolarized membrane potentials. However, additional studies beyond the scope of this work might be needed to fully establish this mechanism as suggested by the authors.
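
      The comparison between computed and experimental single-channel conductance mentioned above rests on a simple conversion from permeation events to current and then to conductance. The following is a minimal sketch of that arithmetic, assuming monovalent K+ ions and a constant applied voltage; the event count and simulation length are hypothetical placeholders, not values from the manuscript.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge carried by one K+ ion, in coulombs


def conductance_pS(n_permeation_events, sim_time_s, voltage_V):
    """Single-channel conductance in picosiemens from counted ion crossings.

    current = (number of ions * elementary charge) / elapsed time
    conductance = current / applied voltage
    """
    current_A = n_permeation_events * ELEMENTARY_CHARGE_C / sim_time_s
    return current_A / voltage_V * 1e12  # siemens -> picosiemens


# Hypothetical example: 10 net K+ crossings in 1 microsecond at 100 mV.
g = conductance_pS(n_permeation_events=10, sim_time_s=1e-6, voltage_V=0.1)
print(f"Estimated conductance: {g:.1f} pS")
```

      Because the estimate scales linearly with the number of counted crossings, short trajectories with few events carry a large statistical uncertainty, which is one reason agreement within an order of magnitude is usually considered reasonable.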

      The manuscript is written very well and represents a significant advance in the field of ion channel research. I do not have any major issues, which need to be addressed. However, I have several suggestions.

      For the apo-channel K+ conduction MD simulation under the applied voltage, the authors seem to observe mostly a direct or Coulomb knock-on mechanism across the SF with almost no water copermeation. This is in line with computational electrophysiology studies with dual membrane setup by B. de Groot and others but in disagreement with multiple previous studies by B. Roux and others also using applied electric field and CHARMM force fields as in the present study. I wonder why the outcomes are so different. Is it related to the Kv2.1 channel itself, a relatively small applied electric field used (corresponding to a membrane potential of 100 mV vs. 500-750 mV used in many previous simulations), ion force field (e.g., LJ parameters), or some other factors? Could weak dihedral restraints on the protein backbone and side chains contribute to this mechanism? I also wonder if the authors might have considered different initial SF ion configurations. Related to that, I wonder if the authors observed any SF distortions in their simulations including frequently observed backbone carbonyl flipping and/or dilation/contraction.

      We are aware of these discrepancies between published simulation studies, but cannot offer a satisfactory explanation, beyond speculation. The reviewer is correct that the mechanism of ion permeation we observe is comparable to that reported by de Groot, as we noted in Tan et al, 2022 and Stix et al, 2023. Neither in this nor in those previous studies did we observe any persistent distortions of the selectivity filter – but that outcome was expected by construction. The weak biasing potentials acting on the mainchain dihedral angles allow for local fluctuations but not a persistent deformation, relative to the conductive form determined experimentally.

      For MD simulations with the ligand present, I wonder if the authors can comment on the effect of the ligand especially RY785 on the pore size or more importantly size of the hydrophobic gate. The presence of the ligand itself would definitely result in a narrower pore, but I also wonder if this would also lead to a rearrangement of pore sidechain and/or backbone residues, which would lead to a narrower pore from a protein itself thus confirming the proposed mechanism of driving the channel towards a semi-closed state. It is easy to compute but I wonder if the presence of weak dihedral restraints may preclude this analysis.

      Yes, while the simulation design used in this study allows for local fluctuations in the mainchain structure and nearly unrestricted sidechain dynamics, changes in either the secondary or tertiary structure of the channel are strongly disfavored. This approach is thus sufficient to examine ligand binding or ion flow in the microsecond timescale but not channel gating. In the revised version of the Discussion, we outline a roadmap for future computational studies of that gating process, on the basis of the open-channel structure we used and the recently determined structure of the closed state.

      The authors state that RY785 does not block K+ ion, but it does significantly slow the rate of K+ ion access to the pore Scav site. Is this not a part of the mechanism for inhibition of the channel? The authors seem to focus on the primary mechanism of inhibition as the RY785 promoting channel closing, but would it not also reduce K+ current in the open state by slowing the rate of K+ entry into the cavity and selectivity filter? The authors should address this point in the text. I am also somewhat confused that in the MD simulations performed by the authors, there is still some K+ conduction with RY785 in the pore, which is not in 100% agreement with electrophysiology experiments. Does it mean that the channel in the simulations has not yet reached that semiclosed state or a reduced K+ conduction is not observed experimentally?

      The salient experimental observation is that RY785 abrogates K+ currents through Kv2 channels (Herrington et al, 2011; Marquis et al, 2022). In our view, that observation can be explained in one of two ways: either RY785 completely blocks the flow of K+ ions across the channel while the pore domain remains in the conductive, open state – like TEA does – or RY785 induces or facilitates the closing of the channel, thereby abrogating K+ flow. The fact that we observe K+ flow while RY785 is bound to the channel is therefore not in disagreement with the electrophysiological measurements, but it does rule out the first of those two possible interpretations of the existing experiments. As it happens, the second possible explanation, i.e. that RY785 facilitates the closing of the pore domain, also provides a rationale for another puzzling experimental observation, namely that RY785 shifts the voltage dependence of the currents produced by the voltage sensors as they reconfigure to open or close the intracellular gate.

      Also, I wonder if the authors considered that since there are 4 potential equivalent sites in the pore (although, overlapping) more than one RY785 might be needed to prevent K+ conduction, even though the experimental Hill coefficient of ~1 does not indicate cooperativity.

      Admittedly, our simulation design was based on the premise that only one RY785 molecule might be recognized within the pore. Based on the outcome of the simulations, we are confident that this assumption was valid, as the binding pose that we identified rules out multiple occupancy – which would be indeed consistent with a Hill coefficient of ~1.
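
      As a minimal sketch of why a Hill coefficient of ~1 is what single-site binding would produce, the standard Hill equation can be evaluated directly. The IC50 value below is a hypothetical placeholder in arbitrary units, not a quantity measured in this study.

```python
def fraction_inhibited(concentration, ic50, hill_coefficient=1.0):
    """Standard Hill equation for fractional inhibition."""
    c_n = concentration ** hill_coefficient
    return c_n / (ic50 ** hill_coefficient + c_n)


def concentration_for_fraction(fraction, ic50, hill_coefficient=1.0):
    """Invert the Hill equation: concentration giving a target inhibition."""
    return ic50 * (fraction / (1.0 - fraction)) ** (1.0 / hill_coefficient)


IC50 = 1.0  # hypothetical value, arbitrary concentration units

# Fold-change in concentration needed to go from 10% to 90% inhibition.
span_n1 = (concentration_for_fraction(0.9, IC50, 1.0)
           / concentration_for_fraction(0.1, IC50, 1.0))
span_n2 = (concentration_for_fraction(0.9, IC50, 2.0)
           / concentration_for_fraction(0.1, IC50, 2.0))
```

      With a coefficient of 1, moving from 10% to 90% inhibition takes an 81-fold change in concentration; a cooperative process (coefficient of 2 in this sketch) narrows that range to 9-fold.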

      I also wonder if the authors considered estimating ligand binding affinities and/or "on" rates from their simulations to have a more direct comparison with experiments and test the accuracy of their models. There are multiple enhanced sampling techniques allowing to do that, although it can be a study on its own.

      We thank the reviewer for this suggestion, which we will consider for future studies.

      The authors also discussed that they could not study Kv2.1 deactivation in a reasonable simulation time. Indeed it is very challenging but they should cite previous studies e.g. 2012 Jensen et al paper (PMID: 22499946) on this subject. There are structures of Kv channels with the deactivated voltage-sensing domains (VSDs) available, e.g. of EAG1 channel (PDB 8EP1), although they do not have a domain-swapped architecture. There are structural modeling approaches including AlphaFold, which can be potentially used to get a Kv2.1 structure with deactivated VSDs, and targeted MD, string method etc. can be used to study transition between different states with and without bound ligands.

      As noted, a structure of a Kv2 channel with a closed pore has now been determined experimentally. In the revised Discussion, we comment on what this structure tells us about the mechanism of inhibition we propose, and how it could be leveraged in future studies.

      The authors should be commended for doing a thorough QM-based force field parameterization of RY785. However, a validation of the developed force field parameters is lacking. In terms of QM validation, a gas-phase dipole moment can be compared in terms of direction and magnitude (it's normal to be overestimated to implicitly reflect solvent-induced polarization). If there are any experimental data available for this compound, they can be tested as well.

      We agree with the reviewer that forcefield validation is important, but to our knowledge no experimental data, such as hydration free energies, exist for RY785 to compare with. We did however compare the gas-phase dipole moment computed with QM and with the MM forcefield we developed based on atomic charges optimized to reproduce QM interactions with water. The MM model yields a gas-phase dipole moment of 3.94 D, which is roughly 20% greater than the QM dipole moment of 3.23 D. That deviation is within the typical range for electroneutral molecules (Vanommeslaeghe et al, 2010), and as the reviewer notes, reflects the solvent-induced polarization implicit in the derivation of atomic charges. As shown in Author response image 1, the orientation of the dipole moment calculated with MM (right, blue arrow) is also in good agreement with that predicted with QM (left).
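
      The magnitude comparison above is simple arithmetic, and the orientation comparison amounts to the angle between two vectors. A minimal sketch follows, using the two magnitudes quoted in the response; the helper names are illustrative, and the response does not report vector components, so any vectors passed to the angle helper would be the reader's own.

```python
import math


def relative_deviation(mm_dipole_debye, qm_dipole_debye):
    """Fractional overestimate of the MM dipole relative to the QM dipole."""
    return (mm_dipole_debye - qm_dipole_debye) / qm_dipole_debye


def angle_between_deg(u, v):
    """Angle in degrees between two 3D vectors, e.g. MM and QM dipoles."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


MM_DIPOLE = 3.94  # Debye, value quoted in the response
QM_DIPOLE = 3.23  # Debye, value quoted in the response

deviation = relative_deviation(MM_DIPOLE, QM_DIPOLE)
```

      With the quoted values, the MM-to-QM ratio comes out near 1.2.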

      Author response image 1.

      (1) p. 3 "the last two helices in each subunit" -> "the last two transmembrane helices in each subunit".

      Thanks. Corrected.

      (2) p. 5 "and therefore do not cause large density variations e.g. 100-fold or greater.". I would be more specific here and indicate what are the actual variations in density or free energy encountered and how they are compared e.g. with thermal fluctuations (~kT).

      Thanks. The exact variations in K+ density had been included in the original manuscript, in Fig. 2C, but we failed to refer to this figure at this point in the description of the results. The ion density is plotted in a log scale to facilitate conversion to free-energy units. Corrected.
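
      The conversion from a density ratio to free-energy units is the usual Boltzmann inversion, ΔG = -kBT ln(ρ/ρ_ref). A minimal sketch, assuming a temperature of 298 K for the kcal/mol conversion (the temperature is not stated in this response):

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def free_energy_kT(density, reference_density):
    """Free energy relative to the reference, in units of kBT."""
    return -math.log(density / reference_density)


def free_energy_kcal(density, reference_density, temperature_k=298.0):
    """Same quantity in kcal/mol; the 298 K default is an assumption."""
    kT = KB_KCAL_PER_MOL_K * temperature_k
    return kT * free_energy_kT(density, reference_density)


# A 100-fold drop in density relative to the reference.
barrier_kT = free_energy_kT(1.0, 100.0)
barrier_kcal = free_energy_kcal(1.0, 100.0)

# On a log10 axis, each decade of density corresponds to ln(10) kBT.
kT_per_decade = math.log(10.0)
```

      In these units a 100-fold density variation corresponds to about 4.6 kBT, and each decade on a log10 axis to about 2.3 kBT.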

      (3) p. 6 Figure 1 caption "and along the perpendicular to the membrane" -> "perpendicular to the membrane normal"?. "The channel is an assembly of four distinct subunits (in colors);" -> "The channel is an assembly of four identical subunits (distinct by colors);". I would use the same protein coloring method in panels B and C as was used in panel A.

      Thanks. Corrected as needed.

      (4) p. 6 Figure 2 In panel B I would appreciate a representative complete ion permeation event trace. In panel C caption I would indicate corresponding sites "S0-S4, Scav" for each residue mentioned. I also would not use gray color for site names in the figure.

      We appreciate the suggestion, but believe the figure is clear as is. Panel B is meant to focus on the mechanism of knock-on. Panel A includes numerous complete permeation events.

      (5) p. 7 Figure 3 caption. Please indicate which atoms of residues T373 and P406 were used to define SF and gate positions. Chemical structures of both TEA and RY785 would be useful. In panels C and F channel interacting residues (if any) would be helpful to show.

      The revised caption clarifies that the positions of T373 and P406 are represented by their carbon-alpha atoms. A close-up view of the structures of TEA and RY785 is included in the Supplementary Information section.

      (6) p. 8. Figure 4 caption. Please indicate if N atoms were used for density maps in panels B and C, and which value of the density was used to show meshes. In panel A please indicate what are the units of the density shown by color maps.

      The caption has been revised to clarify these questions.

      (7) p. 9 "inside the protein" -> "inside the channel pore".

      Thanks. Corrected.

      (8) p. 10 "which lines the cavity" -> "which lines the water-filled cavity"

      We appreciate the suggestion but believe the wording is clear as is.

      (9) p.10 Fig. 5. It would be helpful to distinguish residues from different chains e.g. by different colors rather than using different colors for different residues. The S atom in RY785 is hard to recognize due to the yellow color used for C atoms. Figure 5B is very confusing. It is not clear what this plot represents. For instance, what does it mean that Pro405 has ~10 contacts in 20% of simulation snapshots? Does it mean 10 C..C/S interactions within 4.5 A? I am not sure what the value of this is. I think a bar or radar chart plot showing % of contacts with one, two, or more residues of each type would be more helpful. 

      Thanks. The revised caption ought to clarify how to interpret the plot.

      (10) p. 12 "Due to its 2-fold molecular symmetry". TEA has a tetrahedral point group or Td symmetry. It has several two-fold rotational axes though. 

      Thanks. Corrected.

      (11) p. 12 "it prevents K+ ions in the cytoplasmic space from destabilizing the K+ ions that reside in the selectivity filter" I am not sure if this statement is entirely accurate as there might be destabilization of a multi-ion SF configuration, not ions per se.

      We believe this statement is clear as is.

      (12) p. 13 Fig. 7 caption "includes non-conductive or transiently inactivated states" - I am not sure what "transiently inactivated state" is as inactivation is a specific term used in ion channel research and it does not seem to be explicitly considered in this study.

      A reference has been included in the caption for readers interested in the process of inactivation.

      (13) p. 14 "the net charge of these constructs is thus zero". This would depend on the number of basic and acidic residues in the protein. 

      Yes, it does – and as a result the construct we model has a net zero charge.

      (14) p. 14 I wonder if the protein was constrained or heavily restrained during MARTINI membrane building and equilibration procedure. Otherwise, C-alpha mapping would be problematic and clashes with lipid membrane atoms might take place as well.

      It was indeed. When a protein is simulated using the MARTINI coarse-grained forcefield, its fold must be preserved through a network of strong ‘virtual’ bonds between adjacent carbon-alpha atoms. This is standard practice so we do not believe it requires further explanation.

      (15) p. 15 PME - please spell out and provide reference.

      Corrected.

      (16) p. 15 "with a smooth switching function" - is it a special or standard switching function? Also, was it used for energy or forces? 

      The switching function brings both forces and energies to a value of zero at the cut-off value, smoothly. We refer the reviewer to the NAMD manual for further details.
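
      For readers who want a concrete picture, the sketch below implements a CHARMM-style energy switching function of the kind commonly applied to van der Waals interactions. Whether this is the exact form used in the authors' setup is an assumption, and the switch-on distance, cutoff, and Lennard-Jones parameters are hypothetical values chosen only for illustration.

```python
def switching_factor(r, r_switch, r_cut):
    """Scale factor: 1 below r_switch, 0 beyond r_cut, smooth in between."""
    if r <= r_switch:
        return 1.0
    if r >= r_cut:
        return 0.0
    rc2, rs2, r2 = r_cut ** 2, r_switch ** 2, r ** 2
    return (rc2 - r2) ** 2 * (rc2 + 2.0 * r2 - 3.0 * rs2) / (rc2 - rs2) ** 3


def lennard_jones(r, epsilon, r_min):
    """Lennard-Jones energy with well depth epsilon at distance r_min."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def switched_lennard_jones(r, epsilon, r_min, r_switch, r_cut):
    """Lennard-Jones energy multiplied by the switching factor."""
    return lennard_jones(r, epsilon, r_min) * switching_factor(r, r_switch, r_cut)


# Hypothetical parameters: distances in angstrom, energy in kcal/mol.
R_SWITCH, R_CUT = 10.0, 12.0
EPSILON, R_MIN = 0.1, 3.8

midpoint_factor = switching_factor(11.0, R_SWITCH, R_CUT)
energy_at_cutoff = switched_lennard_jones(R_CUT, EPSILON, R_MIN, R_SWITCH, R_CUT)
```

      Because the factor and its first derivative both vanish at the cutoff, the switched energy and the resulting force go to zero there without a discontinuity.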

      (17) p. 15 '𝑘 = 1 𝑘B𝑇.' Please confirm that there is a factor of "1" there, which can be actually skipped if this is the case. 

      The value of k = 1 kBT is correct.

      (18) p. 15. Please cite PMID: 22001851 for the transmembrane electric field application technique.

      Corrected.

      (19) p. 15 "and CHARMM36m" -> "and CHARMM36m force field". 

      Corrected.

      (20) p. 16 "the four proteins subunits" -> "the four protein subunits". 

      Corrected.

      (21) p. 16. Please provide the reference for CGenFF. It's reference 49. 

      Corrected.

      Supporting Information (SI): CGenFF is misspelled in multiple figure captions in the SI. All potential energy scans indicate "angle", but some are bond angles while others are dihedral angles. Using subscripts for atom numbers is confusing and does not match the numbering scheme used in Fig. S1. So, please use the same style of numbering throughout, e.g. C46-C42-N43 (without subscripts). Please label the X and Y axes in Figures S2-S19 and S21. In Figure S22 please perform a linear regression analysis and/or compute Pearson correlation coefficients and indicate trend lines. Table S1. It would be good to compute RMS or mean unsigned errors to get an idea about accuracy. Also, please indicate if reference QM values were scaled by 1.16 for energies or offset for distances.

      The Supplementary Information has been corrected. We thank the reviewer for their detailed feedback. 
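
      The accuracy metrics the reviewer asks for (RMS error, mean unsigned error, Pearson correlation) can be computed in a few lines. The sketch below uses made-up reference and model values purely as placeholders; they are not data from the Supplementary Information.

```python
import math


def mean_unsigned_error(reference, model):
    """Average absolute deviation between model and reference values."""
    return sum(abs(m - r) for r, m in zip(reference, model)) / len(reference)


def rms_error(reference, model):
    """Root-mean-square deviation between model and reference values."""
    squared = sum((m - r) ** 2 for r, m in zip(reference, model))
    return math.sqrt(squared / len(reference))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only (e.g. QM vs. MM interaction energies, kcal/mol).
qm_values = [1.0, 2.0, 3.0, 4.0]
mm_values = [1.1, 1.9, 3.2, 3.8]

mue = mean_unsigned_error(qm_values, mm_values)
rmse = rms_error(qm_values, mm_values)
r = pearson_r(qm_values, mm_values)
```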

      Reviewer #3 (Recommendations for the authors):

      (1) The study needs to consider the possibility of multiple binding sites for RY785, particularly given its impact on voltage sensors and gating currents. Specifically, the potential for allosteric binding sites in the voltage-sensing domain (VSD) should be assessed, as some allosteric modulators with thiazole moieties are known to bind VSD domains in multiple voltage-gated sodium channels (Ahuja et al., 2015; Li et al., 2022; McCormack et al., 2013; Mulcahy et al., 2019). Molecular docking and/or MD simulations could quickly test this hypothesis. If this hypothesis is not true, a comprehensive search can exclude such a possibility, which can also confirm the long-range allosteric coupling between RY785 binding in the central pore and voltage-sensing domain dynamics. 

      Please see our response above.

      (2) The authors describe RY785 as a selective inhibitor of Kv2 channels and characterize its binding residues through MD simulations. To support this claim, Figure 5 needs to include a multiple sequence alignment with other Kv channels. This would help demonstrate whether the identified RY785-binding residues are indeed unique to Kv2 channels.

      Please see our response above.

      (3) The study applies a biasing potential to 𝜙, 𝜓, and 𝜒1 dihedral angles. Please clarify:

      (a) Is this potential solely to prevent selectivity filter collapse/degradation, as mentioned in a previous D. E. Shaw Research publication (Jensen et al., 2012)?

      Yes, that is correct.

      (b) If it applies to all amino acids, can this potential prevent other changes, such as in the voltage-sensing domain?

      Yes, that is correct.

      (c) What specific "large-scale structural changes" does this potential preclude? 

      For example, it would preclude the spontaneous degradation of the secondary or tertiary structure of the protein. We have revised the Methods section to make these points clearer. 

      (d) Given that such biasing potentials on backbone dihedral angles can decrease conformational flexibility, and considering that Kv channel permeability/conductivity could be highly sensitive to filter flexibility, what insights can you provide about the impact of the force constant k on channel conductivity?

      In previous studies based on an identical methodology (Stix et al, 2023; Tan et al, 2022), we have observed good agreement between calculated and experimental conductance values – at least as good as can be hoped for, when all approximations are considered. Based on the data presented in those studies, we have no reason to believe our methodology inhibits the permeability of the channel, which is logical as the local structural fluctuations required for K+ flow across the selectivity filter are not impaired, by definition. To the contrary, the fact that these weak biasing potentials make the conductive form of the filter the most favorable state in simulation enables a clear-cut analysis of conductance under plausible simulation conditions, both in terms of applied voltage and K+ concentration. We refer the reviewer to the above-mentioned studies for further details and a discussion of this subject.

      (4) The observation that the Kv2.1 central pore remains partially permeable to K+ ions when RY785 is bound is intriguing. Given the compact nature of the central cavity when RY785 is bound, it would be valuable to investigate whether polar groups of RY785 (e.g., nitrogens from the amide, benzimidazole, and thiazole moieties) always interact with K+ ions. Characterizing these interactions could inform the design of similar compounds with differential modulation effects.

      We examined this possibility and detected no convincing interaction patterns between RY785 and K+ ions – logically, inhibitor and ions are in close proximity while residing concurrently within the pore, but we detected no evidence of specific interactions.

      Minor points:

      It is strongly recommended that the refined force field parameters for RY785 be shared as a separate supplementary file in CHARMM force field format. This addition would be valuable for the scientific community, allowing other researchers to use or compare these parameters in future studies.

      We agree entirely. Upon publication of the VOR for this article the forcefield parameters for RY785 will be made freely available for download at https://github.com/Faraldo-Gomez-Lab-atNIH/Download.

      The study uses a KCl concentration of 300 mM, which exceeds typical intracellular K+ levels. While this may be intentional to enhance K+ permeation probability, a brief justification for this choice should be included in the Methods section.

      Yes, what motivated this choice in this and in our previous studies of K+ channels was the expectation of a greater number of permeation events, for a given simulation length, and therefore greater confidence (i.e. statistical significance) in the observed ion conductance, or in the degree to which it might be inhibited by a blocker. It is worth noting that 300 mM KCl, while atypical in the intracellular environment, is often used in electrophysiological studies. The Methods section has been amended to clarify this point.
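
      To illustrate why more permeation events translate into greater confidence, the sketch below converts an event count into a current and conductance and attaches a counting uncertainty. It assumes permeation events are independent (Poisson statistics); the event counts and simulation length are hypothetical, and the 100 mV value is used only as an illustrative voltage.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per K+ ion


def current_pA(n_events, simulation_time_s):
    """Mean ionic current in picoamperes from a count of permeation events."""
    return n_events * ELEMENTARY_CHARGE_C / simulation_time_s * 1e12


def conductance_pS(n_events, simulation_time_s, voltage_v):
    """Conductance in picosiemens (pA divided by V gives pS)."""
    return current_pA(n_events, simulation_time_s) / voltage_v


def relative_uncertainty(n_events):
    """Relative counting error, assuming Poisson-distributed events."""
    return 1.0 / math.sqrt(n_events)


# Hypothetical example: 20 events in 1 microsecond at 100 mV.
current = current_pA(20, 1e-6)
conductance = conductance_pS(20, 1e-6, 0.1)
error_20 = relative_uncertainty(20)
error_80 = relative_uncertainty(80)
```

      Under this assumption, quadrupling the number of observed events halves the relative uncertainty of the estimated conductance.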

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Persistence is a phenomenon by which genetically susceptible cells are able to survive exposure to high concentrations of antibiotics. This is especially a major problem when treating infections caused by slow growing mycobacteria such as M. tuberculosis and M. abscessus. Studies on the mechanisms adopted by the persisting bacteria to survive and evade antibiotic killing can potentially lead to faster and more effective treatment strategies.

      To address this, in this study, the authors have used a transposon mutagenesis based sequencing approach to identify the genetic determinants of antibiotic persistence in M. abscessus. To enrich for persisters and facilitate genetic screening for this phenotype, they employed a condition previously reported to increase persister frequency: nutrient starvation. The M. abscessus transposon library was grown in nutrient-rich or nutrient-depleted conditions and exposed to TIG/LZD for 6 days, following which Tn-Seq was carried out to identify genes involved in persistence under spontaneous (nutrient-rich) or starvation-induced conditions. About 60% of the persistence hits were required in both conditions. Pathway analysis revealed enrichment for genes involved in detoxification of nitrosative, oxidative, DNA damage and proteostasis stress. The authors then decided to validate the findings by constructing deletions of 5 different targets (pafA, katG, recR, blaR, Mab_1456c) and tested the persistence phenotype of these strains. Rather surprisingly only 2 of the 5 hits (katG and pafA) exhibited a significant persistence defect when compared to wild type upon exposure to TIG/LZD and this was complemented using an integrative construct. The authors then investigated the specificity of delta-katG susceptibility against different antibiotic classes and demonstrated increased killing by rifabutin. The katG phenotype was shown to be mediated through the production of oxidative stress which was reverted when the bacterial cells were cultured under hypoxic conditions. Interestingly, when testing the role of katG in other clinical strains of Mab, the phenotype was observed only in one of the clinical strains demonstrating that there might be alternative anti-oxidative stress defense mechanisms operating in some clinical strains.

      Strengths:

      While the role of ROS in antibiotic mediated killing of mycobacterial cells has been studied to some extent, this paper presents some new findings with regards to genetic analysis of M. abscessus susceptibility, especially against clinically used antibiotics, which makes it useful. Also, the attempts to validate their observations in clinical isolates are appreciated.

      Weaknesses:

      Amongst the 5 shortlisted candidates from the screen, only 2 showed marginal phenotypes which limits the impact of the screening approach.

      We appreciate the reviewer’s comments, but we note that 4 out of 5 genes displayed phenotypes concordant with findings of the Tn-Seq data, with katG and pafA, as well as MAB_1456c (during starvation only) and blaR (in rich media only) having decreased survival as shown in Figure 3A-D. We do agree that some of the phenotypes were more modest in a single-mutant context than in the pooled Tn-Seq screen. In addition, several mutants that had modest changes in survival also showed profound defects in resuming growth after removal of antibiotics, with the pafA mutants particularly impaired (Figure 3 - figure supplement 1).

      While the role of KatG mediated detoxification of ROS and involvement of ROS in antibiotic killing was well demonstrated, the lack of replication of this phenotype in some of the clinical isolates limits the significance of these findings.

      While the role of katG varied among strains, the antibiotic-induced accumulation of ROS was seen in all three strains (Figure 6A). This suggests that in some strains other ROS-detoxification pathways are able to compensate for the loss of katG.

      (Figure 2—figure supplements 1–3)

      Figure 1—figure supplement 1.

      Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved by a more comprehensive treatment of prior work, especially comparison of E. coli work with mycobacterial studies.

      Moreover, the work still has some technical issues to fix regarding description of the methods, supplementary material, and reference formatting.

      See detailed responses below.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

      Comments on revisions:

      The authors have moved this paper along nicely. I have a few general thoughts.

      It would be helpful to have more references to specific figures and panels listed in the text to make reading easier.

      Text modified to add more figure references.

      (1) I would suggest adding a statement about the importance of the work. From my perspective, the work shows the general nature of many statements derived from work with E. coli. This is important. The abstract says this overall, but a final sentence in the abstract would make it clear to all readers.

      We appreciate the suggestion and have added a line to the abstract.

      (2) The paper describes properties that may be peculiar to mycobacteria. If the authors agree, I would suggest some stress on the differences from E. coli. Also, I would place more stress on novel findings. This might be done in a section called Concluding Remarks. The paper by Shee 2022 AAC could be helpful in phrasing general properties.

      We have added mention of this in the discussion (lines 354-356).

      (3) Several aspects still need work to be of publication quality. Examples are the materials table and the presentation of supplementary material. Reference formatting also needs attention.

      We respond to the specific details below.

      Reviewer #3 (Public review):

      Summary:

      The manuscript demonstrates that starvation induces persister formation in M. abscessus.

      They also utilized Tn-Seq for the identification of genes involved in persistence. They identified the role of catalase-peroxidase KatG in preventing death from translation inhibitors Tigecycline and Linezolid. They further demonstrated that a combination of these translation inhibitors leads to the generation of ROS in PBS-starved cells.

      Strengths:

      The authors used high-throughput genomics-based methods for identification of genes playing a role in persistence.

      Weaknesses:

      The findings could not be validated in clinical strains.

      Comments on revisions: No more comments for the authors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are strongly encouraged to check the references. There is some systematic error in the citations of references. Started to list but then they were too many.

      For example Ln 51, Ref #11 cited, should be #10. Ln 59, #18 is wrongly cited. Should be - Ln 104. Ref #27 wrongly cited.

      Ref #26 and #28 identical.

      Even in discussion section a lot of references are mis-cited.

      We very much appreciate the reviewer catching this issue with the import of our references and we have corrected this.

      Reviewer #2 (Recommendations for the authors):

      Below I have listed comments on specific issues that I hope are useful during revision.

      Line 21 population is singular

      Text modified

      Line 21 comma after antibiotic (subordinate clause)

      Text modified

      Line 25 is how singular?

      Text modified

      Impression of abstract: the work seems to confirm and therefore generalize concepts derived from studies with E. coli. If the authors agree, such a statement would be appropriate as a final sentence. I would also look for novel features to stress in the abstract.

      Line 41 this challenge is vague

      Text modified

      Line 43 comma such as (also comma at the end of the parenthetical statement). This type of comma error is common throughout the manuscript and slows reading.

      Text modified

      Line 60 paradoxically. Is this the best concept? Or is it the natural effect of evolution (assuming that mycobacteria or their ancestors were exposed to environmental antibiotics)?

      It is certainly problematic for clearing infection.

      Text not modified.

      Line 63 highlighted uncertainties ... meaning is unclear especially since you may have changed what "model" is referring to.

      Text modified

      Line 66 models.... Do you really mean systems? Models of what?

      This refers to mechanistic models. Text not modified.

      Line 67 arrest cell division. This is written as if it were true. Does the evidence point specifically to cell division or perhaps more accurately suppression of metabolism (see Ye et al 2025 mBio).

      Both have been postulated as important. Text modified to add concept of metabolism

      ... targeted by antibiotics non-essential... Do you think that antibiotics work by inactivating essential targets? That seems overly simplistic, as lethal action is more likely the metabolic response to the damage caused. By the end of the paragraph you come around to this view, but you have already misdirected the reader. The reader is not sure what to believe. Line 70 note that there are many inhibitors of transcription and translation that only block growth, they do not rapidly kill cells

      There can be both direct, and indirect secondary killing mechanisms. We devote a significant portion of the Discussion section to this topic.

      Line 71 debate. There was indeed a debate, but reference 22 is not a valid citation for this. I think you mislead the reader by not accurately describing the debate. It was basically about the inability of Kim Lewis and James Imlay to reproduce the work of ref. 22. A great deal of prior work and then subsequent work showed that the challenge to ref. 22 lacked substance.

      (1) Text modified to fix an error in the citation number related to direct β-lactam-mediated lysis.

      (2) We agree that there is a great deal of data supporting antibiotic-induced ROS as important for bactericidal activity in many circumstances and do not argue otherwise. This sentence points out that over the years the paradigm for how antibiotics kill bacteria has evolved.

      Line 80. It seems you are starting a new topic here. What about beginning a new paragraph?

      The paragraph introduces mycobacteria of which Mabs is one. Text not modified.

      Line 85 delete the comma: it implies a compound sentence that is not delivered.

      Text modified.

      Line 109 screen singular

      Text modified.

      Line 156 these conditions is imprecise and vague

      Conditions were described in paragraph above in the manuscript. Text not modified.

      Fig 2 it would be helpful to more clearly define the meaning of the coordinates

      Text modified.

      Line 230 and throughout please indicate the location of the data being cited for rapid reader reference

      Text modified.

      Lines 315-323 You could use this paragraph as the first of the Discussion. Some readers prefer to read the Discussion before the results. For them, a summary at the beginning of the Discussion is useful.

      Text modified.

      Line 328 without underlying mechanism... for E. coli refer to Zeng PNAS 2022. Depending on when the final version of this paper happens, there should be a figure in a Zhao Zhu mLife paper on purA that will have been published. Since it is not yet available, it cannot be cited.

      We agree that the Zeng et al study is interesting and have added this reference to our discussion. However, these findings related to broad Crp-regulated tolerance actually underscore the point that we are making: that there are multiple factors (Crp, RelA, Lon, TisB, MazE, others) that mediate antibiotic tolerance.

      Line 339 where are the data?

      These data are in Figure 5, panels C, D. We have clarified the text to indicate that only a single agent from each of these classes was tested.

      Line 346 here you are summarizing evidence for ROS in killing mycobacteria. You should include the moxifloxacin study by Shee et al 2022 AAC.

      Reference added.

      Line 348 refer to James Collins' work with E. coli in which his lab examined agents with a variety of mechanisms. There seems to be a fundamental difference between E. coli and mycobacteria with respect to rifampicin, a strictly static agent in E. coli but clearly lethal in mycobacteria. Note that chloramphenicol is static in E. coli and blocks ROS production. What does it do in mycobacteria? A brief discussion of this difference might be relevant at line 362

      Text modified.

      Lines 364-368 Here the idea might be simply that there are two modes of killing, one that is a direct extension of class-specific damage (chromosome fragmentation with fluoroquinolones, for example, or cell lysis by beta-lactams) and a second that is a metabolic response to the antibiotic damage (ROS accumulation). The second type is not class specific. Within this context, the mycobacterial killing by rifampicin might be a class-specific extension of inhibition of transcription that does not occur in E. coli.

      Agreed, text modified to include this.

      Line 400 The Key Resource table is not of publication quality. Precision and repeatability can be improved by spelling out the name of the vendor and its location (City, Country). In the present case, use of BD is lab jargon.

      We appreciate the reviewer’s precision. However, this is actually not lab jargon. Becton, Dickinson and Company now refers to itself as BD (see https://www.bd.com/en-us), and the American Type Culture Collection now refers to itself as ATCC (see https://www.atcc.org/about-us/who-we-are).

      Line 639 It would be good to have experienced colleagues critically review the manuscript, especially for English usage. Listing those persons here adds to the credibility of the work

      Text not changed.

      References: please refer to the journal style. Here you use italic for titles and scientific names, thereby obscuring the scientific names. Normally article titles are not italic and scientific names are ALWAYS italic unless prohibited by journal style.

      Our reference format is concordant with eLife submission guidelines, and all references are reformatted by the journal at the time of final publication (see https://elifesciences.org/insideelife/a43f95ca/elife-references-yes-we-take-any-format-no-we-re-not-rekeying).

      Supplemental Material: Please refer to journal style. Normally this is a stand-alone document that includes a title page and carefully crafted figure legends. Supplemental figures would be numbered as 1, 2, ... A professional appearing Supplemental Material section shows author publication experience not obvious in other parts of the paper. The text indicated MIC determinations. I would like to see a table of MIC values.

      (1) MIC table added as Supplemental Table 5.

      (2) The Supplemental figures are submitted and named in accordance with eLife instructions. Please note that for eLife, there is not a stand-alone supplementary figure section with a title page as you are requesting, but instead the figure supplements for each figure are provided as online files linked to each figure.

    1. Reviewer #2 (Public review):

      Summary

      The aim of this study was to assess which amino acid stretches are tolerated/favoured in the course of evolution, considering their physico-chemical properties, metabolic costs and proteotoxicity. To address this question, the authors expressed PolyX variants in yeast, E. coli and also referred to COS cells. The PolyX constructs were tagged with GFP or a different fluorescence reporter to assess expression levels and localization at the C-terminus with or without a cleavable linker or to study topological effects. The PolyX stretch was also embedded between two different fluorescent proteins. The authors used growth rate and expression levels as judged by fluorescence intensities to calculate the relative neutrality in comparison to GFP alone.

      They could show that harmful/beneficial effects depend on the specific amino acid (aa) and polar aa are tolerated well, whereas hydrophobic and positively charged aa are harmful to the cell. This is not surprising as hydrophobic and positively charged aa are known to be aggregation-prone. They could further show that the topology matters for some, but not all, PolyX variants. The PolyX stretch can affect the subcellular localization and aggregation propensity of the GFP it is fused to. Interestingly, overexpression of PolyG, PolyQ or PolyS was not harmful, and overexpression of PolyE was potentially even beneficial for the cell. The authors concluded their study with a theoretical analysis of the presence of aa stretches in various species and identified a high correlation between their expression in yeast and other species, suggesting that the selection of aa stretches is conserved and follows biochemical rules (trade-off between tolerance of expression levels, solubility, sub-cellular localization, and maybe metabolic costs).

      Strengths:

      The authors performed a high number of experiments and systematically assessed the expression and tolerance of 10mer stretches of 20 aa fused to GFP or other fluorophores in yeast and E. coli. This is an impressive effort.

      Weaknesses:

      (1) The analysis of expression levels of the various PolyX variants should not rely only on fluorescence intensities. The fusion of the PolyX stretch may affect the fluorescence properties (brightness, photostability) of the fluorescent partner and may or may not affect abundance. A quantitative analysis of PolyX-GFP (the same applies to the other fusion constructs shown in Figure 3) is needed, preferably by an MS-based proteomic analysis via peptide count. Western blot is less ideal as it would rely on epitope recognition by the respective antibody, and the epitope accessibility might be altered upon fusion with different PolyX stretches. In addition, the authors should analyse the PolyX stretch without an attached fluorophore as a control.

      (2) The images shown in Figure 4 are not very informative. The constructs should be subjected to FRAP to assess the solubility of the PolyX variants and Ssa1 (Hsp70). FCS could be an alternative as well.

      (3) The observation of the lack of mCherry fluorescence for PolyK and PolyP (Figure 4) can also be interpreted as an instability of the fusion protein (partial truncation and degradation) or quenching. The authors should test different fluorophores and different linker lengths between the PolyX stretch and the fluorophores. Fluorophore swapping (N/C-terminally) would also be a good control.

      (4) The study would benefit from a consideration of the large body of literature on protein aggregation and the contribution of amino acid composition. The amino acids identified here as harmful to the cell when present as 10mer stretches are known to be aggregation-prone and are also recognised by molecular chaperones, which prevent their aggregation.

      (5) The study could further benefit from ex vivo and in vitro analyses of the PolyX constructs. They could isolate the PolyX variants and study their solubility by, e.g. light scattering outside of the cellular context.

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript describes a putative clinical association between ARID5B genetic variants and a novel neurodevelopmental syndrome characterized by global developmental delay, intellectual disability, and occasional neuroinflammatory episodes. While the identification of 29 individuals with overlapping phenotypes and the use of a CRISPR-Cas9 mouse model suggest a potential gene-disease link, the study suffers from significant methodological gaps in variant prioritization and a lack of robust mechanistic evidence to support its primary claims. Specifically, the "neuroinflammation" component is over-emphasized despite appearing in only a minor subset of the cohort, and the molecular pathogenesis remains insufficiently explored beyond initial protein localization assays.

      Strengths:

      (1) The study proposes a new clinical syndrome associated with the ARID5B gene, distinguishing it from established Coffin-Siris syndromes related to other ARID family members.

      (2) The recruitment of a relatively large cohort of 29 individuals from diverse geographical and ethnic backgrounds strengthens the initial phenotypic description.

      (3) The combination of human clinical data, in vitro localization assays, and an in vivo mouse model provides a multi-level framework for investigating the gene's function.

      (4) The identification of variants in the exceptionally long final exon of ARID5B that escape nonsense-mediated mRNA decay (NMD) offers an interesting perspective on the molecular pathology of this gene.

      Weaknesses:

      (1) The description of the genomic methodology appears limited. A more detailed explanation of the filtration and selection process for variant prioritization is essential. The authors should provide a comprehensive summary of evidence (e.g., CADD scores, allele frequencies in gnomAD, and segregation analysis) to justify the selection of the reported variants, even if they do not strictly meet all ACMG/AMP criteria.

      (2) The cohort includes several inherited variants and missense mutations that require more robust evidence of pathogenicity. For example, the presence of the variant in population databases (gnomAD) suggests the need for careful re-evaluation of its causality. A more rigorous assessment using diverse computational metrics, such as PhyloP scores and conservation analysis, is necessary to confirm the pathogenicity of the missense variants.

      It is recommended that the authors re-evaluate the cohort to ensure that only variants with strong evidence of causality are included to maintain a clear genotype-phenotype correlation.

      (3) The proposed molecular mechanism would benefit from further empirical support. The claim of NMD escape is currently supported by only a small number of cases, and a much more detailed explanation is also required for the experimental data provided.

      Although the mouse model exhibits developmental abnormalities, it does not recapitulate the other systemic features reported in humans. In addition, given that "brain development" is a central theme, the manuscript lacks detailed neuroanatomical data, histopathology, or other molecular biological (e.g., RNA-seq) evidence from brain specimens to substantiate these claims at a molecular level.

      (4) The emphasis on "neuroinflammation" in the title may be disproportionate to its observed frequency. Central nervous system inflammation was identified in only a small subset of the cohort (2 of 29 individuals).

      Without additional experimental validation, such as immunological challenges in the Arid5b mouse model, it is premature to characterize this as a hallmark feature. Additionally, the inconsistent response to immunotherapy suggests that the autoimmune component requires further investigation.

      (5) Supplementary tables require reorganization to improve clarity. The current structures make it difficult for readers to effectively analyze the data, and a more standardized format is recommended.

      (6) As the manuscript proposes a novel disease entity, a more comprehensive clinical discussion is warranted. The authors should provide a more systematic description of the core clinical features and, crucially, address the genotype-phenotype correlation. Specifically, a more detailed analysis is required to determine whether the clinical severity or the presence of specific features varies according to the location of the variant or the type of mutation. Such insights are essential for clinicians to differentiate this syndrome from other ARID-related disorders.

    2. Reviewer #3 (Public review):

      Summary:

      In the present study, through international gene-matching efforts, the authors present 29 individuals with rare, heterozygous ARID5B variants and find that these individuals have a newly recognizable neurodevelopmental syndrome. A recurring clinical syndrome of developmental delay/intellectual disability, behavioral difficulties, renal malformation, and recurrent infections is described. 19 of these variants were confirmed to be de novo, and only one was inherited from an unaffected parent. 24/29 of these variants introduce premature termination codons in the final exon and are predicted to escape nonsense-mediated decay. The ARID5B p.Q522Ter variant was studied in a heterozygous knock-in mouse model and was found to be associated with behavioral abnormalities. The well-described genetic and phenotypic data for this cohort provide convincing clinical evidence for a novel neurodevelopmental syndrome. The functional evidence provided is preliminary, and further studies are needed to understand disease mechanisms.

      Strengths:

      (1) The authors give a good description of a novel clinical syndrome manifesting as developmental delay/intellectual disability, facial dysmorphism, and behavioral challenges.

      (2) The authors create a mouse model harboring an Arid5b(Q522*/+) variant and identify subtle behavioral changes.

      (3) Attempts are made to functionally characterize a subset of ARID5B variants in human cell lines.

      Weaknesses:

      (1) The title - "ARID5B mutations cause a neurodevelopmental syndrome with neuroinflammation episodes" - should be revised. 2/29 individuals (7%) had CNS inflammation; this does not appear to be a core feature of the disease and should not be highlighted as such. If this is going to be a feature that is highlighted, then more details are needed. MRI images of cerebellitis and/or ADEM would be helpful, as well as lumbar puncture results and supplemental information detailing the treatment course.

      (2) The abstract states that "Remarkably, 19 of 29 variants (66%) cluster within the first quarter of exon 10, are de novo, and escape nonsense-mediated mRNA decay (NMD), which we confirmed for two variants affecting seven individuals." The authors state in the Results that they "indeed found no signs of NMD". In Figure 3f, when assessing for transcript amount, there appears to be a great deal of variability. Three ARID5B variant lines are tested. Transcript amounts in two lines appear to be near control levels, but one LCL ARID5B Ile497AsnfsTer31 line appears to demonstrate significantly lower levels of transcript. The control lines also show a great deal of variability. No explanation is given for this large difference between LCL ARID5B Ile497AsnfsTer31 lines and for the variability in control lines, making these data uninterpretable. A major theme of the paper is that early truncating variants in exon 10 escape NMD and lead to the described phenotypes, so this is an important point that needs to be resolved, either by testing more patient-derived lines or knocking in these variants into cell lines.

      (3) The Arid5b(Q522*/+) mice are not sufficiently molecularly characterized. Does the variant transcript escape NMD? What happens at the protein level? Is there mislocalization of the protein?

      (4) For the HEK293T cell experiments, variants are overexpressed and compared to a control. These experiments appear to leave endogenous ARID5B intact. What might the authors expect to see if these variants were knocked in?

      (5) The functional consequences of the missense variants are not tested. The authors suggest that missense variants may be more associated with macrocephaly and possibly ASD. Are these missense variants causing loss-of-function or gain-of-function? Is there preserved protein function?

      (6) There are a number of functional assays performed, but it remains unclear if the tested variants are operating through a loss- or gain-of-function. Are truncating variants early in exon 10 leading to a partial loss-of-function? Or do they prevent the functioning of the other allele through a dominant negative mechanism? These possibilities are not directly tested.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Perez-Rico et al. use dCAS9 to activate and repress L1 expression at the subfamily level for younger mouse L1s, in order to address their cis-regulatory potential.

      They choose to transcriptionally perturb a full-length L1 subfamily.

      This is an important topic, and I liked the approach of using CRISPR to perturb an entire subfamily of transposable elements. However, this study has yet to add any new findings or conclusions, and it could be improved by further analysis or experimentation. The cis-regulatory role of L1s has been previously demonstrated in mouse and human stem cell systems using multiple approaches, including CRISPR-mediated perturbations, CRISPR-based deletions, and 3D chromatin structure analysis. My main concern is that the analysis is very superficial and lacks integration of other data types, such as Hi-C, to further support the role of L1s in enhancer activity, as well as further validation of candidate loci. One of the major limitations of CRISPR-based perturbations is that it can be difficult to draw conclusions about the cis-regulatory role of the targeted elements (L1s), because synthetic tethering of a VPR or KRAB domain can indirectly spread its effect in cis from elements that may otherwise have no specific role in regulating nearby genes. The major comments that need to be addressed to strengthen the manuscript's conclusions are:

      1. One important control would be to target different or older L1s to show how this affects gene expression. Can the authors perturb older families?
      2. Although meta-analysis of L1s shows up/downregulation of L1MdTf, ChIP-seq for dCAS9 is needed to demonstrate which L1s are directly targeted in the experiment, and the data should be analysed separately for dCAS9-bound L1s versus the rest of the L1s in the subfamily.
      3. L1s' transcriptional activation could be independent of their cis-regulatory activity. So, it is important to analyse the enhancer chromatin features of the TEs: do they make 3D contacts with the nearby genes?
      4. How many replicates of RNA-seq were done?
      5. Are genes that respond to CRISPR activation in the same TAD? Do they loop with LINE1?
      6. Figures 4A and B show clear perturbation of host genes upon CRISPRi. However, the authors need to show L1s outside genes (not in introns), as dCAS9 perturbation of intronic L1s is expected to affect the expression of the host genes.
      7. One key missing experiment is the genetic deletion of candidate intergenic L1s that are predicted to have a cis-regulatory function based on the RNA-seq data.
      8. The authors should present the RNA-seq data for genes 10 to 100 kb away from dCAS9-KRAB- and dCAS9-VPR-targeted L1s. It would be interesting to compare the KRAB and VPR effects with distance.
      9. The suggested role of L1s in neuronal gene expression is not strongly supported by their experiments or analysis. The authors end the results section with "Most of these genes are not expressed or have low expression levels in ESCs compared to other upregulated genes that are not part of these processes and pathways (Fig.S9E)". This result suggests that dCAS9-VPR targeting of multiple L1s (how many are targeted is not stated) leads to the activation of many genes that are not normally expressed in mESCs, either due to the spreading of transcriptional activators or because the mESCs spontaneously differentiate towards the neuronal lineage. The authors need to address these points.

      Minor comments

      Fig 2E: which housekeeping gene?

      Figures 4A and B: show whether these L1s make looping interactions with, or are in the same TAD as, their target genes.

      Referee Cross-commenting

      I agree with the other two reviewers' points.

      Significance

      This study can be improved by further analysis or experimentation. In its current version, the conceptual advance offered by the results and approach of the manuscript seems limited.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      In this study, Pérez-Rico et al. set out to study the cis-regulatory effects of evolutionarily young L1 elements (specifically L1MdTf) in mouse embryonic stem cells. Using state-of-the-art systems, they manage to silence or activate the expression of these elements and study the resulting gene expression cascade.

      Major comments:

      1. A general assessment or profiling of L1MdTf expression in WT mESCs is missing. It is very challenging to put the results in perspective (especially those of the repression system) without a general overview of baseline L1 expression in mESCs. How many elements do WT mESCs express? What is the genomic distribution of these elements? And how does it relate to nearby gene expression?
      2. Off-target effects are very poorly documented and limited to in silico analysis. Documentation of changes in H3K4me3 and H3K9me3 upon activation/silencing would be beneficial for interpreting the results. As it stands, it is very difficult to understand why the activation system is so much stronger than the silencing system.
      3. It would be beneficial to explain why different experimental strategies were used for silencing and activation.
      4. The study would benefit from a more detailed characterization of the systems' effects. For example, it is not clear from Figure 3D-E how many of these repeats are part of the L1MdTf subfamily or of other closely related subfamilies. This is relevant for assessing the specificity of the systems.
      5. The RNA-seq analysis relies on reads that are too short to uniquely map most L1s in the mouse genome. This should be mentioned and discussed.
      6. A key issue that remains unresolved is the functional consequence of L1 silencing/activation in mESCs. Does it change cell morphology, pluripotency, or differentiation potential? This is essential information.
      7. Fig 3B: Is this n=2? Is that why there are no statistical analyses? The same applies to Fig 5A-B: here a p-value is presented, but it would be good to highlight the statistically significant results somehow.
      8. Are only L1MdTf elements plotted in Fig 3D and 3E? As the authors claim that the system is quite specific, it would be good to have a clear view of the repeats that are differentially expressed and are evolutionarily close to the L1MdTf family. The figure might also benefit from further characterization at the family level (multi-mapping) rather than at the level of single elements (unique mapping, or multi-mapping with reads that map to a single location, as here).
      9. The authors provide some evidence for the existence of a bidirectional promoter on L1MdTf elements. In which elements was this observed? Does it result in chimeric transcripts? And if so, are these genes in line with the enrichment found?

      Minor comments:

      1. The introduction would benefit from a more detailed description of the L1-content in the mouse genome including the presence of polymorphic elements.
      2. There is no characterization of the edited mESC. Stainings? Markers from bulk RNA seq analysis? Morphology assessment?

      Significance

      The manuscript is timely and in line with some similar and very recent studies suggesting a link between evolutionarily young L1s and neuronal-related genes in mice (e.g. Toda et al.). However, the manuscript is quite preliminary in its current form. The potential off-target effects of the system require more in-depth investigation. In addition, the manuscript could be improved with further exploration of the functional consequences of L1 silencing/activation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      The authors utilize engineered transcriptional effectors (CRISPRa or KRAB-Zinc Finger proteins) to activate or inhibit the expression of retrotransposons in mouse embryonic stem cells. They focus on a young L1 family in mouse, L1MdTf, which is particularly active. This study reveals extensive alterations in gene expression for surrounding genes, and it suggests that the dysregulated genes are enriched in repressive chromatin and preferentially associated with neuronal functions.

      Major comments

      The experiments are well-executed and controlled overall. However, to fully support the conclusions, a few aspects of the work could be strengthened:

      1. Engineered Effector Binding: The binding sites for the engineered effectors (CRISPRa or KRAB-Zinc Finger proteins) were predicted computationally, but not validated experimentally using techniques like ChIP-seq. Although mapping reads within L1MdTf might be challenging, ChIP-seq could still be informative. It would help define the ratio of on-target vs. off-target binding and elucidate potential off-target effects (binding outside L1MdTf repeats).
      2. Mechanism of action: The study somehow assumes that the effectors influence L1MdTf transcription, which then impacts the host transcriptome. However, it is also possible that the engineered effectors directly modulate nearby cis-regulatory elements (promoters or enhancers) due to their proximity to L1MdTf sequences. Further investigation is needed to differentiate between these possibilities.
      3. Statistics: The study lacks robust statistical testing in several figures (e.g., Fig. 1B,D; 2C,E; S2; S5C,D). Additionally, the null hypotheses and statistical models used are not explicitly defined. To strengthen the analysis, the authors could quantify the percentage of differentially expressed (DE) or allele-specifically expressed genes associated with targeted L1s compared to non-targeted L1s or other relevant control groups. Furthermore, neuronal genes are known to be longer than average, which can bias enrichment analyses like those performed using DAVID. This increased length makes them more likely to harbor L1 insertions and potentially be affected by the engineered effectors targeting these elements. It would be informative to see how the conclusions change if the null models account for gene size or only consider genes with full-length L1MdTf elements.
      4. Allele-specific expression: The analysis of allele-specific expression could be further strengthened by incorporating L1 heterozygosity data. Since the whole genome sequence of these cells is available, calling genotypes for L1 insertions using established pipelines like MELT or others would be a straightforward approach. This would allow the authors to ensure that allele-specific expression patterns are indeed associated with L1 insertional polymorphisms.
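
      The length-aware null model suggested in comment 3 could be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline and not what DAVID does: the function names (`length_bin_edges`, `length_matched_enrichment`), the number of length bins, and the inputs (a dictionary of gene lengths, a list of differentially expressed genes, and a set of genes annotated to a category) are all assumptions made for the example. The idea is to draw random gene sets with the same length-bin composition as the observed hits and ask how often they overlap the category at least as much as the hits do.

```python
import random
from bisect import bisect_right
from collections import Counter


def length_bin_edges(lengths, n_bins=5):
    """Return edges that split the genes into roughly equal-size length bins."""
    ordered = sorted(lengths.values())
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def length_matched_enrichment(hits, category, lengths, n_perm=2000, n_bins=5, seed=0):
    """Empirical enrichment p-value against a gene-length-matched null.

    hits     : list of unique gene names (e.g. differentially expressed genes)
    category : set of gene names annotated to the term of interest
    lengths  : dict mapping every background gene name to its length
    """
    rng = random.Random(seed)
    edges = length_bin_edges(lengths, n_bins)

    # Group all background genes by length bin.
    genes_by_bin = {}
    for gene, length in lengths.items():
        genes_by_bin.setdefault(bisect_right(edges, length), []).append(gene)

    # Record how many hits fall in each length bin.
    hits_per_bin = Counter(bisect_right(edges, lengths[g]) for g in hits)
    observed = sum(g in category for g in hits)

    # Draw random gene sets with the same per-bin counts as the hits.
    at_least_as_extreme = 0
    for _ in range(n_perm):
        null_overlap = 0
        for bin_index, count in sorted(hits_per_bin.items()):
            drawn = rng.sample(genes_by_bin[bin_index], count)
            null_overlap += sum(g in category for g in drawn)
        if null_overlap >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value
```

      In a toy case where the category consists entirely of the longest genes and the hits are also drawn from the longest genes, every length-matched draw overlaps the category as much as the hits do, so the apparent enrichment disappears once gene length is accounted for.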

      Minor comments

      1. Western blot analysis could benefit from a more quantitative approach. Analyzing the intensity profiles of the ORF1p bands might help differentiate between potential ORF1p isoforms and pinpoint the specific changes (reduction or increase) in the top band. Additionally, an earlier explanation for the observed doublet band as a consequence of coexisting L1 families would be helpful for readers unfamiliar with mouse L1s.
      2. The chromHMM enrichment analysis reports fold enrichments (ratios of observed vs. expected associations with chromatin states). However, the magnitude of these enrichments seems to be modest in most cases. It would be informative to see the actual number of sequences falling into each category for a better understanding of the enrichment patterns.

      Significance

      Strengths

      • This study contributes to a limited body of research exploring the consequences of the global activation or repression of entire transposable element families.
      • Notably, the authors achieve this manipulation in a relevant cell type (embryonic stem cells) where retrotransposition is known to occur, utilizing a retrotransposition-competent L1 family (L1MdTf).

      Limitations

      • While the study offers a relatively novel approach, the biological question addressed could be further refined. This would allow for more specific conclusions regarding the role/consequences of L1MdTf dysregulation.
      • Artificially targeting L1 sequences is conceptually similar to randomly targeting the genome for transcriptional effects, since many, possibly most, L1 elements may not be naturally expressed under any biological conditions.

      While the manuscript confirms the known potential of L1 sequences to act as cis-regulatory elements, it does not provide significant new insights into the mechanisms underlying their regulatory potential. Notably, a recent study by Tong et al. IJMS (2024) demonstrated the activation of young human L1 elements using CRISPRa. This work similarly confirmed the ability of young human L1 elements to form chimeric transcripts and influence gene expression, likely through promoter or enhancer activity. However, key limitations of both studies are that the strength of the engineered effector might influence the observed transcriptional effects, and that the targeted L1 elements might not be naturally expressed, potentially leading to an overestimation of the regulatory impact of L1 activation.

      The study focuses on the impact of L1MdTf on gene expression, but exploring the phenotypic consequences of manipulating its expression could significantly broaden its significance. This would provide valuable insights into the functional impact of L1 activity in a broader context, potentially attracting a wider audience. In this context, L1MdTf silencing in mESCs, which naturally express these elements, could be more insightful than artificially over-activating them. Phenotypic analyses could encompass cell cycle progression, commitment to the primed pluripotent state, totipotency, differentiation potential, DNA damage response, or X chromosome inactivation. Identifying whether observed phenotypes arise from cis-effects (on neighboring genes) or trans-effects (mediated by L1 RNA or proteins) would be crucial for understanding the underlying mechanisms.

      My field of expertise lies in molecular and cellular biology and genomics, with a particular focus on transposable elements.

    1. Reviewer #2 (Public review):

      In this study, Xu et al. investigated the regulatory mechanisms controlling intramolecular cleavage of the transmembrane transcription factor MYRF-1, an important event that controls developmental progression in C. elegans.

      The authors made important advances in several aspects:

      (1) Through endogenous gene editing/tagging, further supported by western blots, the authors convincingly demonstrate the novel finding that the intramolecular cleavage and nuclear translocation of MYRF-1 is not static, but temporally controlled within each developmental stage: with nuclear translocation peaking at the late stage and then declining into lethargus/molts between developmental stages (Figure 1).

      (2) They demonstrate that this cleavage and nuclear translocation is controlled by external stimuli, namely starvation.

      (3) They reveal that the intramolecular cleavage is mildly regulated by MYRF-1's own JM domain as well as by the CCT tail of its interacting partner PAN-1.

      The conclusions of this paper are mostly well supported by data, but some aspects of the manuscript and conclusions should be clarified and extended to strengthen its findings.

      (1) The authors concluded that the intramolecular cleavage and nuclear localization of MYRF-1 were similarly temporally regulated in all tissue types. However, the data/images presented were limited to specific regions/cell types that were inconsistently chosen across developmental windows. For example, for the cleavage/nuclear translocation across L1 into lethargus (Figures 1B, E, F, G), the heads of the worms were shown, comprising mostly neurons and muscles, whereas across the rest of the larval stages, only mid-body pictures were shown, comprising mostly hypodermal and some intestinal cells. A complete coverage of all tissues across all time points would better support the authors' conclusion that this temporal regulation occurs similarly in all tissue types. Additionally, the authors should clearly indicate which tissue/cell-types were used in the quantifications, as these were not done for several figure panels (including but not limited to Figure 1I and J).

      (2) Related to point 1 above, this inconsistency in tissue assessment was also true for downstream experiments (Figures 2-6; e.g., starvation, JM, and CCT regulation, etc.). Broad tissue specific assessment for all downstream experiments would greatly enhance the strength and relevance of the findings. Judging by the current data presented (Figures 3, 5, 6), it seems to suggest that there are tissue/cell-type differences in the regulation of MYRF-1 nuclear translocation.

      (3) Developmental progression was superficially and inconsistently assessed across the study. Developmental progression was mainly assessed by hypodermal (V-lineage) division patterns and worm length in this study. Several glaring omissions that should have been examined were the lengths of larval stages/lethargus and molting defects, as well as gonad development, to help identify which developmental landmarks were affected vs. not.

      (4) The phosphorylation within MYRF-1's JM domain was insufficiently investigated. Two serine phosphorylation sites were discovered through mass spectrometry experiments; however, the authors investigated only one of the serine residues (S623) without any justification for the choice. Additional investigation of the other residue, as well as both together, would strengthen the relevance of these phosphorylation events to cleavage and nuclear translocation, especially considering the minimal effect observed when mutating only the one residue.

    2. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors identified dual inhibitory mechanisms, an intrinsic juxtamembrane (JM) region and an extrinsic cytoplasmic tail (CCT) domain in the binding protein PAN-1, that suppress MYRF-1 cleavage in C. elegans. The authors showed that MYRF-1 cleavage oscillates across larval stages, peaking in mid-to-late phases and being suppressed during molts. This oscillatory pattern is consistent with MYRF-1's role in promoting transitions of larval stages, particularly in late-L1 involving lin-4 activation and DD neuron remodeling.

      Strengths:

      This work generated several knock-in strains of fluorescent tags and mutations in the endogenous myrf-1 and pan-1 gene loci, which will provide resources for future identification and characterization of the underlying molecular mechanisms regulating MYRF-1 cleavage inhibition.

      The results presented in the paper are solid enough to support the paper's main conclusions.

      This study is valuable for establishing MYRF-1 cleavage as a key gatekeeper of the C. elegans developmental timing. Findings from C. elegans MYRF-1 may provide insight into the regulation and function of mammalian MYRF.

      Weaknesses:

      The following points should be discussed to further support the authors' model that MYRF-1 cleavage is a key gatekeeper of developmental timing.

      (1) Recent findings by the Helge Großhans and Jordan Ward groups showed that KIN-20 (CK1δ) and LIN-42 (PERIOD) are required for proper molt timing in C. elegans, and that loss of LIN-42 binding or of the phosphorylated LIN-42 tail impairs nuclear accumulation of KIN-20, resulting in arrhythmic molts (EMBO J. 44, 6368-6396, 2025). In this paper, the authors concluded that PAN-1 promotes MYRF-1 trafficking to the cell membrane, where MYRF-1 cleavage and nuclear translocation occur, and that this process oscillates with developmental molting cycles in C. elegans. It is unclear whether MYRF-1 and KIN-20 interact in the nucleus and, if so, how this interaction controls developmental timing.

      (2) Separately, it was previously shown that the let-7 primary transcript (pri-let-7) exhibits oscillating, pulse-like expression that peaks during each larval stage, rather than a steady increase, and directly correlates with developmental molting cycles. It is unclear whether the nuclear-localized MYRF-1 fragment regulates the oscillatory primary let-7 expression during larval transition (McCulloch and Rougvie, 2014; Van Wynsberghe et al., 2011).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Feng et al. uses mouse models to study the embryonic origins of HSPCs. Using multiple types of genetic lineage tracing, the authors aimed to identify whether BM-resident endothelial cells retain hematopoietic capacity in adult organisms. Through an important mix of various labeling methodologies (and various controls), they reach the conclusion that BM endothelial cells contribute up to 3% of hematopoietic cells in young mice.

      Strengths:

      The major strength of the paper lies in the combination of various labeling strategies, including multiple Cdh5-CreER transgenic lines, different CreER lines (col1a2), and different reporters (ZsGreen, mTmG), including a barcoding-type reporter (PolyLox). This makes it highly unlikely that the results are driven by a rare artifact due to one random Cre line or one leaky reporter. The transplantation control (where the authors show no labeling of transplanted LSKs from the Cdh5 model) is also very supportive of their conclusions.

      We appreciate the Reviewer’s consideration of the strengths of our study supporting the identification of adult endothelial to hematopoietic transition (EHT) in the mouse bone marrow.

      Weaknesses:

      We believe that the work of ruling out alternative hypotheses, though initiated, was left incomplete. We specifically think that the authors need to properly consider whether there is specific, sparse labeling of HSPCs (in their native, non-transplant, model, in young animals). Polylox experiments, though an exciting addition, are also incomplete without additional controls. Some additional killer experiments are suggested.

      Recognizing the importance of the weaknesses pointed out by the Reviewer, we provide below our response to the thoughtful recommendations rendered.

      Reviewer #1 (Recommendations for the authors):

      The main model is to label cells using Cdh5 (VE-cadherin) CreERT2 genetic tracing. Cdh5 is a typical marker of endothelial cells. The data shows that, when treating adults with tamoxifen, the model labels PBMCs after ~10 days, and the labeling kinetics plateau by day 14... The authors reach the main conclusion: that adult ECs are making hematopoietic cells.

      We agree that the main tool used in this study is to label endothelial cells (ECs) using Cdh5 (VE-Cadherin) CreERT2 genetic tracing in mice. Indeed, Cdh5 is recognized as a good marker of ECs. As a minor point, we wish to clarify that the results from treating adult Cdh5-CreERT2 mice with tamoxifen (Figure 1F) show that the ZsGreen labeling kinetics plateau by day 28 (not by day 14).

      Important controls should be shown to rule out alternative possibilities: namely, that the CreERT2 reporter is being sparsely expressed in HSPCs. Many markers, specific as they may seem to be, can show expression in non-specific lineages - particularly in the cases of BAC and PAC transgenic models, in which the transgene can be present in multiple tandem copies and subject to genome location-specific effects. As the authors remind readers, the Cdh5 gene is partly transcribed (though at low levels) in HSPCs, and even more clearly expressed in specific subpopulations such as CLPs, DCs, pDCs, B cells, etc. Some options would be to: i) check if the Cdh5-CreERT2 transgene (not endogenous Cdh5, but the BAC/PAC transgene) is expressed in LSKs (at least by qPCR), ii) verify if any CreERT2 protein levels are present in LSKs (e.g., by western blot), and iii) check if tamoxifen is labeling any HSPCs freshly after induction (e.g., flow cytometry data of ZsGreen LSKs at 24-48h post tamoxifen injection).

      We fully agree with the Reviewer that many markers, allegedly specific to a certain cell type, can show expression in other cell lineages. We also agree that excluding sparse or ectopic CreERT2 expression in hematopoietic stem and progenitor cells (HSPCs) is essential for interpreting lineage-tracing results. As suggested by the Reviewer, we have now examined if the Cdh5-CreERT2 transgene is expressed in bone marrow LSKs. To this end, we analyzed the Polylox single-cell RNAseq dataset presented in this study, containing ZsGreen<sup>+</sup> ECs and enriched ZsGreen<sup>+</sup> LSKs. As shown in the revised Figure S4D, CreERT2 transcripts were detected exclusively in Cdh5-expressing endothelial populations and were absent from Ptprc/CD45-expressing hematopoietic cells, except for plasmacytoid dendritic cells (pDCs; Figure S4E). These results are consistent with the RNAseq data from adult mouse bone marrow[1] showing that the Cdh5 gene is not expressed in HSPCs, CLPs, DCs, or B cells. Rather, among hematopoietic CD45<sup>+</sup> cells, Cdh5 is only expressed in a small subset of plasmacytoid dendritic cells (pDCs), which are terminally differentiated cells. These published results are described in the text.

      To further support this conclusion, we provide additional single-cell RNAseq analyses from our unpublished dataset of LSKs isolated from Cdh5-CreERT2/ZsGreen mice and not enriched for ZsGreen expression. These new analyses were performed after integrating the single-cell data from ECs and ZsGreen<sup>+</sup> hematopoietic cells from the Polylox dataset (current study). As shown in Author response images 1 and 2, CreERT2 expression closely matches the expression patterns of Cdh5, Pecam1, and Emcn and is not detected in Ptprc/CD45-expressing hematopoietic cells.

      Author response image 1.

      Expression of CreERT2, Cdh5, Ptprc and ZsGreen in BM cell populations enriched with ECs and hematopoietic cells. The single-cell RNAseq results are derived from ZsGreen-enriched BM ECs and ZsGreen-enriched BM hematopoietic cells from Polylox lineage-tracing experiments (data shown in Fig. 5; 37,667 ECs and 48,065 BM hematopoietic cells) and from LSKs (23,017 cells) independently isolated from tamoxifen-treated Cdh5-CreERT2/ZsGreen mice without ZsGreen enrichment (unpublished data).

      Author response image 2.

      Expression of CreERT2, Cdh5, Ptprc, Pecam1, Emcn, ZsGreen1, Col1a2, Cd19, Cd3e, Itgam (CD11b), Ly6a (Sca-1), Kit (cKit), Cd34, Cd48, Slamf1 (CD150), and Siglech in enriched BM ECs and LSKs from Cdh5-CreERT2/ZsGreen mice treated with tamoxifen 4 weeks prior to harvest (same cell source as indicated in Author response image 1).

      Additionally, we functionally tested whether hematopoietic progenitors could acquire ZsGreen labeling following tamoxifen administration using transplantation assays (Figure 4A-D). ZsGreen<sup>-</sup> LSKs (purity 99%), sorted from Cdh5-CreERT2/ZsGreen donors that had never been exposed to tamoxifen to exclude background Cre leakiness, were transplanted into lethally irradiated wild-type recipients. After stable hematopoietic reconstitution, recipients were treated with tamoxifen. If transplanted HSPCs or their progeny expressed CreERT2, tamoxifen administration would be expected to induce ZsGreen labeling. However, no ZsGreen<sup>+</sup> hematopoietic cells were detected in these recipients, demonstrating that hematopoietic progenitors from Cdh5-CreERT2/ZsGreen and their descendants do not undergo tamoxifen-induced recombination.

      Together, the single-cell transcriptional and transplantation data demonstrate that CreERT2 expression and tamoxifen-induced recombination are restricted to Cdh5-expressing ECs (except for pDCs). These findings support the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise from adult bone marrow ECs rather than from contaminating hematopoietic progenitors.

      One important missing experiment is to trace how ECs actually do this hematopoietic conversion: meaning, which populations of HSPCs are being produced by adult ECs in the first instance? LT-HSCs? ST-HSCs? MPPs? GMPs? All of the above? What are the kinetics? Differentiation is likely to follow a hierarchical path, but this is unclear at the moment.

      We agree that defining the earliest EC-derived hematopoietic cell progenitors and the kinetics by which these progenitors appear (LT-HSC vs ST-HSC/MPP vs lineage-restricted progenitors) would provide important insights into adult EHT.

      In the current genetic labeling system, a rigorous kinetic analysis of the hematopoietic cells first generated from ECs in vivo is not straightforward. Specifically, the low-level baseline ZsGreen<sup>+</sup> reporter fluorescence in hematopoietic cells (dependent on EHT occurring prenatally, perinatally or in young mice, or on other causes; Figure 1 A-D and Figure S1 D-I) impairs the identification of newly generated ZsGreen<sup>+</sup> progenitors at early time points and their distinction from baseline fluorescence. A potential solution might be to introduce serial harvests across multiple time-points in large mouse cohorts to capture rare transitional events with statistical significance.

      We wish to emphasize that the primary objective of this study was to establish whether adult bone marrow ECs have a hemogenic potential. Our data demonstrate adult EC-derived hematopoietic cell output that includes progenitor-containing fractions and multilineage mature progeny, under both steady-state conditions. We acknowledge that the current work does not resolve the order and kinetics of hematopoietic cell emergence following EHT. Therefore, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      One warning sign is how rare the reported phenomenon is. Even when labeling almost 90% of the BM ECs, these make at most ~3% of blood (less than 1% in the transplants in Figure 4F, less than 0.5% in the col1a2 tracing in Figure 7). This means this is a very rare and/or transient phenomenon... The most major warning sign is the fast kinetics of labeling and the fast plateau. We know that: a) differentiation typically follows some hierarchy, b) in situ dynamics of blood production are slow (work by Rodewald and Höfer). Considering how fast these populations need to be replaced to reach a steady state so rapidly (as reported here, 2-4 weeks), the presumably specialized ECs would need to be steadily dividing and producing hematopoietic cells at a fast pace (as a side prediction, the adult "EHT" cluster would likely be highly Mki67+). More importantly, the ZsGreen LSKs produced by the ECs would have to undergo VERY rapid differentiation (much faster than normal LSKs) or otherwise, if 3% of them are produced by a top compartment (the BM ECs) every 4 weeks, then the labeled population would continue to grow with time. The authors could try to challenge this by testing if the ZsGreen LSKs undergo much faster differentiation kinetics or lower self-renewal (which does not seem to be the case, at least in their own transplantation data). We believe a more likely explanation is that the label is being acquired more or less non-specifically, directly across a bunch of HSPC populations.

      The Reviewer correctly notes that the population of hemogenic ECs in the adult mouse bone marrow is small and that the output of hematopoietic cells from these hemogenic ECs accounts for at most 3% of blood cells. We agree that delineating the kinetics by which hematopoietic cells are generated from adult ECs is important, as this information would provide important insights into adult EHT.

      Nonetheless, we believe that the rapid appearance and early plateau of labeled blood cells in our experiments may not derive from a sustained, high-rate generation of labeled blood cells from self-renewing top-tier hematopoietic cell compartments, such as LT-HSCs. Rather, our data are more consistent with a predominantly lineage-restricted and biased hematopoietic progenitor cell population being the source of labeled blood cells. Supporting this interpretation, longitudinal analysis of peripheral blood shows that EGFP<sup>+</sup> PBMCs are consistently enriched with myeloid cells, whereas EGFP<sup>-</sup> PBMCs are predominantly B cells (Figure 4G and H). This myeloid lineage skewing is stable over time and contrasts with what would be expected if labeling were acquired broadly and nonspecifically across the hematopoietic hierarchy. Therefore, our results are more consistent with myeloid biased progenitors being among the first populations that EHT generates.

      We acknowledge that our studies do not identify the earliest endothelial-derived hematopoietic cells produced in vivo, and do not define their differentiation kinetics. Rigorously addressing these questions would require temporally resolved lineage tracing with sufficiently powered cohorts at early time points to statistically distinguish new labeling from baseline reporter background. These important experiments were beyond the scope of the present study. As noted above, under “Limitations of the study” we explicitly state this limitation and frame the identification of the earliest endothelial-derived progenitors and their kinetics as an important direction for future work.

      Transplant experiments in Figure 4 do offer a crucial experiment in support of the main conclusion of the manuscript. These experiments show that transplanted LSKs bearing the Cdh5-CreERT2 and ZsGreen reporter cannot acquire the tamoxifen-induced label post-transplantation - suggesting that the label is coming from ECs. However, it is also possible that the LSK Cdh5-CreERT expression is partly during the transplantation process... Indeed, we know through the aging data that the labeling is less active in aged mice. In any case, this would be verified by qPCR/western-blot (comparing native vs post-transplant LSKs).

      We agree with the Reviewer that the experiment in Figure 4A-D “offer a crucial experiment in support of the main conclusion of the manuscript.” The results of this experiment show that ZsGreen-negative LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not acquire tamoxifen-induced ZsGreen fluorescence post transplantation, supporting the endothelial cell origin of blood ZsGreen<sup>+</sup> cells.

      The Reviewer raises the possibility “that the LSK Cdh5-CreERT expression is partly during the transplantation process...”, and that this Cdh5-CreERT expression may occur slowly, as learned “through the aging data that the labeling is less active in aged mice.” As we show in Figure 3F, tamoxifen administration induced a similar percentage of ZsGreen<sup>+</sup> ECs in the bone marrow of Cdh5-Cre<sup>ERT2</sup>(BAC)/ZsGreen mice, whether tamoxifen was administered to 6-week-old, 16-week-old, 26-week-old or 36-week-old mice. Similar results with Cdh5-CreERT2 (BAC) mice are reported in the literature[2]. Since the mice transplanted with ZsGreen<sup>-</sup> LSKs were followed for 25 weeks after tamoxifen administration, we believe that the results in Figure 4A-D address the concern raised by the Reviewer.

      Supporting the conclusion that LSKs from the Cdh5-CreERT2-ZsGreen reporter mice do not express the Cdh5-CreERT2 under a native -non-transplant- setting, we now provide transcriptomic data from Cdh5-CreERT2/ZsGreen mice (not transplanted) showing that CreERT2 expression closely tracks with expression of canonical endothelial markers (Cdh5, Pecam1, Emcn) and is not detectable in Ptprc/CD45-expressing hematopoietic cells (Author response images 1 and 2). These data were obtained from non-transplanted mice treated with tamoxifen at ~12 weeks of age and analyzed four weeks later. Together, these results indicate that CreERT2 expression is endothelial-restricted in Cdh5-CreERT2-ZsGreen reporter mice.

      Figure 5 presents PolyLox experiments to challenge whether adult ECs produce hematopoietic cells through in situ barcoding. Several important details of the experiment are missing in the main text (how many cells were labeled, at which time point, how long after induction were the cells sampled, how many bones/BM-cells were used for the sample preparation, what was the sampling rate per population after sorting, how many total barcodes were detected per population, how many were discarded/kept, what was the clone-size/abundance per compartment). As presented, the authors imply that 31 out of ~200 EC barcodes are shared with hematopoietic cells... This would suggest that ~15% of endothelial cells are producing hematopoietic cells at steady state. This does not align well with the rarity of the behavior and the steady state kinetics (unless any BM EC could stochastically produce hematopoietic cells every couple of weeks, or if the clonality of the BM EC compartment would be drastically reduced during the pulse-chase overlap with mesenchymal cells. Important controls are missing, such as what would be the overlap with a population that is known to be phylogenetically unrelated (e.g., how many of these barcodes would be found by random chance at this same Pgen cut-off in a second induced mouse). Also, the Pgen value could be plotted directly to see whether the clones with more overlapping populations/cells (3HG, 127, 125, CBA) also have a higher Pgen. We posit that there are large numbers of hematopoietic clones that contribute to adult hematopoiesis (anywhere from 2,000-20,000 clones would be producing granulocytes after 16 weeks post chase), and it would be easy to find clones that overlap with granulocytes (the most abundant and easily sampled population) - HSPCs would be the more stringent metric.

      We thank the Reviewer for highlighting the need for a more detailed description of the Polylox experiments. To address this deficiency, we have compiled a document (Additional Supplementary Information file) containing all the specifics of the Polylox experimental and analytical parameters in one location. This includes: (i) the number of cells analyzed per population, (ii) the time points of induction and sample collection, (iii) the number of bones and total bone marrow cells used for preparation, (iv) the sampling rate following cell sorting, (v) the total number of detected barcodes per population, (vi) barcode filtering criteria and numbers retained or discarded, and (vii) clone-size and barcode number across cell compartments. We have updated the manuscript to refer readers to this Supplementary file.

      The Reviewer concluded from our results (Figure 5, Figure S5) that 31 out of ~200 endothelial cell (EC) barcodes are shared with hematopoietic cells (HCs), implying that ~15% of ECs produce hematopoietic cell progeny at steady state. This interpretation is inconsistent with our data showing the rare nature of adult EHT and would require either that a large fraction of bone-marrow ECs can generate hematopoietic cells within short time windows, or that ECs would clonally expand rapidly during the pulse-chase period, as noted by the Reviewer. The explanation for this apparent problem is technical. Briefly, the ~200 EC barcodes recovered do not represent all barcoded ECs. During Polylox barcode library construction, a mandatory size-selection step is applied prior to PacBio sequencing, retaining fragments that are approximately 800–1500 bp in length, whereas the full Polylox cassette spans ~2800 bp. This is mainly because the PacBio sequencer requires that the library be either 800–1500 bp or over 2500 bp for optimal sequencing results. As described in the original Polylox publication[3,4], this size selection eliminates most (approximately 75%) longer barcodes, together with ~85% of the shorter barcodes. Thus, ECs harboring very long or short recombined barcodes are under-represented or excluded from sequencing. As a result, the 22 true barcodes linking ECs and HCs recovered from sequencing do not indicate that ~10–15% of ECs generate hematopoietic progeny. Rather, these barcodes represent a highly selected subset of ECs with barcode configurations compatible with library recovery and sequencing. The observed EC–HC barcode sharing thus reflects qualitative lineage connectivity, not the quantitative frequency of endothelial-derived hematopoiesis at steady state.

      The Reviewer correctly notes that true Polylox barcodes are shared by ECs and mesenchymal-type cells and asks that we examine whether this overlap could occur by chance alone. The Polylox filtering threshold (pGen < 1 × 10<sup>-6</sup>), which we have made more stringent (from pGen < 1 × 10<sup>-4</sup>, without altering the essential results; new Figure S4 and revised Figure 5C-F), renders such overlap exceedingly unlikely. At this threshold, the expected number of random recombination events among 4,069 barcoded cells is approximately 0.004. Consequently, among the 87 mesenchymal cells identified here, fewer than 0.4 cells would be expected to share a barcode with another cell by chance alone. Thus, the probability of recovering identical barcodes across unrelated lineages due to random recombination is vanishingly small, and the observed EC–mesenchymal barcode sharing substantially exceeds random expectation.
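
      The arithmetic behind these figures can be sketched as follows. This is only an illustrative reading, assuming that the expected number of chance occurrences of a given barcode is bounded by the pGen threshold multiplied by the number of barcoded cells, and that the bound for the mesenchymal cells is obtained by scaling this value by their number. The function and variable names are hypothetical and are not taken from the Polylox analysis code.

```python
# Illustrative upper-bound arithmetic only; names are hypothetical and the
# independence assumption is a simplification, not the authors' pipeline.

PGEN_THRESHOLD = 1e-6        # filter: retained barcodes have pGen below this value
N_BARCODED_CELLS = 4069      # barcoded cells analyzed
N_MESENCHYMAL_CELLS = 87     # mesenchymal cells identified


def expected_chance_occurrences(pgen_max: float, n_cells: int) -> float:
    """Upper bound on how many cells would independently generate one given barcode."""
    return pgen_max * n_cells


# Bound for a single barcode across all barcoded cells (about 0.004).
per_barcode = expected_chance_occurrences(PGEN_THRESHOLD, N_BARCODED_CELLS)

# Bound on mesenchymal cells sharing a barcode with another cell by chance (below 0.4).
mesenchymal_bound = N_MESENCHYMAL_CELLS * per_barcode

print(f"per-barcode bound: {per_barcode:.4f}")
print(f"mesenchymal bound: {mesenchymal_bound:.2f}")
```

      Because every retained barcode has a pGen below the threshold, both values are upper bounds; the true chance expectation under these assumptions would be lower.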

      Related to this observation, the Reviewer correctly notes that the endothelial and mesenchymal cell lineages are phylogenetically unrelated. However, endothelial-to-mesenchymal cell transition (EndMT), the process by which normal ECs completely or partially lose their endothelial identity and acquire expression of mesenchymal markers, is a well-established process that occurs physiologically and in disease states (Simons M, Curr Opin Physiol 2023). In the bone marrow, the occurrence of EndMT has been documented in patients with myelofibrosis, and the process affects the bone marrow microvasculature (Erba BG et al., Am J Pathol 2017). Single-cell RNAseq of non-hematopoietic bone marrow cells has shown the existence of a rare population of ECs that co-expresses endothelial cell markers (Cdh5, Kdr, Emcn and others) and mesenchymal cell markers, as shown in Figure 6E and F.

      We fully agree with the Reviewer that, given the large number of hematopoietic clones contributing to adult hematopoiesis (particularly granulocyte-producing clones), it may be relatively easy to detect barcode overlap with abundant mature populations, whereas overlap with HSPCs would represent a more stringent and informative metric of lineage relationships. The Polylox results presented here show the sharing of true barcodes between individual ECs and HSPCs.

      Reviewer #2 (Public review):

      Summary:

      Feng, Jing-Xin et al. studied the hemogenic capacity of the endothelial cells in the adult mouse bone marrow. Using the Cdh5-CreERT2 in vivo inducible system, they characterized a rare subset of endothelial cells expressing hematopoietic markers that were transplantable. They suggested that the endothelial cells need the support of stromal cells to acquire blood-forming capacity ex vivo. These endothelial cells were transplantable, contributed to hematopoiesis with ca. 1% chimerism in a stress hematopoiesis condition (5-FU), and were recruited to the peritoneal cavity upon thioglycolate treatment. Ultimately, the authors detailed the blood lineage generation of the adult endothelial cells in a single cell fashion, suggesting a predominant HSPCs-independent blood formation by adult bone marrow endothelial cells, in addition to the discovery of Col1a2+ endothelial cells with blood-forming potential, corresponding to their high Runx1 expressing property.

      The conclusion regarding the characterization of hematopoietic-related endothelial cells in adult bone marrow is well supported by data. However, the paper would be more convincing, if the function of the endothelial cells were characterized more rigorously.

      We thank the Reviewer for the supportive comments about our study.

      (1) Ex vivo culture of CD45-VE-Cadherin+ZsGreen EC cells generated CD45+ZsGreen+ hematopoietic cells. However, given that FACS sorting can never achieve 100% purity, there is a concern that hematopoietic cells might arise from the ones that got contaminated into the culture at the time of sorting. The sorting purity and time course analysis of ex vivo culture should be shown to exclude the possibility.

      We agree that FACS sorting can never achieve 100% cell purity and that sorting purity is critical for interpreting the ex vivo culture experiments presented in our study. As requested by the Reviewer, we have now documented the purity of the sorted endothelial cell (EC) population used in the ex vivo culture experiments. The post-sort purity of CD45<sup>-</sup>VE-cadherin<sup>+</sup>ZsGreen<sup>+</sup> ECs was 96.5%; this data is now shown in the revised Figure 2B (Post Sort Purity panel). This purity level is comparable to purity levels of sorted ECs shown in Figure S2I (94.5%).

      While we agree that a detailed time-course analysis of hematopoietic cell output from EC cultures could further strengthen the conclusion that bone marrow ECs can produce hematopoietic cells ex vivo, we wish to call attention to the additional critical control in the experiment shown in Figure 2B-D. In this experiment, we co-cultured CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells from Cdh5-CreERT2/ZsGreen mice, rather than ECs, and examined whether these hematopoietic cells could produce ZsGreen<sup>+</sup> cell progeny after 8 weeks of culture under the same conditions used in EC co-cultures (conditions not designed to support hematopoietic cells long-term). Unlike ECs, the CD45<sup>+</sup>ZsGreen<sup>+</sup> hematopoietic cells did not generate ZsGreen<sup>+</sup> hematopoietic cells at the end of the 8-week culture, indicating that the culture conditions are not permissive for the maintenance, proliferation and differentiation of hematopoietic cells. This provides strong evidence that even if a few hematopoietic cells contaminated the sorted ECs, these hematopoietic cells would not contribute to EC-derived production of hematopoietic cells at the 8-week time-point. We have revised the Results text describing Figure 2B-D.

      (2) Although it was mentioned in the text that the experimental mice survived up to 12 weeks after lethal irradiation and transplantation, the time-course kinetics of donor cell repopulation (>12 weeks) would add a precise and convincing evaluation. This would be absolutely needed as the chimerism kinetics can allow us to guess what repopulation they were (HSC versus progenitors). Moreover, data on either bone marrow chimerism assessing phenotypic LT-HSC and/or secondary transplantation would dramatically strengthen the manuscript.

      The original manuscript reported survival and engraftment up to 12 weeks post transplantation. The recipient mice have now been monitored for up to 10 months post transplantation. These extended survival and engraftment data are now included in the revised Figure 2I and J replacing the previous 10-week analyses.

      We agree with the Reviewer that the time-course kinetics of donor cell repopulation would help define adult endothelial-to-hematopoietic transition (EHT) and the hematopoietic cell types produced by adult EHT. We did not perform serial time-course sampling of peripheral blood beyond the 10-week and the 10-month time-points. Given that the recipient mice were lethally irradiated and therefore more susceptible to infection, we sought to minimize repeated interventions that could compromise animal health and survival. We therefore prioritized long-term survival and endpoint analysis over repeated longitudinal sampling. Nonetheless, the long-term survival (10 months) and multilineage hematopoietic cell reconstitution after lethal irradiation provide functional evidence that adult EHT produced at least some LT-HSC.

      We acknowledge that phenotypic assessment of bone marrow LT-HSC chimerism and/or secondary transplantation would further strengthen the manuscript. We have clarified these limitations in the revised manuscript under “Limitations of the study”.

      (3) The conclusion by the authors, which says "Adult EHT is independent of pre-existing hematopoietic cell progenitors", is not fully supported by the experimental evidence provided (Figure 4 and Figure S3). More recipients with ZsGreen+ LSK must be tested.

      We agree with the Reviewer that, in most cases, a larger number of experimental data points is helpful to strengthen the conclusions, and that having additional mice transplanted with ZsGreen-enriched LSK would be desirable. However, we do not believe that additional mice transplanted with ZsGreen LSKs would strengthen the conclusions drawn from the experimental results shown in Figure 4D, in which we used 6 mice transplanted with ZsGreen-depleted (ZsGreen<sup>-</sup>) LSKs and 2 mice transplanted with ZsGreen<sup>+</sup>-enriched (ZsGreen<sup>+</sup>) LSKs. The independence of adult EHT from “pre-existing hematopoietic cell progenitors” is based on the following experimental results and conclusion from these results.

      First, ZsGreen<sup>-</sup> LSKs (purity 99%) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 6). These ZsGreen<sup>-</sup> LSKs robustly reconstituted hematopoiesis, demonstrating successful engraftment. Importantly, tamoxifen administration to the recipients of ZsGreen<sup>-</sup> LSKs produced no detectable ZsGreen<sup>+</sup> cells in the blood for up to 6 months post transplantation (Figure 4D, blue line encompassing the results of the 6 mice). This result demonstrates that the transplanted ZsGreen<sup>-</sup> hematopoietic progenitors and their progeny do not acquire ZsGreen labeling in vivo following tamoxifen treatment, indicating that they lack the Cre-recombinase. This result is consistent with the endothelial specificity of Cdh5 expression.

      Second, ZsGreen<sup>+</sup> LSKs (accounting for ~50% of the LSKs) isolated from Cdh5-CreERT2/ZsGreen mice were transplanted into lethally irradiated WT recipients (n = 2). This arm of the experiment was performed in part as a technical control to confirm successful engraftment and detection of ZsGreen<sup>+</sup> hematopoietic cells in the transplant setting. Importantly, after tamoxifen administration to the two recipients of ZsGreen<sup>+</sup> LSKs (Figure 4D, two green lines reflecting these two mice), the level of ZsGreen<sup>+</sup> blood cells stabilized in each of the mice between weeks 10 and 24, showing equilibrium between the proportions of ZsGreen<sup>+</sup> and ZsGreen<sup>-</sup> cells in the blood. This indicates that pre-existing ZsGreen<sup>+</sup> LSKs are not responsible for tamoxifen-induced increases in ZsGreen<sup>+</sup> hematopoietic cells in blood.

      Together, the results from this experiment demonstrate that in the setting of transplantation, tamoxifen does not induce ZsGreen labeling of ZsGreen<sup>-</sup> hematopoietic progenitors or their progeny. This result strongly supports the conclusion that ZsGreen<sup>+</sup> hematopoietic cells arise independently of pre-existing or inducible hematopoietic progenitors. We have revised the text to clarify these experiments and to present the results in a simplified manner.

      Strengths:

      The authors used multiple methods to characterize the blood-forming capacity of genetically and phenotypically defined endothelial cells from several reporter mouse systems. The polylox barcoding method used to trace the contribution of adult bone marrow endothelial cells to hematopoiesis is a strong approach for estimating the lineage contribution.

      Weaknesses:

      It is unclear what the biological significance of the blood cells de novo generated from the adult bone marrow endothelial cells is. Moreover, since the frequency is very rare (<1% bone marrow and peripheral blood CD45+), more data regarding its identity (function, morphology, and markers) are needed to clearly exclude the possibility of contamination/mosaicism of the reporter mice system used.

      We agree that the biological significance and functional roles of hematopoietic cells generated de novo from adult bone marrow ECs remain important open questions. We also agree that the output of hematopoietic cells from adult EHT is low, but rare events can be important, particularly as they pertain to stem/progenitor cell biology. Both points are described under “Limitations of the study”. The primary goal of the present study was to address the question whether adult bone marrow ECs can undergo EHT. We believe that the combination of various mouse transgenic lines, different Cre-ER, different reporters (ZsGreen and mTmG), including the s.c. barcoding reporter (PolyloxExpress), different approaches to evaluate hematopoiesis in vivo and ex vivo, makes it rather unlikely that our conclusions are driven by an artifact related to a specific leaky reporter, contamination, or problems with one of the Cre-lines. The experiment where we find no tamoxifen-induced labeling of transplanted ZsGreen<sup>-</sup> LSKs derived from the Cdh5-CreERT2/ZsGreen mice is strongly supportive of the existence of adult EHT, virtually excluding a contribution of contaminant hematopoietic cells.

      Reviewer 2 Recommendations for the authors:

      (1) There is a discrepancy in the proportion of peripheral blood composition between different reporters (mTmG and ZsGreen) (Figure 1G and Figure S1K), especially the contrasting B cell proportion between both models. The additional comments on this data should be mentioned.

      In the revised Results section, we now note that the mTmG and ZsGreen reporters show slightly different efficiencies or kinetics of labeling. These differences have previously been reported[5] and have been attributed to relative reporter leakiness, sensitivity to tamoxifen, or different kinetics of Cre recombination. As suggested, these comments have been added to the text following the description of (Figure S2A).

      (2) Experimental methods concerning cell transplantation/transfer need more information, such as: a) using or not using rescue cells and how many cells are they if using, b) single or split dose of irradiation, c) when were cells transplanted following irradiation, etc. Otherwise, the data are uninterpretable.

      We have ensured that the Material and Methods section under “Bone marrow ablation and transplantation” contains all the information requested by the Reviewer.

      (3) Some of the grouped data haven't been statistically analyzed.

      We have reviewed all data and performed appropriate statistical analyses where comparisons were made. In the revised figures and legends, all grouped datasets now include statistical tests and p-values are indicated (added to Fig. 3H and I; Figure 4G).

      (4) Some flow cytometry plots include quantitative values; others do not. Quantitative information is absolutely needed in all flow cytometry plots.

      We have updated the flow cytometry figures to include quantitative values (percentages or absolute counts) in all relevant plots (2B (new figure, bottom left); 2C; S1G, S1H).

      (5) It is more relevant to present the Emcn/VE-Cadherin plot from gated CD45+/ZsGreen+, not the CD45-/ZsGreen+ fraction (Figure 2C), as the latter were not the EHT-derived offspring, but rather the common phenotypic endothelial cells

      As requested, we have added the suggested flow cytometry plot. The revised Figure 2C now includes an Emcn vs. VE-Cadherin plot from the gated CD45<sup>+</sup>ZsGreen<sup>+</sup> population. This complements the existing panel and confirms that the cells of interest retain endothelial cell markers after culture, while the CD45<sup>+</sup>ZsGreen<sup>+</sup> cells did not express endothelial markers. The figure legend has been updated to explain the new panel. We agree that this plot more directly highlights the phenotype of the presumed EHT-derived cells.

      (6) To show the effect of the ex vivo culture, the authors should present the absolute number of CD45+ZsGreen+ cells in the pre-/post-culture; otherwise, the data are uninterpretable (Figure 2D).

      Our interpretation of the Reviewer’s comment above (relative to the experiment shown in Figure 2B-D) is that the Reviewer would like us to provide the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells introduced into the co-culture (supplemented with unsorted BM cells, ZsGreen<sup>+</sup> hematopoietic cells or ZsGreen<sup>+</sup> ECs) and the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. Currently, the results in Figure 2D show the absolute number of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells recovered at the end of the 8-week culture. The input of CD45<sup>+</sup>ZsGreen<sup>+</sup> cells was 2.93 × 10<sup>6</sup> on average for unsorted BM cells, 1.68 × 10<sup>6</sup> on average for ZsGreen<sup>+</sup> hematopoietic cells, and estimated at up to 100 for sorted ZsGreen<sup>+</sup> ECs.

      (7) It is confusing to see Figures 2F and 2G, which apparently show the data from the middle of the experimental procedure (Figure 2E). Those data should be labelled clearly regarding which procedures of the whole experiment protocol.

      As correctly noted by the Reviewer, Figures 2F and 2G provide data that relate to the middle of the graphical representation of the experiment shown in Figure 2E. We see how this may be confusing.

      Therefore, we have updated both the figure labeling and legend to explicitly indicate that Figure 2F and 2G provide the FACS sorting results for the cells used for transplantation. The revised legend now reads: “Representative flow cytometry plots of the non-adherent cell fraction after 8 weeks of co-culture (cells used for transplantation).”

      References

      (1) Kucinski, I., Campos, J., Barile, M., Severi, F., Bohin, N., Moreira, P.N., Allen, L., Lawson, H., Haltalli, M.L.R., Kinston, S.J., et al. (2024). A time- and single-cell-resolved model of murine bone marrow hematopoiesis. Cell Stem Cell 31, 244-259.e10. https://doi.org/10.1016/j.stem.2023.12.001.

      (2) Identification of a clonally expanding haematopoietic compartment in bone marrow. The EMBO Journal. https://link.springer.com/article/10.1038/emboj.2012.308.

      (3) Pei, W., Shang, F., Wang, X., Fanti, A.-K., Greco, A., Busch, K., Klapproth, K., Zhang, Q., Quedenau, C., Sauer, S., et al. (2020). Resolving Fates and Single-Cell Transcriptomes of Hematopoietic Stem Cell Clones by PolyloxExpress Barcoding. Cell Stem Cell 27, 383-395.e8. https://doi.org/10.1016/j.stem.2020.07.018.

      (4) Pei, W., Feyerabend, T.B., Rössler, J., Wang, X., Postrach, D., Busch, K., Rode, I., Klapproth, K., Dietlein, N., Quedenau, C., et al. (2017). Polylox barcoding reveals haematopoietic stem cell fates realized in vivo. Nature 548, 456–460. https://doi.org/10.1038/nature23653.

      (5) Álvarez-Aznar, A., Martínez-Corral, I., Daubel, N., Betsholtz, C., Mäkinen, T., and Gaengel, K. (2020). Tamoxifen-independent recombination of reporter genes limits lineage tracing and mosaic analysis using CreERT2 lines. Transgenic Res 29, 53–68. https://doi.org/10.1007/s11248-019-00177-8.

    1. Shared Leadership and the Strategic Role of the Vie Scolaire

      Summary

      This document analyzes the profound changes in the Vie Scolaire (the student life service) within the French education system, as it moves from a service that manages student flows and discipline to a strategic lever for student success.

      The Conseiller Principal d'Éducation (CPE) is now established as a pivotal member of the management staff, an expert in school climate and socialization whose work is closely tied to pedagogical leadership.

      A school's success depends on the transition from a “Vie Scolaire that manages” to a “Vie Scolaire that leads”, embedded in a dynamic of shared leadership founded on trust, genuine delegation, and the dismantling of the artificial boundary between the educational and the pedagogical.

      --------------------------------------------------------------------------------

      1. Historical Perspectives and Shifting Stakes

      The evolution of the education system has turned the Vie Scolaire into a nerve center of the school.

      • The legacy of the collège unique: For 50 years, schooling an entire generation has radically changed the situation compared with the historical model, which educated only a tiny fraction of the population.

      • From supervision to leadership: Historically centered on the “surveillants généraux” (general supervisors), who handled whatever happened outside the classroom, the role evolved into the CPE corps.

      This change recognizes the importance of the schooling context (climate, mental health, inclusion) for successful learning.

      • The French singularity: The CPE is a French exception, born of the early separation between the transmission of knowledge (the teachers' domain) and supervisory tasks, which the teaching profession long regarded as less legitimate.

      --------------------------------------------------------------------------------

      2. The CPE: A Pivotal Manager, Not a Subordinate

      The CPE's institutional position is that of a full member of the management staff, whose duties go beyond simply enforcing the school rules.

      An established management status

      The CPE does not merely carry out the leadership's decisions but is a technical adviser involved in educational and pedagogical decisions.

      • Expertise in the interstices: The CPE works at the joints of the system, offering a holistic view of the student.

      • Responsibility for the service: The CPE runs the Vie Scolaire service (AED) and takes part in drawing up the school's educational policy (2015 mission circular).

      Characteristics of the profession

      | Characteristic | Impact on operations |
      | --- | --- |
      | In-between position | The CPE sits at the interface between the leadership, teachers, students and families. |
      | Numerical minority | Often alone or very few in number (1 to 5 per school), the CPE must mobilize collective intelligence to influence a large body of teachers. |
      | Versatility | The CPE alternates between handling emergencies (discipline, crises) and long-term work (projects, student engagement at lycée and collège level). |

      --------------------------------------------------------------------------------

      3. Deconstructing the Educational / Pedagogical Boundary

      The document stresses that the distinction between the educational (Vie Scolaire) and the pedagogical (the classroom) is a “republican myth” that is less and less relevant.

      • Interdependence: How students are educated contributes directly to the quality of learning. A student cannot learn properly if the school climate or their mental health has deteriorated.

      • The Conseil Pédagogique: The CPE's presence in this body is considered symbolically and strategically essential.

      It allows the educational perspective to inform discussions about learning, and vice versa.

      • Joint projects: The CPE becomes a facilitator of projects in which students are active participants (CVC, CVL), turning school life into a space for learning citizenship.

      --------------------------------------------------------------------------------

      4. The Keys to Shared Leadership

      The term “extended leadership team” is sometimes considered outdated or semantically complicated; experts prefer the notion of shared leadership.

      Conditions for successful leadership

      • Trust: This is the sine qua non. The head of school must create an ecosystem in which people can speak freely and opposing views are accepted.

      • Genuine Delegation: Delegating means entrusting whole portfolios (not individual tasks) while granting decision-making autonomy and the means to act.

      • Recognized Expertise: The CPE must be called on for their technical expertise when the agenda for leadership meetings is drawn up.

      The “formation flying” metaphor

      Rather than a pyramid model in which only the pilot (the head of school) decides, shared leadership resembles flying in formation:

      • Everyone knows their place and their mission.

      • The formation is agile and can react quickly to difficulties.

      • No one “flies alone”, which strengthens the sense of belonging to a professional community.

      --------------------------------------------------------------------------------

      5. Governance Rituals and Strategic Tools

      To prevent the Vie Scolaire from becoming isolated, several concrete arrangements are recommended:

      • The Comité de Direction (Codir): A weekly meeting (often lasting 2 hours) that brings together the managers (leadership, CPE, bursar) to set strategy and follow up on ongoing matters.

      • The Projet de Service (service plan): A document that sets the priorities of the Vie Scolaire according to the local context (urban, rural, industrial), making it possible to regulate the workload and avoid the feeling of an “overloaded boat”.

      • The Open-Door Policy: Permanent mutual accessibility between the head of school and the CPE, to handle emergencies and defuse tensions.

      • Targeted steering committees: Short, rewarding bodies that bring together those responsible for specific issues (bullying, dropout, mental health).

      --------------------------------------------------------------------------------

      6. New Challenges: Mental Health and Engagement

      Since the Covid-19 crisis, new issues have emerged, reinforcing the CPE's monitoring role.

      • Mental health: Students are speaking more openly, and cases of distress are increasing. Vie Scolaire services are the first to receive this distress.

      • Bullying (Programme Phare): Increased staff training makes it possible to break taboos and to move from mere supervision to comprehensive support.

      • Student engagement: The CPE must work “with” students rather than “for” them, giving them responsibility in bodies such as the CVC or the CVL so that projects last.

      --------------------------------------------------------------------------------

      Key Quotations

      “You do not move from a Vie Scolaire that manages to a Vie Scolaire that leads without a genuinely shared leadership team.”

      “The CPE's job is one of interstices and joints, the job of someone who tries to create links.”

      “Delegation means guaranteeing everyone their capacity to act. It is not consulting; it is entrusting responsibility.”

      “The Vie Scolaire is not on the periphery; it is a beating heart of the school.”

    1. Name: Kamal Shrestha

      1. What are some personal experiences you have that relate to larger social issues? I was a medical provider back home in Nepal. During a rural rotation, I saw a pregnant woman walk up to 4 hours to reach a basic health check post, only to be denied basic care because she did not have insurance and could not pay out of pocket.
      2. Why are these issues important to you? This experience is important to me because it shows how even the most basic healthcare is inaccessible to people in Nepal. The irony is that Nepal's constitution guarantees basic access to health care to everyone. This moment forced me to ask: if a pregnant woman who walked 4 hours while in labour can be denied care, what would be the fate of others facing serious issues?
      3. Choose one of these issues to discuss: Why or how would your research of this issue represent an example of “intrinsic motivation” for you? Universal access to basic medical care in Nepal: I would research this topic because it is one of the most basic needs that a citizen would expect their government to meet. I want to learn more about why my country is not able to provide basic access to healthcare.
    1. Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues. In particular, the majority of "autoimmunity-related TCRs" considered in this study are in fact specific to type 1 diabetes (T1D). Notably, T1D incidence is higher in males, which directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. Given this conceptual inconsistency, the evidence presented does not support the authors' conclusions.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      Weaknesses:

      I thank the authors for their detailed responses to my previous comments. Several concerns were addressed satisfactorily; however, important issues remain unresolved, and a new major concern has emerged from the revised manuscript.

      Major concerns:

      (1) Autoimmune specificity is dominated by T1D, contradicting the study's premise. Newly added supplementary Table 3 shows that the authors considered only 14 autoimmune-related epitopes, of which 12 are associated with type 1 diabetes (T1D) and 2 with celiac disease (CeD). (I guess this is because identification of particular peptide autoantigens is an extremely difficult task and was only successful in T1D and CeD.) Thus conclusions of this work mostly relate to T1D. However, the incidence of T1D is higher in males than in females (e.g. doi:10.1111/j.1365-2796.2007.01896.x; doi:10.25646/11439.2). This directly contradicts the stated objective of the study - to explain the higher prevalence of autoimmune diseases in women. As a result, the authors' conclusions (a) cannot be generalized to autoimmune disease as a whole as the authors only considered T1D and CeD antigens and (b) are internally inconsistent with the stated objective of the study.

      (2) By contrast, CeD does show a female bias (~60/40 female/male; doi:10.1016/j.cgh.2018.11.013). However, the manuscript does not allow evaluation of how much the reported "autoimmune TCR enrichment" derives from T1D versus CeD. Despite my previous request, the authors did not provide per-donor and per-epitope distributions of autoimmune-specific TCR matches. I therefore explicitly request a table in which: each row corresponds to a specific autoimmune antigen; each column corresponds to a donor (with metadata available including sex); each cell reports the number of unique TCRs specific to that antigen in that donor. Without such data, the conclusions cannot be evaluated.
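
      The table layout requested in point (2) can be sketched as follows. This is only a minimal illustration, assuming a hypothetical list of (donor, antigen, CDR3) match records; the function name, donor labels, antigen labels and sequences are placeholders invented for the example and are not taken from the manuscript or its data.

```python
from collections import defaultdict


def antigen_by_donor_table(matches):
    """Count unique TCRs per antigen (rows) and per donor (columns).

    `matches` is an iterable of (donor_id, antigen, cdr3) tuples.
    Returns (donors, table), where table[antigen][donor] is the number
    of distinct CDR3 sequences matched to that antigen in that donor.
    """
    unique = defaultdict(set)  # (antigen, donor) -> set of CDR3 sequences
    donors = set()
    for donor, antigen, cdr3 in matches:
        unique[(antigen, donor)].add(cdr3)
        donors.add(donor)
    donors = sorted(donors)
    antigens = sorted({antigen for antigen, _ in unique})
    table = {
        antigen: {d: len(unique.get((antigen, d), ())) for d in donors}
        for antigen in antigens
    }
    return donors, table


# Hypothetical records: donor IDs, antigen labels and CDR3s are placeholders.
matches = [
    ("F1", "T1D:insulin", "CASSLGTEAFF"),
    ("F1", "T1D:insulin", "CASSLGTEAFF"),  # repeated match, counted once
    ("F1", "CeD:gliadin", "CASSPGQGYEQYF"),
    ("M1", "T1D:insulin", "CASSLAGGYEQYF"),
    ("M1", "T1D:insulin", "CASSLGTEAFF"),
]

donors, table = antigen_by_donor_table(matches)

# Print one row per antigen and one column per donor.
print("antigen\t" + "\t".join(donors))
for antigen, row in table.items():
    print(antigen + "\t" + "\t".join(str(row[d]) for d in donors))
```

      Donor metadata such as sex could be attached as an additional header row; keeping T1D and CeD antigens in separate rows is what would allow the two diseases to be evaluated independently.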

      (3) It is scientifically inappropriate to generalize findings to "autoimmune diseases" when only T1D and CeD were analyzed. Moreover, given that T1D and CeD show opposite directions of sex bias, combining them into a single "AID" category is misleading. All analyses presented in Figure 8 and Supplementary Figure 16 should be repeated and shown separately for T1D and CeD, rather than combined.

      (4) The McPAS database contains TCRs associated with other autoimmune diseases (e.g., multiple sclerosis, rheumatoid arthritis), although the exact autoantigens in these contexts are unknown. Why didn't the authors perform the search for such TCRs? I believe disease association even without particular known antigen could still be insightful.

      (5) Misuse of the concept of polyspecificity. I appreciate the authors' reference to Don Mason's work; however, the concept of polyspecificity discussed there is fundamentally different from the authors' usage. Mason, Sewell (doi:10.1074/jbc.M111.289488), Garcia (doi:10.1016/j.cell.2014.03.047), and others demonstrated that individual TCRs can recognize multiple peptides, possibly around 1 million. But importantly these peptides are not random but share some sequence motif. This is a general feature of TCRs, i.e. 100% of TCRs are polyspecific in this sense.

      In contrast, the authors define polyspecificity as TRB sequences annotated as specific to unrelated epitopes in TCR databases such as VDJdb. These databases are well known to contain substantial numbers of false-positive annotations (see, e.g., Ton Schumacher's preprint https://www.biorxiv.org/content/10.1101/2025.04.28.651095.abstract). The authors acknowledge that, under their definition, polyspecificity has been experimentally validated for only one (!) TCR (Quiniou et al.). In the absence of robust experimental validation, use of the term "polyspecificity" in this context is misleading. I strongly recommend removing all analyses and conclusions related to polyspecificity from the manuscript unless supported by independent functional validation.

      (6) I agree that comparing specificity enrichment between sexes is meaningful. However, enrichment relative to the database composition itself is not biologically interpretable, as acknowledged by the authors in their response. I therefore recommend removing Supplementary Figure 15, which is potentially misleading.

      (7) In contrast, Supplementary Figure 16 represents the most convincing result of the study (keeping in mind that the AID group should be split into T1D and CeD, and that T1D and CeD have opposing directions of sex bias) and should be shown as a main figure, replacing Figure 8A-B, which is less convincing as it doesn't show the per-donor distribution.

      (8) The authors argue that applying mixed-effects modeling to Rényi entropy would require assuming a common sex effect across subsets. I do not find this assumption unreasonable. For example, if sex effects are mediated through AIRE-dependent negative selection, one would indeed expect a consistent direction of effect across subsets. The lack of statistical significance in Figure 3 may reflect limited sample size rather than true absence of the difference. Moreover, the title's phrasing "comparable TCR repertoire diversity" is vague: what is the statistical definition of "comparable"?

    2. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides useful insights into addressing the question of whether the prevalence of autoimmune disease could be driven by sex differences in the T cell receptor (TCR) repertoire, correlating with higher rates of autoimmune disease in females. The authors compare male and female TCR repertoires using bulk RNA sequencing, from sorted thymocyte subpopulations in pediatric and adult human thymuses; however, the results do not provide sufficient analytical rigor and incompletely support the central claims.

      The statement in the editorial assessment that our study “does not provide sufficient analytical rigor” surprised us. TCR repertoire analysis is indeed a highly complex domain, both experimentally and computationally. We consider ourselves to be leading experts in this field and have invested a great deal of effort to ensure the rigor and reproducibility of every analytical step.

      Specifically, our group has previously benchmarked and published validated methodologies for the following areas: (i) TCR repertoire generation (Barennes et al., Nat Biotechnol 2021), (ii) repertoire analysis (Six et al., Frontiers in Immunol, 2013; Chaara et al., Frontiers in Immunol, 2018; Ritvo et al., PNAS, 2018; Mhanna et al., Diabetes, 2021; Trück et al., eLife, 2021; Quiniou et al., eLife, 2023; Mhanna et al., Cell Rep Methods, 2024; Mhanna et al., Nat Rev Methods Primers, 2024), and (iii) the curation and quality control of public TCR databases (Jouannet et al., NAR Genomics and Bioinformatics 2025). The current study applies these optimized and peer-reviewed pipelines, along with additional internal quality controls that we have implemented over the years, ensuring the highest possible analytical standards for TCR repertoire studies.

      We therefore respectfully feel that the phrase “insufficient analytical rigor” does not accurately reflect the methodological robustness of our work. This perception is also in contrast to the comment made by one of the reviewers, who explicitly noted that “overall, the methodologies appear to be sound.”

      We would therefore be grateful if, upon reviewing our detailed point-by-point responses, the editors could reconsider this statement and tone it down in the final editorial summary.

      With regard to the comment that our results “incompletely support the central claims”, we will leave it to the reader’s judgement. We believe that our work provides a robust and transparent basis for future research into TCR repertoire, autoimmunity, and women’s health.

      Reviewer 1 (Public reviews):

      Summary

      The goal of this paper was to determine whether the T cell receptor (TCR) repertoire differs between a male and a female human. To address this, this group sequenced TCRs from double-positive and single-positive thymocytes in male and female humans of various ages. Such an analysis on sorted thymocyte subsets has not been performed in the past. The only comparable dataset is a pediatric thymocyte dataset where total thymocytes were sorted.

      They report on participant ages and sexes, but not on ethnicity, race, nor provide information about HLA typing of individuals. Though the experiments themselves are heroic, they do represent a relatively small sampling of diverse humans. They observed no differences in TCRbeta or TCRalpha usage, combinational diversity, or differences in the length of the CDR3 region, or amino acid usage in the CDR3aa region between males or females. Though they observed some TCRbeta CDR3aa sequence motifs that differed between males and females, these findings could not be replicated using an external dataset and therefore were not generalizable to the human population.

      They also compared TCRbeta sequences against those identified in the past using computational approaches to recognize cancer-, bacterial-, viral-, or autoimmune-antigens. They found very little overlap of their sequences with these annotated sequences (depending on the individual, ranging from 0.82-3.58% of sequences). Within the sequences that were in overlap, they found that certain sequences against autoimmune or bacterial antigens were significantly over-represented in female versus male CD8 SP cells. Since no other comparable dataset is available, they could not conclude whether this is a finding that is generalizable to the human population.

      Strengths:

      This is a novel dataset. Overall, the methodologies appear to be sound. There was an attempt to replicate their findings in cases where an appropriate dataset was available. I agree that there are no gross differences in TCR diversity between males and females.

      We appreciate the positive feedback from the reviewer regarding these points.

      Weaknesses:

      Overall, the sample size is small given that it is an outbred population. The cleaner experiment would have been to study the impact of sex in a number of inbred MHC I/II identical mouse strains or in humans with HLA-identical backgrounds.

      We respectfully disagree with the reviewer’s statement. We firmly believe that the issue we are dealing with, namely sex-based differences in thymic TCR selection relevant to autoimmunity, should be investigated more thoroughly in the general human population than in inbred mouse models.

      While inbred mouse strains, being MHC I/II identical, eliminate the complexity of MHC variation, this comes at the cost of biological relevance. Firstly, a discrepancy in TCR generation or selection may only become apparent under specific MHC contexts, which could easily be overlooked when studying a single inbred strain. Secondly, inbred strains frequently contain fixed genetic variants that may influence thymic selection or immune regulation. This may introduce confounding effects rather than reduce them, and it does not solve the generalization issue.

      We are in full agreement that an HLA-matched human cohort would reduce inter-individual variability. However, such sampling is impossible in practice, as our thymic tissues were obtained from deceased organ donors, a collection effort that was, as the reviewer rightly noted, “heroic”. Despite these inherent limitations, the patterns we observed were consistent across multiple analytical approaches, lending robustness to our findings.

      We now explicitly acknowledge this limitation in the Discussion of the revised manuscript and explain why, despite this constraint, our study provides meaningful and biologically relevant insights into human TCR selection and sex-related immune differences.

      It is unclear whether there was consensus between the three databases they used regarding the antigens recognized by the TCR sequences. Given the very low overlap between the TCR sequences identified in these databases and their dataset, and the lack of replication, they should tone down their excitement about the CD8 T cell sequences recognizing autoimmune and bacterial antigens being over-represented in females.

      The three databases used in this study - McPAS-TCR, IEDB, and VDJdb - provide complementary and partially non-overlapping specificity landscapes. McPAS-TCR is enriched for pathology-associated TCRs, while IEDB and VDJdb contain a higher proportion of viral specificities. Combining them therefore broadens the antigenic spectrum accessible for analysis and represents the most comprehensive approach currently possible to capture the diversity of TCR–antigen annotations.

      With regard to the limited overlap between our dataset and these databases, this observation should be interpreted with caution. While the overlap may appear minimal at first glance, it is a biologically significant phenomenon. The public databases collectively contain only a minute fraction of the total universe of TCR specificities, estimated to exceed 10^15 to 10^21 possible receptors in humans. In this context, the observation of any overlap at all, particularly with coherent biological patterns such as the overrepresentation of autoimmune- and bacterial-associated TCRs in females, is noteworthy.

      We have included a short clarification in the Discussion of the revised manuscript to make this point explicit and to further temper the language describing this finding.

      The dataset could be valuable to the community.

      We thank the reviewer for highlighting the potential value of this dataset to the community. It will be made publicly available on the NCBI website. We would like to clarify that our intention has always been to make this dataset publicly available; therefore, we take back any incorrect suggestions made in the original submission.

      Reviewer #1 (Recommendations for the authors):

      I would just recommend toning down the excitement about autoimmune TCRs being overrepresented in females. Then the conclusions will be in alignment with their results.

      We thank the reviewer for this constructive recommendation. We would like to express our full support for the editorial transparency policies of eLife, which allow readers to access both the reviewers’ comments and our detailed responses, enabling them to form their own informed opinions regarding our conclusions.

      Nevertheless, we have moderated some of our wording.

      Reviewer #2 (Public review):

      Summary:

      This study addresses the hypothesis that the strikingly higher prevalence of autoimmune diseases in women could be the result of biased thymic generation or selection of TCR repertoires. The biological question is important, and the hypothesis is valuable. Although the topic is conceptually interesting and the dataset is rich, the study has a number of major issues that require substantial improvement. In several instances, the authors conclude that there are no sex-associated differences for specific parameters, yet inspection of the data suggests visible trends that are not properly quantified. The authors should either apply more appropriate statistical approaches to test these trends or provide stronger evidence that the observed differences are not significant. In other analyses, the authors report the differences between sexes based on a pooled analysis of TCR sequences from all the donors, which could result in differences driven by one or two single donors (e.g., having particular HLA variants) rather than reflect sex-related differences.

      Strengths:

      The key strength of this work is the newly generated dataset of TCR repertoires from sorted thymocyte subsets (DP and SP populations). This approach enables the authors to distinguish between biases in TCR generation (DP) and thymic selection (SP). Bulk TCR sequencing allows deeper repertoire coverage than single-cell approaches, which is valuable here, although the absence of TRA-TRB pairing and HLA context limits the interpretability of antigen specificity analyses. Importantly, this dataset represents a valuable community resource and should be openly deposited rather than being "available upon request."

      We thank the reviewer for highlighting the potential value of this dataset to the community. It will be made publicly available on the NCBI website. We would like to clarify that our intention has always been to make this dataset publicly available; therefore, we take back any incorrect suggestions made in the original submission.

      Weaknesses:

      Major:

      The authors state that there is "no clear separation in PCA for both TRA and TRB across all subsets." However, Figure 2 shows a visible separation for DP thymocytes (especially TRA, and to a lesser degree TRB) and also for TRA of Tregs. This apparent structure should be acknowledged and discussed rather than dismissed.

      We thank the reviewer for this careful observation. Discussing apparent “trends” rather than statistically significant results is indeed a nuanced issue, as over-interpretation of visual patterns is usually discouraged. We agree that, within the specific context of TCR repertoire analyses, visual structures in multivariate projections such as PCA can provide useful contextual information.

      However, we have not identified a striking trend in our representation. We therefore chose to avoid overemphasizing these visual impressions in the text.

      Supplementary Figures 2-5 involve many comparisons, yet no correction for multiple testing appears to be applied. After appropriate correction, all the reported differences would likely lose significance. These analyses must be re-evaluated with proper multiple-testing correction, and apparent differences should be tested for reproducibility in an external dataset (for example, the pediatric thymus and peripheral blood repertoires later used for motif validation).

      As is standard in exploratory immunogenomic studies, including TCR repertoire analyses, our objective was to uncover broad biological patterns rather than to establish definitive statistical associations. In analyses that are discovery-oriented, correction for multiple testing, while essential in confirmatory contexts, is not mandatory and may even obscure meaningful trends by inflating type II error rates. Our objective was therefore to highlight consistent directional patterns across analytical layers, to guide future confirmatory work rather than to make categorical claims.

      We also note that this comment somewhat contrasts with the earlier suggestion to discuss trends that are not statistically significant.

      With regard to the proposal to verify our observations using an external dataset, we are in full agreement that independent confirmation would be beneficial. However, as reviewer 1 rightly emphasized, the generation of such datasets from sorted human thymocyte subsets is “heroic” and has rarely, if ever, been achieved. We are aware of no existing dataset that provides comparable material or analytical depth.

      The available single-cell thymic dataset (Park et al., Science 2020) includes only a few hundred sequences per donor, which is significantly less than the number of sequences in our study. This limited dataset is not adequate for cross-validation or for representing the full complexity of thymic TCR repertoires.

      As with the pediatric thymus dataset, the lack of statistical power in the dataset due to the small number of female subjects (only three) means that sex-related differences in V/J usage cannot be evaluated.

      Finally, the peripheral blood dataset is not appropriate for validating thymic generation or selection processes, as it reflects post-thymic selection and antigen-driven remodeling, making it impossible to distinguish peripheral effects from thymic influences.

      For these reasons, none of the currently available datasets provides a sufficiently clean or powerful framework to test the reproducibility of subtle sex-associated effects on thymic TCR repertoires. Nevertheless, we fully agree that confirmation in an independent and larger cohort will be an important next step to refine these exploratory findings and assess their generalizability to a broader human population.

      Supplementary Figure 6 suggests that women consistently show higher Rényi entropies across all subsets. Although individual p-values are borderline, the consistent direction of change is notable. The authors should apply an integrated statistical test across subsets (for example, a mixed-effects model) to determine whether there is an overall significant trend toward higher diversity in females.

      We agree that Rényi entropies tend to show a consistent direction of change across subsets, with slightly higher values observed in females. In this section, our objective was to provide a descriptive overview of diversity patterns for each thymic subset. This is because these subsets are biologically distinct and therefore require individual analysis, as we previously demonstrated using the same dataset (Isacchini et al, PRX Life. 2024). Therefore, while a mixed-effects approach could in principle be applied to test for an overall trend, such an analysis would rely on the assumption of a common sex effect across heterogeneous cell types.
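
      For readers who wish to recompute these diversity profiles from the public dataset, the Rényi entropy of order α for clonotype frequencies p_i is H_α = log(Σ p_i^α) / (1 − α), with the Shannon entropy as the α → 1 limit. The following is a minimal sketch of that standard formula, not the pipeline used in the study; the function name, the use of the natural logarithm, and the choice of α values are assumptions made for illustration.

```python
import math


def renyi_entropy(clone_counts, alpha):
    """Rényi entropy (natural log) of order `alpha` for a list of clonotype counts.

    Illustrative helper only: name, log base and handled special cases are
    assumptions, not taken from the study's code.
    """
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("at least one clonotype with a positive count is required")
    freqs = [c / total for c in clone_counts if c > 0]
    if alpha == 1:
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log(p) for p in freqs)
    if math.isinf(alpha):
        # The limit alpha -> infinity depends only on the most frequent clonotype.
        return -math.log(max(freqs))
    return math.log(sum(p ** alpha for p in freqs)) / (1 - alpha)
```

      Evaluating the function over a range of α values for each donor and subset gives a diversity profile: low orders weight rare clonotypes more heavily, high orders are dominated by expanded clonotypes.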

      It is important to note that the complete dataset has now been made publicly available, enabling interested researchers to perform additional integrative or model-based analyses to further explore these diversity trends.

      Figures 4B and S8 clearly indicate enrichment of hydrophobic residues in female CDR3s for both TRA and TRB (excluding alanine, which is not strongly hydrophobic). Because CDR3 hydrophobicity has been linked to increased cross-reactivity and self-reactivity (see, e.g., Stadinski et al., Nat Immunol 2016), this observation is biologically meaningful and consistent with higher autoimmune susceptibility in females.

      We thank the reviewer for this insightful comment.

      As correctly noted, increased hydrophobicity at specific CDR3β positions has been linked to enhanced cross-reactivity and self-reactivity, as described by Stadinski et al. (Nat Immunol 2016), and we reference this work in the manuscript.

      In our analysis corresponding to Figure 4B (TRB), hydrophobicity was quantified at the sequence level by computing, for each unique CDR3β sequence, the overall proportion of hydrophobic amino acids across the CDR3 loop. This approach aligns with that of Lagattuta et al. (Nat Immunol 2022), whose code we adapted to accommodate longer CDR3s. This global hydrophobicity metric captures overall composition, but, by its construction, does not account for positional context, the key mechanism implicated by Stadinski et al.
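
      The sequence-level metric described above can be sketched as follows. This is an illustration under stated assumptions rather than the adapted Lagattuta et al. code: the residue set `HYDROPHOBIC` and both function names are hypothetical, and the exact residue set used in the manuscript is not given in this response.

```python
# Illustrative residue set (assumption); the manuscript's exact set is not stated here.
HYDROPHOBIC = frozenset("FILMVWY")


def hydrophobic_fraction(cdr3):
    """Proportion of hydrophobic residues across one CDR3 amino acid sequence."""
    if not cdr3:
        raise ValueError("empty CDR3 sequence")
    return sum(aa in HYDROPHOBIC for aa in cdr3) / len(cdr3)


def mean_hydrophobic_fraction(cdr3_sequences):
    """Mean hydrophobic proportion over unique sequences (clonotype frequency ignored)."""
    unique = set(cdr3_sequences)
    if not unique:
        raise ValueError("no sequences given")
    return sum(hydrophobic_fraction(s) for s in unique) / len(unique)
```

      Because the score is averaged over the whole loop, two sequences with the same composition but different residue positions receive the same value, which is why this metric cannot capture positional effects.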

      As outlined in our original Figure 4C, the results were obtained through a position-based amino acid analysis. For each CDR3β sequence, we extracted the amino acid at every IMGT-defined CDR3 position (p104–p118) and quantified, at each position, the percentage of unique sequences containing each amino acid. Positions p109 and p110 correspond to the p6–p7 sites highlighted by Stadinski et al. as functionally relevant for self-reactivity. This analysis evaluates positional composition independently of clonotype frequency, focusing specifically on hydrophobic amino acid classes.
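
      A minimal sketch of such a positional tabulation is given below. It assumes that each CDR3β has already been aligned to the 15 positions p104 to p118, with "-" marking unoccupied positions; the alignment step, the residue set, the function name, and the choice of denominator (unique sequences that occupy the position) are all assumptions for illustration and may differ from the authors' implementation.

```python
# Position labels p104..p118 (15 positions).
IMGT_POSITIONS = [f"p{n}" for n in range(104, 119)]


def positional_hydrophobic_percent(aligned_cdr3s, hydrophobic=frozenset("FILMVWY")):
    """Percent of unique sequences carrying a hydrophobic residue at each position.

    Input strings must be pre-aligned to the 15 positions, with '-' for gaps
    (assumption). Positions occupied by no sequence are omitted from the result.
    """
    unique = set(aligned_cdr3s)
    for seq in unique:
        if len(seq) != len(IMGT_POSITIONS):
            raise ValueError(f"sequence not aligned to {len(IMGT_POSITIONS)} positions: {seq}")
    percent = {}
    for idx, pos in enumerate(IMGT_POSITIONS):
        residues = [seq[idx] for seq in unique if seq[idx] != "-"]
        if residues:
            hits = sum(r in hydrophobic for r in residues)
            percent[pos] = 100.0 * hits / len(residues)
    return percent
```

      Computing this table per donor and then comparing the p109 and p110 values between sexes keeps the donor as the unit of analysis.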

      Following the recommendation of the reviewer, alanine (which is only weakly hydrophobic) has been excluded from the hydrophobic residue set in the revised manuscript. With this refined definition, we observe a significant enrichment of hydrophobic amino acids at p109 in CD8 T cell repertoires from females, with similar but non-significant trends at p109 in DP and CD4 Teff cells and at p110 in CD8 cells (see new Figure 4C).

      As outlined in the revised Methods, Results, and Discussion sections, Figure 4C focuses exclusively on positional hydrophobic amino acid usage. This was previously implicit, although it was noted in the legend and visually represented in the plots.

      The majority of "hundreds of sex-specific motifs" are probably donor-specific motifs confounded by HLA restriction. This interpretation is supported by the failure to validate motifs in external datasets (pediatric thymus, peripheral blood). The authors should restrict analysis to public motifs (shared across multiple donors) and report the number of donors contributing to each motif.

      We fully agree that donor-specific and HLA-restricted motifs represent a major potential confounder in repertoire-level comparisons. To minimize this potential bias, our analysis was explicitly restricted to public motifs, as clearly stated in the Materials and Methods section:

      “Additional filters were applied so that: (i) a motif includes public CDR3aa sequences (shared by at least two individuals); (ii) a significant enrichment is detected (Fisher’s exact test, p < 0.01); and (iii) a usage difference between groups of at least twofold (Wilcoxon test, p < 0.05).”

      Accordingly, every motif reported in the manuscript is supported by at least two independent donors, ensuring that no motif reflects an individual- or HLA-specific effect (see Supplementary Figures 10-13[previously Supplementary Figure 9]). We have now added a more explicit mention of the number of donors contributing to each motif in the figure legend and have clarified this point in the revised Methods and Results sections to make this criterion more visible to readers.
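
      The publicity criterion (i) and the per-motif donor count can be illustrated with the short sketch below. It is not the authors' motif-discovery code: a motif is treated here as a contiguous substring of the CDR3aa sequence, which is an assumption, and the function names are hypothetical. The Fisher and Wilcoxon filters (ii) and (iii) are not reproduced.

```python
from collections import defaultdict


def public_sequences(donor_to_cdr3s, min_donors=2):
    """Map each CDR3aa found in at least `min_donors` donors to its set of donors."""
    donors_per_seq = defaultdict(set)
    for donor, seqs in donor_to_cdr3s.items():
        for seq in set(seqs):
            donors_per_seq[seq].add(donor)
    return {seq: donors for seq, donors in donors_per_seq.items() if len(donors) >= min_donors}


def donors_per_motif(public, motifs):
    """Count distinct donors contributing a public sequence that contains each motif.

    A motif is matched as a contiguous substring (assumption for illustration).
    """
    counts = {}
    for motif in motifs:
        donors = set()
        for seq, seq_donors in public.items():
            if motif in seq:
                donors |= seq_donors
        counts[motif] = len(donors)
    return counts
```

      Reporting the resulting count next to each motif makes it straightforward to check that no motif rests on a single individual.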

      When comparing TCRs to VDJdb or other databases, it is critical to consider HLA restriction. Only database matches corresponding to epitopes that can be presented by the donor's HLA should be counted. The authors must either perform HLA typing or explicitly discuss this limitation and how it affects their conclusions.

      We respectfully disagree with the assertion that HLA typing is necessary for the type of comparative analysis we have conducted. While it is true that HLA molecules present peptides to TCRs and thereby contribute to the tripartite interaction determining T cell activation, extensive evidence indicates that the CDR3 region, particularly CDR3β, is the dominant determinant of antigen specificity. This finding is supported by structural and computational studies (Madi et al., eLife, 2017; Huang et al., Nat. Biotech., 2020; Mayer-Blackwell et al., Methods Mol. Biol., 2022) showing that CDR3β residues are responsible for the majority of peptide contacts, whereas CDR1 and CDR2 primarily interact with the MHC framework.

      As emphasized in several recent benchmarking studies (e.g., Springer et al., Front Immunol, 2021), CDR3β sequence composition alone captures most of the information required for specificity inference. Consequently, widely used and validated computational tools such as GIANA (Zhang et al. Nat. Commun. 2021), iSMART (Zhang et al. Clin. Cancer Res. 2020), and ATMTCR (Cai et al. Front. Immunol. 2022) rely exclusively on CDR3β amino acid sequences and still achieve high predictive performance.

      Our analysis aligns with this well-established paradigm. While we agree that integrating donor HLA typing would refine epitope-level annotation and reduce potential noise, the absence of HLA data does not invalidate the comparative framework we used, which focuses on relative representation of annotated specificities across groups rather than on individual TCR–HLA–peptide triads.

      Although the age distributions of male and female donors are similar, the key question is whether HLA alleles are similarly distributed. If women in the cohort happen to carry autoimmune-associated alleles more often, this alone could explain observed repertoire differences. HLA typing and HLA comparison between sexes are therefore essential.

      To address the issue of any potential differences in HLA background, we examined the subset of adult donors for whom HLA typing information was available (HLA-A, HLA-B, HLA-DR, and HLA-DQB; n = 16). Within this subset, the distribution of HLA alleles was relatively balanced between males and females (as illustrated by the heatmap showing HLA class II expression patterns and HLA class I family grouping in Author response image 1). This analysis suggests that the sex-associated differences in the repertoire observed in our study are unlikely to be driven solely by unequal representation of autoimmune-associated HLA alleles.

      We acknowledge, however, that complete HLA information was not available for all donors, which remains a limitation of the dataset.

      Author response image 1.

      In some analyses (e.g., Figures 8C-D) data are shown per donor, while others (e.g., Fig. 8A-B) pool all sequences. This inconsistency is concerning. The apparent enrichment of autoimmune or bacterial specificities in females could be driven by one or two donors with particular HLAs. All analyses should display donor-level values, not pooled data.

      While Figures 8A–B present pooled data to summarize global trends, the corresponding donor-level analyses were provided in Supplementary Figures 15B and 16 (previously Supplementary Figures 11B and 12). In these, each individual is shown separately, with each point representing an individual. It is important to note that these donor-resolved plots do not reveal any sample-specific driver: the patterns observed in the pooled data remain consistent across donors, without any single individual accounting for the apparent enrichments. In the revised manuscript, readers are now directed to the relevant supplementary figures for further clarification.

      The reported enrichment of matches to certain specificities relative to the database composition is conceptually problematic. Because the reference database has an arbitrary distribution of epitopes, enrichment relative to it lacks biological meaning. HLA distribution in the studied patients and HLA restrictions of antigens in the database could be completely different, which could alone explain enrichment and depletions for particular specificities. Moreover, differences in Pgen distributions across epitopes can produce apparent enrichment artifacts. Exact matches typically correspond to high-Pgen "public" sequences; thus, the enrichment analysis may simply reflect variation in Pgen of specific TCRs (i.e., fraction of high-Pgen TCRs) across epitopes rather than true selection. Consequently, statements such as "We observed a significant enrichment of unique TRB CDR3aa sequences specific to self-antigens" should be removed.

      We respectfully disagree with the conclusion that our enrichment analysis lacks biological meaning. Our approach involves a direct comparison of the same set of observed TCR sequences between males and females. Consequently, any potential biases related to generation probability (Pgen), which affect all sequences equally, cannot account for the observed sex-specific differences. To summarize, because the comparison is performed on the same set of sequences, changes in the probability of generation across epitopes cannot explain the differences seen between the sexes.

      We do agree, however, that the composition of the reference databases may influence apparent enrichment patterns, as these resources contain uneven distributions of epitope categories and often incomplete information regarding HLA restriction. It should be noted that this limitation is inherent to all database-based annotation approaches, a fact which is explicitly acknowledged in the revised Discussion.

      The overrepresentation of self-specific TCRs in females is the manuscript's most interesting finding, yet it is not described in detail. The authors should list the corresponding self-antigens, indicate which autoimmune diseases they relate to, and show per-donor distributions of these matches.

      We thank the reviewer for this constructive suggestion.

      As recommended, we have expanded the description of the self-specific TCRs identified in our dataset and now provide this information in Supplementary Table 2 of the revised manuscript. Specifically, the table lists the corresponding self-antigens and the autoimmune diseases with which they are associated. In our curated database, these annotations primarily correspond to celiac disease and type 1 diabetes, which were the two autoimmune contexts explicitly defined in the manually curated reference datasets.

      For the “cancer” specificity group, we have clarified that antigen assignments were established based on (i) annotations available in the original databases (IEDB, VDJdb, McPAS-TCR) and (ii) cross-referencing with additional resources, including the Human Protein Atlas, the Cancer Antigenic Peptide Database (de Duve Institute), and the Cancer Antigen Atlas (Yi et al., iScience 2021), to ensure consistency in the classification of cancer and neoantigen specificities. Please refer to the Materials and Methods section for a full description of the procedure for this specific assignment.

      Donor-level distributions of these self-specific matches are now shown in Supplementary Figures 15B and 16 (previously Supplemental Figures 11B and 12), allowing direct visualization of inter-donor variability. Importantly, these plots confirm that the observed enrichment in females is not driven by a single individual, further supporting the robustness of the finding.

      The concept of poly-specificity is controversial. The authors should clearly explain how polyspecific TCRs were defined in this study and highlight that the experimental evidence supporting true polyspecificity is very limited (e.g., just a single TCR from Figure 5 from Quiniou et al.).

      We certainly agree (and regret) that the concept of TCR polyspecificity remains a subject of debate and is often underappreciated in the field of immunology. As Don Mason famously discussed in his seminal essay “A very high cross-reactivity is an essential feature of the TCR” (doi: 10.1016/S0167-5699(98)01299-7) published over 25 years ago, both theoretical and experimental evidence indicates that each TCR can, in principle, recognize millions of distinct peptides, albeit with variable avidity.

      Although this principle is widely accepted, it is frequently overlooked in the field of experimental immunology. In this area, anything that deviates from strict monospecificity is often disregarded as noise.

      In our own analyses of large-scale TCR repertoires, we have repeatedly observed that many CDR3 sequences are annotated with multiple specificities across different databases, often corresponding to peptides from unrelated organisms. As demonstrated in Quiniou et al. (eLife 2023), such polyreactive TCRs exhibit distinctive features, including biased physicochemical composition, and tend to be enriched in various biological contexts. In our preliminary study of such TCRs, which have the capacity to be specific for multiple viral- and self- epitopes, we hypothesized that they may serve as a first line of defense against pathogens and also be involved in triggering autoimmunity. We therefore consider it important to report this phenomenon rather than omit it, especially given its potential relevance to both protective immunity and autoimmunity.

      In the present study, polyspecific TCRs were defined operationally as TRB CDR3aa sequences associated with a minimum of two distinct specificity groups, corresponding either to different microbial species or to multiple antigen categories within the curated database. Therefore, our definition captures broader antigenic groupings rather than epitope-level binding events.
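
      This operational definition amounts to grouping database annotations by CDR3aa sequence and counting distinct specificity groups. A minimal sketch follows; the input layout (pairs of sequence and group label), the example labels, and the function name are hypothetical and are not taken from the curated database.

```python
from collections import defaultdict


def polyspecific_cdr3s(annotations, min_groups=2):
    """Return CDR3aa sequences annotated with at least `min_groups` specificity groups.

    `annotations` is an iterable of (cdr3aa, specificity_group) pairs; this
    layout is an assumption made for illustration.
    """
    groups_per_seq = defaultdict(set)
    for cdr3, group in annotations:
        groups_per_seq[cdr3].add(group)
    return {
        cdr3: groups
        for cdr3, groups in groups_per_seq.items()
        if len(groups) >= min_groups
    }
```

      Repeated annotations of the same sequence with the same group count once, so a sequence is flagged only when its annotations span different groups.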

      We fully acknowledge that direct experimental evidence for true molecular-level polyspecificity remains limited. Indeed, as the reviewer notes, only a single TCR with multiepitope reactivity has been rigorously demonstrated to date (Quiniou et al., 2023). Consequently, our analysis does not make claims about structural promiscuity; instead, it uses database-annotated cross-reactivity as a proxy to explore broader repertoire-level patterns.

      This definition has now been clarified in the Methods section, and the Discussion has been expanded to explicitly address these conceptual and methodological nuances.

      Minor:

      Clarify why the Pgen model was used only for DP and CD8 subsets and not for others.

      As noted, computing Pgen values involves two steps: (i) training a generative model of V(D)J recombination using IGoR, and (ii) estimating generation probabilities with OLGA based on that model. Both steps require a significant amount of computing power, especially when applied to large repertoires across multiple subsets. For this reason, we focused the analysis on DP thymocytes, which represent the repertoire prior to thymic selection, and CD8 T cells after CD8 selection.

      The Methods section should define what a "high sequence reliability score" is and describe precisely how the "harmonized" database was constructed.

      Briefly, the annotated database used in this study was constructed in accordance with the procedure established in our previously published work (Jouannet et al., NAR Genomics and Bioinformatics, 2025). The study integrates three publicly available resources, IEDB, VDJdb, and McPAS-TCR, which were collected as of October 2023. These three datasets were then merged into a single harmonized compendium, undergoing extensive standardization. When entries shared identical information across databases (same V–CDR3–J for both TRA and TRB, same epitope, organism, PubMed ID, and cell subset), only one representative was kept; discrepant or incomplete entries were retained to preserve information. We then assigned a sequence reliability score, the Verified Score (VS), following the verification strategy used by IEDB. The scale ranges from 0 to 2 and reflects the concordance between calculated and curated TRA/TRB CDR3 sequences (2 = both TRA and TRB present are verified, 1.1 = only TRA verified, 1.2 = only TRB verified, 0 = no verified chain). A second score, the Antigen Identification Score (AIS), is used to rank antigen-identification methods on a scale of 0 to 5, according to the strength of the experimental evidence supporting them.

      In the present study, “high reliability” refers to sequences with a verified TRB CDR3aa chain (VS ≥ 1.2) and an AIS score corresponding to in vitro T cell stimulation with a pathogen, protein or peptide, or pMHC X-mer sorting (> 3.2, excluding categories 4.1 and 4.2), ensuring that downstream analyses were performed on a rigorously curated and biologically trustworthy dataset. The Methods section now explicitly details these criteria.
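
      The thresholds stated above can be expressed as a simple filter. The sketch below is illustrative only: the function name and the representation of scores are assumptions, and it encodes nothing beyond the three conditions given in this response (VS ≥ 1.2, AIS > 3.2, AIS not in categories 4.1 or 4.2).

```python
from decimal import Decimal

# Thresholds as stated in the response; Decimal avoids float rounding on category codes.
VS_MIN = Decimal("1.2")
AIS_MIN_EXCLUSIVE = Decimal("3.2")
AIS_EXCLUDED = {Decimal("4.1"), Decimal("4.2")}


def is_high_reliability(vs, ais):
    """True if an entry passes the stated Verified Score and AIS criteria."""
    vs = Decimal(str(vs))
    ais = Decimal(str(ais))
    return vs >= VS_MIN and ais > AIS_MIN_EXCLUSIVE and ais not in AIS_EXCLUDED
```

      Applying the predicate to every database entry before matching keeps entries with a verified TRB chain (VS of 1.2 or 2) and drops those with only a verified TRA chain (VS of 1.1).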

      The statement "we generated 20,000 permuted mixed-sex groups" is unclear. It is not evident how this permutation corrects for individual variation or sex bias. A more appropriate approach would be to train the Pgen model separately for each individual's nonproductive sequences (if the number of sequences is large enough).

      The objective of this analysis was to determine whether the enrichment of TRBV06-5 in females was due to random grouping of individuals or whether it was attributable to sex itself. To do so, we generated all possible perfectly mixed groups of donors (i.e., groups containing an equal number of male and female donors) for the concerned thymocyte subset, and then performed 20,000 random pairwise comparisons between such mixed groups. For each comparison, we tested the TRBV06-5 usage between the two mixed groups. This procedure directly evaluates whether group composition (independent of sex) could spuriously generate differences in TRBV usage. Notably, none of these 20,000 comparisons between the two mixed groups yielded a statistically significant difference in TRBV06-5 usage. In contrast, when comparing the true male and female groups, a significant difference was identified. This demonstrates that the signal we observe is not driven by random donor grouping or individual-level variation, but is specifically associated with sex. It is important to note that this analysis, which is designed to exclude spurious group effects, is rarely performed in published repertoire studies, yet it provides an important internal control for robustness.
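
      The logic of this control can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function name `mixed_group_control` is hypothetical, the inputs are plain dictionaries, and the absolute difference in mean usage stands in for the statistical test applied to each comparison, which is not specified here.

```python
import random
from statistics import mean


def mixed_group_control(usage, sex, n_comparisons=20000, seed=0):
    """Compare the observed female-vs-male difference in gene usage with
    differences between randomly drawn, sex-balanced ("mixed") groups.

    usage: dict donor -> usage fraction of the gene of interest
    sex:   dict donor -> "F" or "M"
    Returns (observed_difference, fraction_of_mixed_comparisons_at_least_as_large).
    """
    rng = random.Random(seed)
    males = [d for d in usage if sex[d] == "M"]
    females = [d for d in usage if sex[d] == "F"]

    # Observed contrast between the true sex groups.
    observed = abs(mean(usage[d] for d in females) - mean(usage[d] for d in males))

    # Each mixed group holds k males and k females (assumed group design).
    k = min(len(males), len(females)) // 2
    at_least_as_large = 0
    for _ in range(n_comparisons):
        m = rng.sample(males, 2 * k)
        f = rng.sample(females, 2 * k)
        group_a = m[:k] + f[:k]   # balanced group A
        group_b = m[k:] + f[k:]   # balanced group B, disjoint from A
        diff = abs(mean(usage[d] for d in group_a) - mean(usage[d] for d in group_b))
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_comparisons
```

      If sex drives the difference, comparisons between balanced groups should rarely, if ever, reach the size of the true female-versus-male contrast, so the returned fraction should be close to zero.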

      Reviewer #2 (Recommendations for the authors):

      (1) Data availability "upon request" is unacceptable. All raw and processed data, as well as scripts used for analysis and figure generation, must be publicly deposited before publication.

      We would like to clarify that our intention has always been to make this dataset publicly available. It was a mistake to suggest otherwise in the original submission.

      (2) At the beginning of the Results section, include a brief description of the dataset: number of donors, sex ratio, age range, number of samples per subset, and sorting strategy. Although Figure 1 shows this, the information should also be mentioned in the main text.

      In line with the recommendation, we have now added a summary of the cohort characteristics at the beginning of the Results section. This includes the number of donors, sex ratio, age range, number of samples per subset, and the sorting strategy used. While this information was already included in Figure 1, we concur that including it directly in the main text enhances readability.

      (3) Report the number of cells and unique clonotypes analyzed per individual. Rank-frequency plots (in log-log coordinates) would be helpful.

      We have now added, for each donor and each subset, the number of cells, and additionally for each chain, the number of total and unique clonotypes analyzed. This information is provided in the revised manuscript in a new supplementary table (Supplemental Table 1).

      The rank-frequency plots (in log-log coordinates) have been integrated into the revised manuscript as Supplementary Figure 2.

      (4) For analysis in Figure 4B, the total fraction of hydrophobic amino acids should be calculated for each patient separately, and values for men and women should be compared (analogously to Figure 4C, but for the whole CDR3 and excluding alanine).

      Please note that the TRB CDR3aa composition in Figure 4B has already been quantified at the individual level. For each unique TRB CDR3aa sequence, we computed the proportion of each of the 20 amino acids across the CDR3β loop, then summarized these values per donor (mean per individual). The log2 fold change displayed in Figure 4B (and supplemental Figure 9 for TRA) is calculated from the median donor-level values for females versus males, rather than from pooled CDR3s. It is intended as a descriptive, “global” view of amino acid usage within the central CDR3 region. Hydrophobicity was not used directly in the computation; it is indicated only by bar color, based on the Kyte-Doolittle-derived IMGT classification.
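
      The donor-level computation described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, sequences are assumed to be non-empty strings already trimmed to the region of interest, and no pseudo-count is applied, so both medians must be positive.

```python
from collections import Counter
from math import log2
from statistics import mean, median

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_proportions(cdr3):
    """Proportion of each of the 20 amino acids within one CDR3 sequence."""
    counts = Counter(cdr3)
    return {aa: counts[aa] / len(cdr3) for aa in AMINO_ACIDS}


def donor_profile(unique_cdr3s):
    """Mean per-sequence proportion of each amino acid for one donor."""
    per_sequence = [aa_proportions(seq) for seq in unique_cdr3s]
    return {aa: mean(p[aa] for p in per_sequence) for aa in AMINO_ACIDS}


def log2_fold_change(female_profiles, male_profiles, aa):
    """log2 ratio of the median donor-level values, females versus males."""
    female_median = median(p[aa] for p in female_profiles)
    male_median = median(p[aa] for p in male_profiles)
    if female_median <= 0 or male_median <= 0:
        raise ValueError("both medians must be positive (no pseudo-count is used)")
    return log2(female_median / male_median)
```

      Because each donor contributes a single value per amino acid, donors with very large repertoires do not dominate the comparison, which is the point of not pooling CDR3s.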

      As the mechanistic link between hydrophobicity and self-reactivity described by Stadinski et al. is explicitly position-dependent, we consider positional analyses to be the most appropriate method for formally interrogating this hypothesis, as we did in Figure 4C. Here, our primary focus was on the position-specific usage of hydrophobic amino acids at IMGT positions p109-p110. These positions correspond to the central p6-p7 positions described by Stadinski et al. For each individual, we computed the proportion of unique TRB CDR3aa sequences carrying a hydrophobic amino acid at a given position.

      Accordingly, in the revised manuscript we refined Figure 4C by excluding alanine because of its weak hydrophobicity (as recommended by the reviewer). This positional composition analysis now reveals a statistically significant increase in hydrophobic usage at p109 in female CD8 repertoires, with similar, though non-significant, trends at p109 in DP and CD4Teff and at p110 in CD8 cells. Figure 4B is therefore retained as an exploratory overview of amino acid composition usage along the CDR3 loop, while Figure 4C is used for the more specific question of hydrophobicity and potential cross-reactivity.
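
      A position-specific proportion of this kind can be sketched as below. The sketch makes several assumptions that are not taken from the manuscript: each sequence is supplied already numbered, as a dictionary from IMGT position label to residue; the hydrophobic set is the IMGT hydrophobic class without alanine; and the denominator is restricted to sequences in which the position is occupied. The function name is hypothetical.

```python
# Assumed hydrophobic set: IMGT hydrophobic class (A, C, F, I, L, M, V, W) minus alanine.
HYDROPHOBIC_NO_ALA = frozenset("CFILMVW")


def hydrophobic_fraction(numbered_cdr3s, position):
    """Fraction of a donor's unique CDR3s with a hydrophobic residue at `position`.

    numbered_cdr3s: list of dicts mapping an IMGT position label (e.g. "109")
                    to the residue found there.
    Only sequences in which the position is occupied enter the denominator.
    """
    occupied = [seq for seq in numbered_cdr3s if position in seq]
    if not occupied:
        return float("nan")
    hits = sum(seq[position] in HYDROPHOBIC_NO_ALA for seq in occupied)
    return hits / len(occupied)
```

      One value per donor and position is then available for a between-sex comparison.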

      The Methods section has been expanded to provide clearer descriptions of these computations, and the Results and Discussion sections corresponding to Figures 4B-C (and supplemental Figure 9) have been revised to make the rationale, implementation, and interpretation of these hydrophobicity analyses more explicit.

      (5) Figure 6 shows a trend toward higher clustering of Treg TCRs in males, which could relate to the lower incidence of autoimmunity in men. The authors could test whether specific Treg clusters are male-specific and shared among male donors.

      As shown in Figure 6, a clear trend towards higher similarity among Treg CDR3aa sequences in males is evident, as indicated by the proportion of sequences included in clusters and in the overall similarity density. However, identifying “male-specific clusters” shared across donors is not straightforward in our analytical framework.

      In our approach, for each cell subset, CDR3aa sequences were downsampled 100 times to the smallest sample size, and clustering was repeated at each iteration. The clusters’ identities are therefore not consistent across iterations: they depend on the specific subset of sequences selected at each downsampling step, as well as on their underlying Pgen distribution. It is thus not possible to reliably assess whether specific clusters are systematically “male-shared”, because cluster composition is a function of stochastic resampling rather than of biological structure. For this reason, a comparison of cluster identities across donors would not produce interpretable results.

    1. Summary: Decoding Higher-Education Labels and Accreditations on Parcoursup

      This briefing document summarizes the presentations given by Ariane Ferreri and Thibaut Cojean (L'Étudiant) during the webinar organized in partnership with the FCPE.

      It aims to provide a precise framework for reading the labels, degrees, and certifications encountered on the Parcoursup platform, to help families assess the actual quality of programs.

      Executive Summary

      With an ever-growing range of programs on offer, distinguishing official academic recognition from marketing promises is crucial. The key points to remember are:

      • The primacy of the Grade and the Visa: Only degrees that carry a “grade” (Licence or Master) or a state “visa” guarantee an academic equivalence that allows further study.

      • The RNCP trap: An RNCP title attests to a professional qualification but has no intrinsic academic value (no automatic ECTS credits, no guarantee of admission to a master's program).

      • Vigilance regarding the for-profit private sector: A bill is in preparation to better regulate this sector; in the meantime, checking labels (EESPIG, CTI) and audits (blue hexagon on Parcoursup) remains the best protection.

      • The importance of ECTS credits: They are the European unit of measurement (60 per year), essential for mobility and for the recognition of completed coursework.

      --------------------------------------------------------------------------------

      1. The Fundamentals of Academic Recognition

      The ECTS credit system

      ECTS credits (European Credit Transfer and Accumulation System) are a European unit of measurement for education.

      • Volume: One year of higher education corresponds to 60 credits.

      • Purpose: They guarantee that a course of study is recognized by other institutions, in France and elsewhere in Europe.

      Without ECTS credits, further study (e.g., moving from a Licence to a Master) is compromised.

      • Validation: Credits are attached to specific modules (subjects, student-association projects, etc.).

      If a module is failed, the student lacks only the credits attached to it.

      National Diplomas vs. Private Schools

      | Type of Diploma | Characteristics | Examples |
      | --- | --- | --- |
      | National diploma (diplôme national) | Organized by the state and awarded in the name of a ministry. | BTS, Licence, Master, Doctorat, BUT, DNA (art). |
      | State-endorsed diploma (diplôme visé) | Diploma from a private school, recognized by the state for its quality. | Bachelors from certain business schools. |
      | Licence/Master grade | Highest level of recognition for a non-national diploma. | Engineering degrees, certain top-tier bachelors. |

      Important note: A school may be recognized by the state while offering programs that are not. It is essential to check the recognition of the specific program being considered.

      --------------------------------------------------------------------------------

      2. Detailed Analysis of Labels and Accreditations

      Labels of excellence and public interest

      • EESPIG (Établissement d'Enseignement Supérieur Privé d'Intérêt Général, private higher-education institution of public interest): Label awarded to private non-profit institutions that fulfil a public-service mission.

      It covers around 60 grandes écoles in France.

      • CTI (Commission des Titres d'Ingénieur): The only body empowered to authorize schools to award the title of ingénieur (engineer).

      A “program in engineering” (formation en ingénierie) is not an “engineer's degree program” (formation d'ingénieur) unless it is accredited by the CTI.

      • DD&RS (Développement Durable et Responsabilité Sociétale, sustainable development and social responsibility): A hard-to-obtain label attesting to environmental and social commitment.

      It is a mark of academic seriousness and of a well-structured teaching community.

      International accreditations (business schools)

      For management schools, the “Triple Crown” refers to holding all three of the following accreditations:

      • AACSB (American)

      • EQUIS (European)

      • AMBA (British)

      France has the highest proportion in the world of schools holding this triple accreditation, which guarantees a rigorous audit of teaching processes.

      Professional certifications (vigilance required)

      • RNCP (Répertoire National des Certifications Professionnelles, national register of professional certifications): This label attests that the program prepares students for an occupation recognized by the labor market.

      Caution: it has no academic value.

      An RNCP program alone does not allow a student to continue at university or in a master's program without going through a complex validation procedure (VAE/VAPP).

      • Qualiopi: A process-quality label that is mandatory for organizations receiving public funds (notably for apprenticeships).

      It says nothing about teaching quality or the academic value of the diploma.

      --------------------------------------------------------------------------------

      3. Vigilance Guide for Parcoursup

      Reliability indicators

      • The Blue Hexagon: On Parcoursup, a blue hexagonal logo indicates that the program is overseen by the state.

      • The blue-white-red flag: Be wary.

      Any school can use it in a sales brochure, with no official rule behind it.

      • “Sous contrat” (under contract) status: For private lycées (BTS, CPGE), this status guarantees that teachers are paid by the state and follow the national curriculum.

      Warning signs (“red flags”)

      • A pitch centered exclusively on the RNCP: Often signals a lack of academic recognition.

      • Fully remote programs straight after the baccalauréat: Very risky for students just out of lycée, who need supervision to succeed in the transition to higher education.

      • No permanent faculty: A school relying solely on outside instructors (working professionals) often lacks pedagogical structure.

      • A business model based solely on work-study (alternance): May raise questions about the institution's sustainability and real academic investment.

      --------------------------------------------------------------------------------

      4. Methodological Recommendations for Families

      To check the quality of a program, it is advisable to draw on multiple sources of information:

      • Checking official lists: Consult the websites of the CTI (engineers) and the CGE (Conférence des Grandes Écoles), or the Ministry of Labor portal for the RNCP.

      • Using LinkedIn: Contact former students to learn about their actual career paths, and ask companies in the sector whether they recognize and recruit graduates of the school in question.

      • Asking direct questions at open days (JPO, Journées Portes Ouvertes):

        • “Until when is your visa or grade valid?” (A short 3-year renewal may signal difficulties, versus 5 years for top programs.)

        • “What is your alumni network like, and can I have some contacts?”

        • “Where do your students go after the Bachelor?” (If the answer is vague, further study is probably difficult.)

      Legislative outlook

      The government is preparing a law for the spring to better regulate for-profit private education.

      The bill would sort programs on Parcoursup into those benefiting from automatic quality certification (grades, EESPIG) and those subject to enhanced audits.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their constructive feedback, which has helped us prepare a substantially improved manuscript. In response to concerns about the conceptual distinction between prediction and stimulus dependency, we have fundamentally restructured the paper around the notion of passive control systems. This involved rewriting the Abstract, Introduction, and large portions of the Results (~60% of text revised).

      Key changes:

      - New analyses on Goldstein et al. (2022) data. We demonstrate that our findings—including the insufficiency of proposed corrections—generalise to the original dataset (Figures S2B, S3B, S5C, S6B).

      - Clarified novel contribution. We now make explicit that prior control analyses (residualisation, bigram removal) do not address the concern, because hallmarks persist in passive systems that cannot predict.

      - Proposed criterion for future work. Pre-onset neural encoding can only count as evidence for prediction if it exceeds a passive baseline (e.g., acoustics).

      We believe the revision offers a clearer, more rigorous contribution and provides a constructive framework for evaluating claims of neural prediction.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

      Strengths:

      The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

      Weaknesses:

      I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

      We thank the reviewer for this comment, which highlights that the previous version was not sufficiently clear. Conceptually, the difference is critical: it is the difference between passively encoding or representing the stimulus (as, for example, a spectrogram of the stimulus would) and actively generating predictions.

      We have substantially changed the framing of the paper to put the notion of control systems centre-stage. One such control system is the speech acoustics: they encode the stimulus (and thus its dependencies) but cannot predict. When we observe the "hallmarks of prediction" in acoustics, this demonstrates the hallmarks can arise without any prediction.

      This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

      Different weights are estimated per time point in the time-resolved regression. This allows the model to learn how the response to words unfolds, but also to learn different stimulus dependencies at each timepoint. Fitting on every second word would reduce but not eliminate the problem. Our control system approach provides a more principled test. We have clarified the mechanism in the Introduction (lines 82-90), explaining how correlations between neighbouring words allow the regression model to predict prior neural activity without assuming pre-activation.
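
      The mechanism can be illustrated with a deliberately simple toy simulation. Everything in it is an assumption made for illustration: a single scalar "word feature" with first-order dependencies replaces the word embeddings, lags are counted in words rather than milliseconds, and the "passive" signal is just a noisy copy of the current feature. It is not the pipeline used in the paper.

```python
import random


def ols_fit(xs, ys):
    """Return (slope, Pearson r) of a one-feature least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


def toy_words(n, rho, rng):
    """Scalar word features with correlation rho between neighbouring words."""
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1))
    return x


def lagged_encoding(features, signal, lags):
    """Fit a separate regression per lag: features[t] -> signal[t + lag]."""
    n = len(features)
    fits = {}
    for lag in lags:
        idx = [t for t in range(n) if 0 <= t + lag < n]
        fits[lag] = ols_fit([features[t] for t in idx],
                            [signal[t + lag] for t in idx])
    return fits


def passive_demo(rho, n=5000, seed=0):
    """Encode a passive signal that only reflects the current word."""
    rng = random.Random(seed)
    words = toy_words(n, rho, rng)
    passive = [w + rng.gauss(0, 0.5) for w in words]  # no prediction anywhere
    return lagged_encoding(words, passive, lags=[-2, -1, 0, 1])
```

      With dependent words (for example `passive_demo(rho=0.5)`), the fit at negative lags is clearly above zero even though the signal never anticipates anything; with independent words (`rho=0.0`) the pre-onset fit vanishes. Each lag gets its own slope, which is how a time-resolved model can absorb a different dependency at each time point.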

      Reviewer #2 (Public Review):

      Summary:

      At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

      Strengths:

      The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

      Weaknesses:

      The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

      We thank the reviewer for their assessment.

      We believe our paper does punch the hole that can be punched, which is a hole in the method. Our control demonstrates that adjusting the features (X matrix) cannot address dependencies that persist in the signal itself (Y matrix). Because the hallmarks emerge in a system that cannot predict (even after linearly removing the previous stimulus), attributing pre-onset encoding performance to neural prediction (rather than stimulus structure) is fundamentally ambiguous, and different approaches (e.g. variance partitioning) would suffer from the same ambiguity. We have reframed the manuscript to make this argument more clearly.

      Reviewer #3 (Public Review):

      Summary:

      The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies, specifically, the auto-correlation in the stimuli—rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

      Strengths:

      (1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

      (2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings - an approach used in prior work.

      (3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

      We’d like to thank the reviewer for their comments on our preprint.

      Weaknesses:

      (1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

      We thank the reviewer for this suggestion. The Goldstein dataset was not publicly available when we conducted this research. However, we have now applied our control analyses to their stimulus material, and found that the exact same problem applies to their dataset, too.

      We have added analyses of the Goldstein et al. (2022) podcast stimulus throughout the paper. Results are shown in Figures S2B, S3B, S5C, and S6B. Critically, we observe the same pattern: both hallmarks emerge in the acoustic control system, and residualisation fails to eliminate them. This demonstrates that our findings generalise to the very dataset used to establish pre-onset encoding as evidence for neural prediction.

      (2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

      We thank the reviewer for raising this point, as it reveals we failed to convey a central argument in the previous version. Goldstein et al.'s control analysis did not address the concern. We show that even after the control analyses that Goldstein et al. perform (removing bigrams, regressing out embedding dependencies) the "hallmarks of prediction" still emerge when applying the analysis to a passive control system that by definition does not predict: the speech acoustics. We now also show this in their data.

      To better convey this critical point, we have restructured the paper around the concept of "passive control systems". We now first establish that the hallmarks appear in acoustics (Figure 3), then show that residualisation fails to remove them (Figure 4). This makes explicit that any claim about "controlling for dependencies" must be validated against a system that cannot predict.
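
      Why residualising the features need not remove the effect can be seen in a toy simulation. The generative model below is an assumption chosen for illustration, not a description of the real stimuli: a slowly varying latent context drives both a noisy scalar "embedding" and a noisy scalar "acoustic" signal, and neither signal predicts anything.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def residualise(current, previous):
    """Remove from `current` the part linearly predictable from `previous`."""
    n = len(current)
    mp, mc = sum(previous) / n, sum(current) / n
    beta = (sum((p - mp) * (c - mc) for p, c in zip(previous, current))
            / sum((p - mp) ** 2 for p in previous))
    return [(c - mc) - beta * (p - mp) for c, p in zip(current, previous)]


def simulate(n=5000, rho=0.8, seed=0):
    """Latent context (dependency rho) observed through two noisy passive channels."""
    rng = random.Random(seed)
    context = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        context.append(rho * context[-1] + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1))
    embedding = [c + rng.gauss(0, 1.0) for c in context]
    acoustics = [c + rng.gauss(0, 0.5) for c in context]
    return embedding, acoustics


def pre_onset_scores(embedding, acoustics):
    """Correlation of the current embedding with the acoustics of the previous
    word, before and after regressing out the previous embedding."""
    current, previous = embedding[1:], embedding[:-1]
    previous_acoustics = acoustics[:-1]
    raw = pearson_r(current, previous_acoustics)
    residual = pearson_r(residualise(current, previous), previous_acoustics)
    return raw, residual
```

      In this toy setting the residualised feature still correlates with the earlier acoustics, only less strongly, because the previous embedding is an imperfect summary of what the two channels share. The correction acts on the features, whereas the dependency also lives in the signal being explained.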

      (3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

      We thank the reviewer for this question, and we agree that the question of whether pre-activation occurs is an important and interesting one. However, we ask a different question in our study: Our goal is not to definitively establish whether the brain predicts during language processing; it is to scrutinise what counts as evidence for prediction, and to correct some highly influential claims made in the literature. The reviewer asks whether pre-activation remains "after accounting for these confounds." But the point we are trying to make is that in this analytical framework, one cannot analytically account for these confounds: corrections to the X matrix leave dependencies in the data itself intact, as the acoustic control demonstrates.

      We do offer recommendations for future work. The passive control systems approach can serve as a benchmark: pre-onset neural encoding (or decoding) can only count as evidence for prediction if it exceeds what is observed in a passive control system like acoustics (which is not what we observe). Additionally, the field could move toward less naturalistic stimuli with tighter experimental controls, reducing the correlations that make this attribution so difficult. Developing a new definitive test is beyond the scope of our paper, but we believe applying this benchmark is a necessary first step.

      To make this clearer, we have rewritten the Discussion to explicitly state this criterion (lines 331-340) and to outline these recommendations for future work (lines 337-340). We have also added a paragraph extending our argument to decoding approaches (lines 343-354), noting that the same ambiguity applies regardless of analytical direction.

      Recommendations for Authors:

      Reviewer #1 (Recommendations for Authors):

      As per my "Weakness" point, I would appreciate engagement with the conceptual point related to the difference between prediction and stimulus correlations. Most importantly, I hope the authors will spell out more explicitly which predictions their proposal makes, and how exactly those would be present in an encoding model.

      Our proposal makes a clear prediction: if pre-onset encoding can be explained by stimulus dependencies (essentially a confound in the analysis), the same hallmarks should emerge in passive control systems that encode the stimulus but do not predict. We test this with word embeddings and speech acoustics, and both show the hallmarks despite not doing any prediction.

      Reviewer #2 (Recommendations for Authors):

      I greatly enjoyed reading the paper and only have minor quibbles. The work is overdue and will no doubt be a valuable addition to the literature to push back on over-hyped claims about the implications of pre-word predictivity in neural response. I have few issues with the methods that the paper uses, they seem sensible and in line with previous work that has investigated these questions, and I did not find typos.

      One point I would like to raise is whether or not there is a more effective solution to resolving the issues behind residualization that the paper demonstrates. The authors show that removing next-word information does not effectively resolve the problem that local relationships in the stimulus dataset pose. The challenge to me here seems to be that it is difficult to get a model to "not learn" a relationship that is learnable. I wonder if a better solution to this is to not try to get a model to exclude a set of information but instead to do some sort of variance partitioning where you train a model to predict the next-word representation from the current-word representation (as in the self-predictivity analysis) and then build an encoding model out of the predicted representation. Then, compare the pre-word-onset encoding performance of the prediction with the pre-word-onset encoding performance of the original representation. If the performance of the two models roughly matches, that would be strong evidence that most of what these models are capturing before word onset is just explainable by the stimulus dependencies, no?

      We would like to thank the reviewer for their kind words and positive appraisal!

      The proposed analysis is that if a linear proxy representation, w_hat_t – predicted linearly from w_{t-1} – yields pre-onset predictivity comparable to the actual w_t vector, this would support that the effect can be explained by stimulus dependencies. While this is an interesting alternative analysis, we would be cautious about the inverse conclusion: that if w_t outperforms the linear proxy w_hat_t, the residual variance must reflect true neural prediction.

      This is because of our control system results. We show that even when we remove the "predictable" shared variance – which is similar to computing the difference between w_t and w_hat_t – the unique information still yields pre-onset predictivity, albeit reduced, in the passive acoustics that by definition cannot predict. Therefore, instead of developing an ever-more-clever way to "correct" for the problem by adjusting the X matrix, we focus on showing that the problem lies in the stimulus itself. For the revision, we focused on reframing the problem and hope we have punched a fuller hole in the logic by breaking down the fundamental issue more clearly and showing it applies to the stimulus material of Goldstein et al. (2022) as well.
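
      In symbols, the quantities discussed in this exchange can be written as follows. This is only a notational sketch: the matrix B and the residual r_t are labels introduced here for illustration and do not appear in the manuscript.

```latex
\hat{w}_t = B\,w_{t-1},
\qquad
B = \arg\min_{B'} \sum_t \lVert w_t - B'\,w_{t-1} \rVert^2,
\qquad
r_t = w_t - \hat{w}_t
```

      The reviewer's proposal compares pre-onset encoding obtained with the proxy \hat{w}_t against that obtained with w_t, whereas residualisation amounts to encoding with r_t. The control result above concerns r_t: it still yields pre-onset predictivity in the passive acoustics.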

      Additionally, I would say that I was a bit confused about what was going on in the methods figures, to the point where I do not see the value in having them, but thankfully, the text was clear enough to resolve that confusion.

      We are sorry that the methods illustrations were not helpful. In presentations, we have found them generally helpful for conveying the analysis, e.g. the fact that the analysis is kept identical while the brain data are simply replaced with either word vectors (current Figure 2) or acoustics (current Figure 3). In the revision, we have reorganised the schematics slightly: we introduce the acoustics as a control system earlier, so that residualisation and its insufficiency can be introduced separately (Figure 4). We hope this helps.

      Reviewer #3 (Recommendations for Authors):

      (1) My major concern is the extent to which this study offers new insights beyond what was already demonstrated in Goldstein's work. First, the embedding dependency highlighted by the authors seems somewhat expected, given how these embeddings are constructed: GloVe embeddings are based on word co-occurrence statistics, and GPT embeddings are combinations of embeddings of preceding words. More importantly, Goldstein et al. addressed this issue by regressing out neighboring word embeddings. This control was effective, as also confirmed by the current manuscript, and their main results remain. Therefore, the embedding dependency appears to have been properly accounted for in the earlier study.

      Building on the previous point, I appreciate the analysis of dependencies across representational domains, which I see as the main novel contribution of this manuscript. I would encourage the authors to explore this aspect more deeply. If I understand correctly, stimulus dependencies may persist even after regressing out neighboring word embeddings due to two potential factors:

      (a) Temporal dependencies in embeddings: since the regression of neighbor words is performed at the word level rather than over time, temporal dependency may remain.

      (b) Cross-feature dependencies - specifically, correlations between embeddings and acoustic features.

      Regarding the first factor, it is not entirely clear to me whether this is a real problem—i.e., whether word-level regression fails to remove temporal dependencies. A simulation could help clarify this and support the argument. While it's not essential, it would be valuable if the authors could propose a method to address this issue, or at least outline it as a direction for future work.

      For the second point, it would be helpful for the authors to explicitly explain the potential relationship between word embeddings and acoustic features. Additionally, while correlations between features are a common problem in speech research, they are typically addressed by regressing out acoustic features early in the analysis (Gwilliams et al., 2022). It would strengthen the current findings if the authors could test whether the self-predictability persists even after controlling for neighboring embeddings and acoustic features.

      We appreciate the extensive and detailed engagement with our work, which has been very useful in highlighting key unclarities and gaps we had to address.

      We do believe our study goes well beyond what was shown by Goldstein, by identifying a fundamental limitation in their analysis, and showing that their purported control analyses do not in fact control for the problem. We’ll address the reviewers' sub-questions in turn.

      (i) Why this offers crucial insights beyond Goldstein et al.

      While Goldstein et al. indeed addressed embedding dependencies via residualization (or in their case projection), their conclusion relied on the assumption that any neural encoding surviving this "fix" must reflect genuine predictive pre-activation. Our study invalidates this assumption. By applying the residualization fix, we show that the "hallmarks of prediction" persist just as robustly in a passive control system that cannot predict (the speech acoustics) as in the neural data. (We also show this for bigram removal.)

      This provides a key new insight: persistent pre-onset predictivity after “correction” is not evidence that the dependency issue was solved. Instead, because the same effect persists in a system that cannot predict (acoustics), the persistence of the hallmarks cannot be attributed to prediction. It demonstrates that the standard "fix" is mathematically insufficient to remove the confound, rendering the original evidence for neural prediction fundamentally ambiguous.

      (ii) Why do dependencies/hallmarks persist after residualization?

      Residualization successfully removes the linear dependency between the current embedding (w_t) and the previous embedding (w_{t-1}) within the feature space. However, it does not (and cannot) remove the dependency from language itself, and therefore from the brain which (in some format) encodes the linguistic stimulus. Language is massively redundant. Knowing the current word tells you something about what came before – acoustically, syntactically, semantically. As long as the embedding identifies the word, the regression model will re-learn this relationship. For instance, in the case of acoustics, even when using the corrected embedding, the regression will re-learn that certain words (e.g., "Holmes") tend to follow certain acoustic patterns (e.g., the acoustics of "Sherlock"). This shows that correcting the embeddings is insufficient: the dependencies exist in language itself, and the model will re-learn them from any signal that encodes that language.
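
      The argument above can be illustrated with a toy simulation. The following is a minimal sketch under strong assumptions, not the analysis from the manuscript: the vocabulary, the Markov-chain "language", the random embeddings, and the random per-word "acoustic" features are all hypothetical. It residualizes the current embedding against the previous one and then checks whether the residual still predicts the previous word's acoustics on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, k, T = 20, 8, 4, 20000  # vocab size, embedding dim, acoustic dim, sequence length

emb = rng.standard_normal((V, d))  # hypothetical word embeddings
aco = rng.standard_normal((V, k))  # hypothetical per-word acoustic features

# Toy "language": word v is usually followed by word v+1, with 10% random jumps.
words = np.empty(T, dtype=int)
words[0] = 0
jump = rng.random(T) < 0.1
rand_words = rng.integers(0, V, size=T)
for t in range(1, T):
    words[t] = rand_words[t] if jump[t] else (words[t - 1] + 1) % V

X_cur = emb[words[1:]]    # embedding of the current word, w_t
X_prev = emb[words[:-1]]  # embedding of the previous word, w_{t-1}
Y_prev = aco[words[:-1]]  # "acoustics" of the previous word


def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def fit_predict(X_train, Y_train, X_test):
    """Ordinary least squares with an intercept."""
    B, *_ = np.linalg.lstsq(add_intercept(X_train), Y_train, rcond=None)
    return add_intercept(X_test) @ B


def heldout_r(X, Y):
    """Fit on the first half, report mean Pearson r on the second half."""
    half = len(X) // 2
    pred = fit_predict(X[:half], Y[:half], X[half:])
    rs = [np.corrcoef(pred[:, j], Y[half:, j])[0, 1] for j in range(Y.shape[1])]
    return float(np.mean(rs))


# Residualize: remove the part of w_t that is linearly predictable from w_{t-1}.
X_res = X_cur - fit_predict(X_prev, X_cur, X_prev)

r_raw = heldout_r(X_cur, Y_prev)  # raw embedding -> previous acoustics
r_res = heldout_r(X_res, Y_prev)  # residualized embedding -> previous acoustics

# Largest remaining linear covariance between the residual and w_{t-1}.
leak = float(np.abs(X_res.T @ (X_prev - X_prev.mean(0))).max() / len(X_res))

print(f"raw r = {r_raw:.2f}, residualized r = {r_res:.2f}, leak = {leak:.1e}")
```

      In this toy setting the residual is linearly uncorrelated with the previous embedding by construction (`leak` is numerically zero), yet `r_res` stays clearly above zero, because the previous word's acoustics are not a linear function of its embedding.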

      (iii) Why not regress out the acoustics?

      This is also why "regressing out acoustics" (as the reviewer suggests) would miss the point. We do not claim that acoustic features leak into the neural signal or that acoustics are a specific confound to be removed. Rather, we use acoustics as a “passive baseline”: a system that encodes the stimulus but cannot predict. That the method yields "hallmarks of prediction" in this baseline demonstrates these hallmarks are not valid evidence for prediction, regardless of what additional features one regresses out. This motivates our proposed criterion: future studies seeking evidence for neural pre-activation should not rest on finding pre-onset encoding per se, since passive systems show this too. Rather, they should demonstrate that the brain signal contains more information about the upcoming word than the passive stimulus baseline.

      As these aspects are fundamental to the interpretation of our study, we have fundamentally re-organised and rewritten large parts of the paper. We hope it is much clearer now.

      (2) To better compare to Goldstein's work, the author may consider performing the same analyses using their publicly available dataset.

      This is a good suggestion. When we initially conducted this research, the Goldstein dataset was not yet publicly available. It now is, and we have applied our analyses to their stimulus material. The same problem emerges: the hallmarks of prediction appear in the acoustics of their podcast stimuli. Even after applying the control analyses, pre-onset predictivity is robust in their acoustics (indeed, in correlation terms, higher than reported for the neural data, so there is not more predictivity in the brain than in the stimulus material), confirming that the issue we identify applies to the original dataset. Results are shown in Figures S2B, S3B, S5C, and S6B.

      (3) It would also be interesting to show the predictability effect after word onsets for self-predictability analyses, for example, in Figure 2C. The predictability effect is not only reflected in pre-onset responses but also in post-onset responses, i.e., larger responses for unpredicted words. Does the stimulus dependency mirror this effect?

      Our paper focuses specifically on temporal dependencies – the capacity of the current word to predict the previous stimulus signal (e.g., previous acoustics, previous embeddings) – and how this mimics neural pre-activation. Post-onset analyses, by contrast, concern the mapping between the current word and its concurrent signal, which involves fundamentally different mechanisms (e.g., mapping fidelity, frequency effects, acoustic clarity, word length) and would require considering covariates of the word's post-onset attributes to be meaningfully interpreted. Post-onset, there can be differences between predictable and non-predictable words – e.g. sometimes unpredictable words are pronounced with more emphasis – which is why surprisal studies include a large range of covariates. However, this is not about stimulus dependencies or pre-activation, so we consider it beyond the scope of our study.

      (4) The authors might consider reporting the encoding performance for the residual word embeddings, similar to Figure S6B in Goldstein's paper. This would allow us to determine whether pre-activation persists in the MEG responses and compare its pattern with the predictability of pre-onset acoustics.

      We do report this analysis; in the revised supplement it is shown in Figure S7. We placed it in the supplement precisely because residualized embeddings are not the "fix" they appear to be: as we show, they still yield strong pre-onset predictivity in the passive acoustic baseline (Figure 4, S6), undermining their use as a control.

      (5) The series of previous pre-activation analyses produced fruitful findings, e.g., the difference between brain regions (Fig. S4, (Goldstein et al., 2022)) and the difference between listeners and speakers (Figure 2, (Zada et al., 2024)). Can these observed differences be explained by the stimulus dependency?

      We appreciate this question. Our goal is to address the general logic of using pre-onset encoding as evidence for prediction, rather than to critique every finding in specific papers, especially as it pertains to a specific author. But briefly:

      Speaker vs. Listener differences (Zada et al., 2024): Zada et al. report distinct temporal profiles: speaker encoding peaks pre-onset (planning?), whereas listener encoding peaks post-onset but shows a pre-onset "ramp." Our critique applies to interpreting this ramp as "prediction." However, this interpretation is not central to their paper, which focuses on speaker-listener coupling via shared embedding spaces. We leave the implications (which are clear enough) to the reader.

      Regional differences (Goldstein et al., 2022): Encoding timecourses do vary across electrodes, as we also observe across MEG sources (and participants). But our point is logical: because pre-onset encoding does not necessarily reflect prediction, finding a channel with stronger pre-onset encoding does not mean that channel performs “more prediction”. For instance, one subject in the Armeni dataset showed higher pre-onset than post-onset encoding (and indeed activity) overall – but it would be implausible to conclude this subject "only predicts" and does not “process” or “listen”. More likely, this reflects differences in signal-to-noise, integration windows, or source contributions. The exact sources of these morphological differences are interesting but unclear, and speculating on them is beyond our scope.

      (6) I appreciate that the authors have shared their code; however, some parts appear to be missing. For example, the script encoding_analysis.py only includes package-loading code.

      Thank you for noticing; we have updated our code database.

      (7) What do the error bars in the figures represent - for example, in Figure 1C? How many samples were included in the significance tests? The difference between the two curves appears small, yet it is reported as significant. Additionally, Figure S1 shows large differences between subjects and between the two MEG datasets. Do the authors have any explanation for these differences?

      The shaded areas in our previous Figure 1c show 95% confidence intervals computed over the 100 MEG sources identified to be part of the bilateral language system and the 10 cross-validation splits.

      We do not have an elaborate explanation for the differences in encoding performance across the three subjects in the few-subject dataset. Instead, we interpret these differences as a likely consequence of substantial inter-individual variability in evoked responses, even at the source level, arising from differences in cortical folding and the orientation of underlying current dipoles. We deem this a likely explanation since different electrodes in Goldstein’s ECoG data also showed very different encoding profiles.

      With respect to the multi-subject dataset, we suspect that the large differences most likely stem from two factors. First, the acoustics were purposefully manipulated by the experimenters to reduce temporal dependence. This made it harder for listeners to concentrate on the stories and may thereby have led to lower-quality neural data. Furthermore, it reduced one form of stimulus dependency, namely the acoustic temporal dependencies, which could be exploited by the encoding model to reach higher encoding accuracies. Second, MEG has a notoriously poor signal-to-noise ratio, and the amount of data per participant (7,745 words as opposed to 85,719 in the few-subject dataset) might not have been enough to produce reliably high encoding results.

      Finally, the current study is clear and convincing, and my suggestions are not intended to question its novelty or robustness. Rather, I believe the authors are in a strong position to address a critical question in language processing: whether pre-activation occurs. The authors have thoughtfully considered important confounds related to pre-onset responses. Adding some approaches to regressing out these confounds could be particularly helpful for determining whether a true pre-onset response remains.

      We thank the reviewer again for their constructive feedback, suggestions and questions. To clarify, however, our goal is *not* to definitively attest to whether pre-activation occurs. Our goal is simply to scrutinise a specific method to test for linguistic prediction. This method purports to be an improvement on conventional post-onset (e.g. surprisal-based) methods, as it can directly investigate effects occurring prior to word onset. We have demonstrated fundamental limitations in the underlying logic of this method. We propose passive control systems as baselines against which claims of prediction should be evaluated. Against this baseline, the current evidence does not show unequivocal support for prediction: pre-onset encoding in the brain does not exceed that in the passive control. However, we do not conclude from this that pre-activation does not exist — that would require a different study entirely. Our aim is more methodological: to establish what should count as evidence for prediction, not to settle whether prediction occurs.

      We would like to thank the reviewers and editors for their thoughtful feedback, which has been tremendously helpful in improving the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the editor and both reviewers for their time and thoughtful feedback on our manuscript. We have carefully addressed all the concerns raised in the responses below and incorporated the suggested revisions into the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigated the population structure of the invasive weed Lantana camara from 36 localities in India using 19,008 genome-wide SNPs obtained through ddRAD sequencing.

      Strengths:

      The manuscript is well-written, the analyses are sound, and the figures are of great quality.

      Weaknesses:

      The narrative almost completely ignores the fact that this plant is popular in horticultural trade and the different color morphs that form genetic populations are most likely the result of artificial selection by humans for certain colors for trade, and not the result of natural selfing. Although it may be possible that the genetic clustering of color morphs is maintained in the wild through selfing, there is no evidence in this study to support that. The high levels of homozygosity are more likely explained as a result of artificial selection in horticulture and relatively recent introductions in India. Therefore, the claim of the title that "the population structure.. is shaped by its mating system" is in part moot, because any population structure is in large part shaped by the mating system of the organism, but further misleading because it is much more likely artificial selection that caused the patterns observed.

      The reviewer raises the possibility that the observed genetic patterns may have originated through the selection of different varieties by the horticultural industry. While it is plausible that artificial selection can lead to the formation of distinct morphs, the presence of a strong structure between them in the wild populations cannot be explained just based on selection. The observed patterns in the inbreeding coefficient and heterozygosity can indeed arise from multiple factors, including past bottlenecks, selection, inbreeding, and selfing. In the wild, different flower colour variants frequently occur in close physical proximity and should, in principle, allow for cross-fertilization. Over time, this gene flow would be expected to erode any genetic structure shaped solely by past selection. However, our results show no evidence of such a breakdown in structure. Despite co-occurring in immediate proximity, the flower colour variants maintain distinct genetic identities. This suggests the presence of a barrier to gene flow, likely maintained by the species' mating system. Moreover, the presence of many of these flower colour morphs in the native range—as documented through observations on platforms like iNaturalist—suggests that these variants may have a natural origin rather than being solely products of horticultural selection.

      While it is plausible that horticultural breeding involved efforts to generate new varieties through crossing—resulting in the emergence of some of the observed morphs—even if this were the case, the dynamics of a self-fertilizing species would still lead to rapid genetic structuring. Following hybridization, just a few generations of selfing are sufficient to produce inbred lines, which can then maintain distinct genetic identities. As discussed in our manuscript, such inbred lines could be associated with specific flower colour morphs and persist through predominant self-fertilization. This mechanism provides a compelling explanation for the strong genetic structure observed among co-occurring flower colour variants in the wild.
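
      The claim that a few generations of selfing suffice can be made concrete with the standard recursion for the inbreeding coefficient under partial selfing, F' = s(1 + F)/2, where s is the selfing rate. The sketch below is illustrative only; the selfing rates are hypothetical and are not estimates for Lantana.

```python
def inbreeding_trajectory(selfing_rate, generations, f0=0.0):
    """Inbreeding coefficient F per generation under a constant selfing rate.

    Uses the textbook recursion F' = s * (1 + F) / 2. Heterozygosity relative
    to Hardy-Weinberg expectation is then 1 - F.
    """
    f = [f0]
    for _ in range(generations):
        f.append(selfing_rate * (1.0 + f[-1]) / 2.0)
    return f


# Hypothetical selfing rates, starting from an outbred hybrid (F = 0).
full_selfing = inbreeding_trajectory(1.0, 5)
mostly_selfing = inbreeding_trajectory(0.9, 50)

for gen, f in enumerate(full_selfing):
    print(f"generation {gen}: F = {f:.5f}, relative heterozygosity = {1 - f:.5f}")

# Under partial selfing, F converges to s / (2 - s).
print(f"s = 0.9 after 50 generations: F = {mostly_selfing[-1]:.5f}")
```

      With complete selfing, relative heterozygosity halves every generation, so only about 3% remains after five generations.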

      To further validate this, we conducted a bagging experiment on Lantana camara inflorescences to exclude insect-mediated cross-pollination. The results showed no significant difference in seed set between bagged and open-pollinated flowers, supporting the conclusion that L. camara is primarily self-fertilizing in India. These results are included in the revised manuscript.

      As the reviewer rightly points out, the mating system of a species plays a crucial role in shaping patterns of genetic structure. However, in many natural populations, structuring patterns are often influenced by a combination of factors such as selection, barriers to gene flow, and genetic drift. In some cases, the mating system exerts a more prominent influence at the microgeographic level, while in others, it can shape genetic structure at broader spatial scales. What is particularly interesting in our study is that the mating system appears to shape genetic structure at a subcontinental scale. Although the species has been subject to other evolutionary forces, such as a genetic bottleneck and expansion due to its invasive nature, the influence of the mating system is remarkably strong, resulting in a clear and consistent genetic structure across populations.

      Reviewer #1 (Recommendations for the authors):

      Lantana camara is a globally invasive plant as the authors mention in their manuscript, but this study only focuses on India. This should be reflected in the title.

      The reviewer has suggested that the title should reflect the study area. Since our sampling covers nearly all regions in India, we believe the patterns observed here are likely representative of those in other parts of the invaded range. For this reason, we would prefer to retain the current title.

      It would be helpful if the pictures of the flowers in Figure 3 were larger to more clearly see the different colors.

      As per the reviewer's suggestion, we have increased the size of the images to improve clarity.

      Figure 4 could probably be moved to supplemental material, it does not add much to the results.

      We feel it is important to reiterate that the patterns we observe in Lantana are consistent with what one would expect in any predominantly self-fertilizing species. The figure acts as an additional line of evidence, and we therefore believe it is important to retain it, as it effectively conveys this link.

      Reviewer #2 (Public review):

      Summary:

      The authors performed a series of population genetic analyses in Lantana camara using 19,008 genome-wide SNPs data from 359 individuals in India. They found a clear population structure that did not show a geographical pattern, and that flower color was rather associated with population structure. Excess of homozygosity indicates a high selfing rate, which may lead to fixation of alleles in local populations and explain the presence of population structure without a clear geographic pattern. The authors also performed a forward simulation analysis, theoretically confirming that selfing promotes fixation of alleles (higher Fst) and reduction in genetic diversity (lower heterozygosity).

      Strengths:

      Biological invasion is a critical driver of biodiversity loss, and it is important to understand how invasive species adapt to novel environments despite limited genetic diversity (genetic paradox of biological invasion). Lantana camara is one of the hundred most invasive species in the world (IUCN 2000), and the authors collected 359 plants from a wide geographical range in India, where L. camara has invaded. The scale of the dataset and the importance of the target species are the strengths of the present study.

      Weaknesses:

      One of the most critical weaknesses of this study is that the output of the modelling analysis is largely qualitative and cannot be directly compared to the empirical data. The main findings of the SLiM-based simulation were that selfing promotes the fixation of alleles and the reduction of genetic diversity. These are theoretically well-established results, and such findings themselves are not novel, although they would become interesting if they were quantitatively integrated with the empirical findings in the studied species. In that sense, a coalescent-based analysis such as an Approximate Bayesian Computation method (e.g. DIY-ABC) utilizing their SNP data would be more interesting. For example, by ABC-based methods, the authors can infer the split time between subpopulations identified in this study. If such a split time is older than the recorded invasion date, the result supports the scenario that multiple introductions may have contributed to the population structure of this species. In the current form of the manuscript, multiple introductions were implicated but not formally tested.

      Through our SLiM simulations, we aimed to demonstrate that a pattern of strong genetic structure within a location (similar to what we observed in Lantana camara) can arise under a predominantly self-fertilizing mating system. These simulations were not parameterized using species-specific data from Lantana but were intended as a conceptual demonstration of the plausibility of such patterns under selfing using SNP data. While the theoretical consequences of self-fertilisation have been widely discussed, relatively few studies have directly modelled these patterns using SNP data. Our SLiM simulations contribute to this gap and support the notion that the observed genetic structuring in Lantana may indeed result from predominant self-fertilisation. Therefore, we conducted these simulations ourselves for invasive plants to test whether the patterns we observed are consistent with expectations for a predominantly self-fertilising species.

      Additionally, as suggested by the reviewer, we have performed demographic history simulations using fastsimcoal2 to investigate the divergence among different flower colour morphs. The results have been incorporated into the revised manuscript.

      First, the authors removed SNPs that were not in Hardy-Weinberg equilibrium (HWE), but the studied populations would not satisfy the assumption of HWE, i.e., random mating, because of a high level of inbreeding. Thus, the first screening of the SNPs would be biased strongly, which may have led to spurious outputs in a series of downstream analyses.

      Applying a HWE filter is a common practice in genomic data analysis because it helps remove potential sequencing or genotyping artefacts, which can otherwise bias downstream analyses. However, we understand that HWE filtering can also remove biologically informative loci and potentially bias the analysis, especially when a stringent cutoff is used. A strict filter might retain only loci that perfectly fit Hardy–Weinberg expectations and exclude sites influenced by real evolutionary processes like selection and/or inbreeding.

      To balance this, we used a mild HWE filter, aiming to remove clear artefacts while retaining loci that may reflect genuine biological signals. Another reason for applying it is that many downstream tools, for example, admixture, assume the markers are neutral and not strongly deviating from HWE (although this assumption may not always hold). This helps in avoiding the complexity of the model.
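
      The tension between HWE filtering and inbreeding can be seen in a small worked example. This sketch assumes a biallelic, polymorphic locus and a one-degree-of-freedom chi-square test; the genotype counts are hypothetical, and the function is not the filter used in the manuscript, whose threshold is not stated here.

```python
import math


def hwe_chisq(n_AA, n_Aa, n_aa):
    """Chi-square test for Hardy-Weinberg equilibrium at one biallelic locus.

    Returns (chi2, p_value, f_hat), where f_hat = 1 - H_obs / H_exp is the
    usual estimate of the inbreeding coefficient. Assumes both alleles are
    present, so that all expected counts are non-zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    f_hat = 1.0 - n_Aa / expected[1]
    return chi2, p_value, f_hat


# Hypothetical locus in 100 randomly mating individuals (allele frequency 0.5).
print(hwe_chisq(25, 50, 25))

# Same allele frequency, but with heterozygotes depleted as under F = 0.8.
print(hwe_chisq(45, 10, 45))
```

      In the second case the chi-square statistic equals N·F² = 64, so the locus fails any conventional HWE threshold even though it carries no genotyping error. This is why a stringent filter can preferentially discard loci shaped by selfing.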

      Second, in the genetic simulation, it is not clear how a set of parameters such as mutation rate, recombination rate, and growth rate were determined and how they are appropriate.

      We have cited the references for these values in the manuscript. However, for Lantana, many such baseline data are not available, so we used general values reported for plants, which is an accepted approach when working with understudied species. Moreover, the aim of these simulations was to develop a general understanding of how mating systems influence genetic diversity in invasive plants, rather than to parameterize the simulations specifically for Lantana.

      While we acknowledge that this simulation does not provide an exact representation of the species' evolutionary history, the goal of the simulation was not to produce precise estimates but rather to illustrate the feasibility of such strong genetic structuring resulting from self-fertilisation alone.

      Importantly, while authors assume the selfing rate in the simulation, selfing can also strongly influence the effective mutation rate (e.g. Nordborg & Donnelly 1997 Genetics, Nordborg 2000 Genetics). It is not clear how this effect is incorporated in the simulation.

      In genetic simulations, it is often best to begin with simpler scenarios involving fewer parameters, and we followed this approach. As the reviewer rightly pointed out, selfing can influence multiple factors such as mutation and recombination rates. However, to first understand the broad effects, we chose to work with simpler scenarios where both mutation and recombination rates were kept constant.

      Third, while the authors argue the association between flower color and population structure, their statistical associations were not formally tested.

      We thank the reviewer for this valuable suggestion. We have performed a MANOVA to test the association between flower colour and genetic structure. These results are incorporated in the revised manuscript.

      Also, it is not mentioned how flower color polymorphisms are defined. Could it be possible to distinguish many flower color morphs shown in Figure 1b objectively?

      We carefully considered this and defined our criteria based on flower colour. Specifically, we named morphs according to the colour of both young and old flowers. If both stages shared the same colour, we used that colour as the name. As shown in Figure 1b, it is possible to reliably distinguish between the different flower colour morphs. While one could also measure flower colour using a photometer, we believe both approaches yield similar results.

      I am concerned particularly because the authors also mentioned that flower color may change temporally and that a single inflorescence can have flowers of different colors (L160).

      The flower colour changes within an inflorescence, with young flowers shifting colour after pollination. However, this trend is consistent within a plant; for example, the yellow–pink morph always changes from yellow to pink. Based on this consistency, we incorporated a naming system that considers both the colour of younger and older flowers.

      Reviewer #2 (Recommendations for the authors):

      Figure 4: Figures a and b are not the "signatures of high inbreeding", because such patterns could also simply happen due to geographical isolation. The title of the figure could be changed. Figure 4c should be presented as a histogram.

      We have incorporated this suggestion into the manuscript and revised the figure title accordingly. However, we believe that presenting Figure 4c in its current form is more informative.

      L459 "in the introduced range, Lantana is self-compatible": is it self-incompatible in the native range? If it is known, it could be mentioned in the manuscript.

      A previous study from India demonstrated that self-fertilisation is possible in Lantana, providing an additional line of evidence for our findings. However, Lantana remains poorly studied in its native range, and to the best of our knowledge, only a single study has examined its pollination biology there, which we have cited in this paper.

    1. Reviewer #1 (Public review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (CRY) in promoting wakefulness under short photoperiods. This research is potentially important as hypersomnolence is often seen in patients suffering from SAD during winter times. The mechanisms underlying these sleep effects are poorly known.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of CRY action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that CRY acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this CRY action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      While the authors have made improvements in this resubmitted manuscript, there are still multiple concerns about the paper. I think the authors provide enough evidence suggesting that CRY plays a role in sleep under short photoperiod. The data also support that CRY acts in GABAergic neurons. However, there are still major issues with the quality of the confocal images presented throughout the paper. In many cases it appears that the images are oversaturated with poor resolution, making it hard to understand what is going on. In addition, none of the drivers used in this study are specific to the neurons the authors aim to manipulate. Therefore, the identity of the GABAergic neurons involved in this CRY-dependent sleep mechanism remains unclear. Similarly, whether l-LNvs are the target of this GABA-mediated sleep regulation under short photoperiod is not fully demonstrated. The data presented suggest this but do not prove it.

      Major concerns:

      (1) While the authors provided sleep parameters like consolidation or waking activity for some experiments, these measurements are still not shown for several others (for example Figures 2E, 3, 4, 5, and 6). These data are essential; these metrics must be reported for all sleep experiments.

      (2) Line 144 "We fed flies with agonists of GABA-A (THIP) and GABA-B receptor (SKF-97541) (Ki and Lim, 2019; Matsuda et al., 1996; Mezler et al., 2001). Both drugs enhance sleep in WT," The proper citation is needed here, Dissel et al., 2015 PMID:25913403. Both THIP and SKF-97541 were used in that paper.

      (3) Figure 2C and 2F: it appears that the control data is the same in both panels. That is not acceptable.

      (4) Figure 4A: With the quality of the images, it is impossible to assess whether GABA levels are increased at the l-LNvs soma.

      (5) Fig 4 S1A shows colabeling of l-LNvs and Gad1-Gal4 expressing neurons. They are almost 100% overlapping signals. This would indicate that the l-LNvs are GABAergic themselves, or that there is a problem with this experiment.

      (6) Fig 4 S1B: Again, I can see colabelling of the GFP and PDF staining, suggesting that Gad1-Gal4 expresses in l-LNvs.

      (7) Line 184: "Consistently, knocking down Rdl in the l-LNvs rescues the long sleep phenotype of cry mutants (Figure 4-figure supplement 1D)." This statement is incorrect as the driver used for this experiment, 78G01-GAL4 is not specific to the l-LNvs, so it is possible that the phenotypes observed are not coming from these neurons.

      (8) Figure 4G-K: None of these manipulations are specific to the l-LNvs. The authors describe 10H10-GAL4 and 78G01-GAL4 as l-LNvs specific tools, but this is not the case. Why not use the SS00681 Split-GAL4 line described in Liang et al., 2017 PMID: 28552314? It is possible that some of the effects reported in this manuscript are not caused by manipulating the l-LNvs.

      (9) Similarly, for the manipulation of s-LNvs, the authors cannot rule out effects that are coming from other cells, as R6-GAL4 is not specific to s-LNvs.

      (10) The staining presented in Fig 5 S1 is not very convincing. Difficult to see whether Gad1-GAL4 only expresses in the s-LNvs.

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Chen et al. identified a role for the circadian photoreceptor CRYPTOCHROME (cry) in promoting wakefulness under short photoperiods. This research is potentially important as hypersomnolence is often seen in patients suffering from SAD during winter times. The mechanisms underlying these sleep effects are poorly known.

      Strengths:

      The authors clearly demonstrated that mutations in cry lead to elevated sleep under 4:20 Light-Dark (LD) cycles. Furthermore, using RNAi, they identified GABAergic neurons as a primary site of cry action to promote wakefulness under short photoperiods. They then provide genetic and pharmacological evidence demonstrating that cry acts on GABAergic transmission to modulate sleep under such conditions.

      Weaknesses:

      The authors then went on to identify the neuronal location of this cry action on sleep. This is where this reviewer is much more circumspect about the data provided. The authors hypothesize that the l-LNvs which are known to be arousal-promoting may be involved in the phenotypes they are observing. To investigate this, they undertook several imaging and genetic experiments.

      Major concerns:

      (1) Figure 2 A-B: The authors show that knocking down cry expression in GABAergic neurons mimics the sleep increase seen in cryb mutants under short photoperiod. However, they do not provide any other sleep parameters such as sleep bout numbers, sleep bout duration, and more importantly waking activity measurements. This is an essential parameter that is needed to rule out paralysis and/or motor defects as the cause of increased "sleep". Any experiments looking at sleep need to include these parameters.

      Thank you for bringing up these points. We have now included these sleep parameters in Figure 2—figure supplement 3.

      (2) For all Figures displaying immunostaining and imaging data the resolution of the images is quite poor. This makes it difficult to assess whether the authors' conclusions are supported by the data or not.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue.

      (3) In Figure 4-S1A it appears that the syt-GFP signal driven by Gad1-GAL4 is colabeling the l-LNvs. This would imply that the l-LNvs are GABAergic. The authors suggest that this experiment suggests that l-LNvs receive input from GABAergic neurons. I am not sure the data presented support this.

      We agree that this piece of data alone is not sufficient to demonstrate that the l-LNvs receive GABAergic inputs rather than being GABAergic themselves. However, when nlsGFP signal is driven by two independent Gad1-GAL4 lines (one generated by P element insertion and the other by GAL4 inserted into the Gad1 locus), we do not observe any prominent signal in the l-LNvs (Figure 5A and B; Figure 5-figure supplement 1A). We have also co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. This further supports the idea that the l-LNvs are not GABAergic, and that the syt-GFP signal likely arises from GABAergic neurons projecting to the l-LNvs.

      (4) In Figure 4-S1B. The GRASP experiment is not very convincing. The resolution of the image is quite poor. In addition, the authors used Pdf-LexA to express the post t-GRASP construct in l-LNvs, but Pdf-LexA also labels the s-LNvs, so it is possible that the GRASP signal the authors observe is coming from the s-LNvs and not the l-LNvs. The authors could use a l-LNvs specific tool to do this experiment and remove any doubts. Altogether this reviewer is not convinced that the data presented supports the conclusion "All in all, these results demonstrate that GABAergic neurons project to the l-LNvs and form synaptic connections." (Line 176). In addition, the authors could have downregulated the expression of Rdl specifically in l-LNvs to support their conclusions. The data they are providing supports a role for RDL but does not prove that RDL is involved in l-LNvs.

      Thank you for these wonderful suggestions. Again we apologize for the poor resolution and hopefully by uploading the images separately we can resolve this issue. We agree that the GRASP signal could be coming from the s-LNvs and not the l-LNvs but unfortunately we are not able to find a LexA that is specifically expressed in the l-LNvs. We believe the trans-Tango data further support the idea that GABAergic neurons project to and form synaptic connections with the l-LNvs. Nonetheless, we have changed our conclusion to “All in all, these results strongly suggest that GABAergic neurons project to the l-LNvs and form synaptic connections” to be more rigorous. In addition, we have obtained R78G01GAL4 which is specifically expressed in the l-LNvs, and using this GAL4 to knock down Rdl rescues the long-sleep phenotype of cry mutants (Figure 4—figure supplement 1D).

      (5) In Figures 4 A and C: it appears that GABA is expressed in the l-LNvs. Is this correct? Can the authors clarify this? Maybe the authors could do an experiment where they co-label using Gad1-GAL4 and Pdf-LexA to clearly demonstrate that l-LNvs are not GABAergic. Also, the choice of colors could be better. It is very difficult to see what GABA is and what is PDF.

      Thank you for this wonderful suggestion. We have now co-labeled using Gad1GAL4 and PdfLexA (Figure 5-figure supplement 1B). As can be seen, Gad1GAL4-driven GFP signal is present only in the s-LNvs but not the l-LNvs. We suspect the GABA signal at the l-LNvs may arise from the GABAergic projections received by these cells. We have now changed the color of the GABA/PDF signals in these images and have reduced the intensity of the PDF signal. Hopefully, the signals will be easier to visualize in this revised version.

      (6) Figure 4G: Pdf-GAL4 expresses in both s-LNvs and l-LNvs. So, in this experiment, the authors are silencing both groups, not only the l-LNvs. Why not use a l-LNvs specific tool?

      Thank you for bringing up this important point. We have previously used c929GAL4 to express Kir2.1 and this led to lethality. We have now used two l-LNv-specific GAL4 drivers (R78G01GAL4 and R10H10GAL4) that we newly obtained to express Kir2.1 but did not observe a significant effect on sleep. Please see Author response image 1 for the results.

      Author response image 1.

      Daily sleep duration of male flies expressing Kir2.1 in l-LNvs using R78G01GAL4 (A)(n = 40, 41, 30 flies) and R10H10GAL4 (B) (n = 40, 41, 32 flies) and controls, monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test was used to calculate the difference between experimental group and control group.
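
      The statistical procedure named in this legend (one-way ANOVA followed by Bonferroni correction) can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function names `one_way_f` and `bonferroni` are hypothetical, the group values are made up, and the pairwise p-values passed to `bonferroni` are assumed to come from a separate pairwise test.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical daily sleep values (arbitrary units) for three genotypes.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_b, df_w = one_way_f(groups)

# Hypothetical unadjusted p-values for experimental vs. each control.
adjusted = bonferroni([0.01, 0.04])
```

      Bonferroni correction is conservative: with two comparisons against controls, an unadjusted p-value must fall below 0.025 to remain significant at the 0.05 level.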

      (7) Figure 4H-I: The C929-GAL4 driver expresses in many peptidergic neurons. This makes the interpretation of these data difficult. The effects could be due to peptidergic cells other than the l-LNvs. Why not use a more l-LNv-specific tool? I am also confused as to why some experiments used Pdf-GAL4 and others used C929-GAL4 with a view to specifically manipulating l-LNvs. This is confusing since neither driver is specific to the l-LNvs.

      Thank you for bringing up these important points. We have now used the l-LNv-specific R10H10GAL4 and the results are more or less comparable with those of c929GAL4 (Figure 4I and K), i.e. activating the l-LNvs blocks the long-sleep phenotype of cry mutants. The reason PdfGAL4 is used in 4G is because c929GAL4 leads to lethality while the l-LNv-specific GAL4 lines do not alter sleep.

      (8) Figure 5-S1B: Why does the pdf-GAL80 construct not block the sleep increase seen when reducing expression of cry in Gad1-GAL4 neurons? This suggests that there are GABAergic neurons that are not PDF expressing involved in the cry-mediated effect on sleep under short photoperiods.

      Yes, this is indeed the conclusion we draw from this result, and we commented on this in the Discussion: “Moreover, inhibiting cry RNAi expression in PDF neurons does not eliminate the long-sleep phenotype of Gad1GAL4/UAScryRNAi flies. Therefore, we suspect that cry deficiency in other GABAergic neurons is also required for the long-sleep phenotype. Given that the s-LNvs are known to express CRY and appear to be GABAergic based on our findings here, we believe that CRY acts at least in part in the s-LNvs to promote wakefulness under short photoperiod.”

      In conclusion, it is not clear that the authors demonstrated that they are looking at a cry-mediated effect on GABA in s-LNvs resulting in a modulation of the activity of the l-LNvs. Better images and more-suited genetic experiments could be used to address this.

      Thank you very much for all the comments. They are indeed quite helpful for improving our manuscript. Hopefully, with images of higher quality and the additional experiments described above, we have now provided more evidence supporting our major conclusion.

      Reviewer #2 (Public Review):

      Summary:

      The sleep patterns of animals are adaptable, with shorter sleep durations in the winter and longer sleep durations in the summer. Chen and colleagues conducted a study using Drosophila (fruit flies) and discovered that a circadian photoreceptor called cryptochrome (cry) plays a role in reducing sleep duration during day/night cycles resembling winter conditions. They also found that cry functions in specific GABAergic circadian pacemaker cells known as s-LNvs, inhibiting these neurons and thereby promoting wakefulness in the animals in the winter. They also identified l-LNvs, known as arousal-promoting cells, as the downstream neurons.

      Strengths:

      Detailed mapping of the neural circuits through which cry acts to mediate the shortened sleep in winter-like day/night cycles.

      Weaknesses:

      The supporting evidence for s-LNvs being GABAergic neurons is not particularly strong. Additionally, there is a lack of direct evidence regarding changes in neural activity for s-LNvs and l-LNvs under varying day/night cycles, as well as in cry mutant flies.

      Thank you very much for all the comments. We have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.

      We have now examined GCaMP signals in the l- and s-LNvs of WT and cry mutants under 4L20D/12L12D. Please see Author response image 2 for the results. As can be seen, both WT and cry mutants show photoperiod-dependent changes. Interestingly, cry mutants show a more prominent reduction of GCaMP signal in the l-LNvs compared to WT under 12L12D vs. 4L20D, but the sleep duration phenotype is observed only under 4L20D. Moreover, GCaMP signal is elevated in the s-LNvs of cry mutants relative to WT under 4L20D but decreased under 12L12D. These results indicate that there are distinct mechanisms regulating sleep under short vs. normal photoperiod (with CRY being dispensable under 12L12D), and the role of CRY in modulating the activity of these neurons is also photoperiod-dependent. Further in-depth characterization is needed to delineate these complex issues.

      Author response image 2.

      Quantification of GCaMP6m signal intensity normalized to that of tdTomato under 12L12D and 4L20D (n = 25-45 cells). Student’s t-test: compared to WT, #P < 0.05, ##P < 0.01; 12L12D vs. 4L20D, *P < 0.05, ***P < 0.001.
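
      The quantification described in this legend (a per-cell GCaMP6m/tdTomato ratio compared between groups with Student's t-test) can be sketched as below. This is an illustrative outline under stated assumptions, not the authors' pipeline: the function names are hypothetical, the intensity values are invented, and only the t statistic and degrees of freedom are computed (obtaining a p-value would need a t-distribution table or a statistics library).

```python
from math import sqrt
from statistics import mean, variance

def normalized_signal(gcamp, tdtomato):
    """Per-cell GCaMP intensity divided by the tdTomato reference intensity."""
    return [g / t for g, t in zip(gcamp, tdtomato)]

def student_t(a, b):
    """Two-sample Student's t statistic (equal variances assumed) and its df."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t_stat = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t_stat, na + nb - 2

# Hypothetical raw intensities for a handful of cells per genotype.
wt_ratio = normalized_signal([120.0, 150.0, 135.0], [100.0, 110.0, 105.0])
mutant_ratio = normalized_signal([90.0, 100.0, 95.0], [100.0, 105.0, 98.0])
t_stat, df = student_t(wt_ratio, mutant_ratio)
```

      Dividing by the tdTomato signal corrects for cell-to-cell differences in expression level and imaging depth, so that the ratio reflects calcium-dependent fluorescence rather than reporter abundance.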

      Reviewer #3 (Public Review):

      Summary:

      In humans, short photoperiods are associated with hypersomnolence. The mechanisms underlying these effects are, however, unknown. Chen et al. use the fly Drosophila to determine the mechanisms regulating sleep under short photoperiods. They find that mutations in the circadian photoreceptor cryptochrome (cry) increase sleep specifically under short photoperiods (e.g. 4h light: 20 h dark). They go on to show that cry is required in GABAergic neurons. Further, they suggest that the relevant subset of GABAergic neurons are the well-studied small ventral lateral neurons that they suggest inhibit the arousal-promoting large ventral neurons via GABA signalling.

      Strengths:

      Genetic analysis to show that cryptochrome (but not other core clock genes) mediates the increase in sleep in short photoperiods, and circuit analysis to localise cry function to GABAergic neurons.

      Weaknesses:

      The authors' conclusion that the sLNvs are GABAergic is not well supported by the data. Better immunostaining experiments and perhaps more specific genetic driver lines would help with this point (details below).

      (1) The sLNvs are well known as a key component of the circadian network. The finding that they are GABAergic would, if true, be of great interest to the community. However, the data presented in support of this conclusion are not convincing. Many of the confocal images are of insufficient resolution to evaluate the paper's claims. The anti-GABA immunostaining in Fig 4 and 5 seems to have a high background, and the GRASP experiments in Fig 4 supplement 1 show low signal.

      We apologize for the poor resolution. This is probably due to the compression of the figures in the merged PDF file. We are now uploading the figures individually and hopefully this can resolve the resolution issue. Unfortunately, the GABA immunostaining does not work very well in our hands and thus the background is high. We have now adjusted the images by changing the minimum lookup table (LUT) value in the green channel to 213, which removes all pixels below 213. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the contrast. Furthermore, we have now expressed nlsGFP by two independent Gad1-GAL4 lines (one generated by P element insertion while the other generated by GAL4 inserted into the Gad1 locus), and positive signals in the s-LNvs can be observed (Figure 5A and B; Figure 5-figure supplement 1A). Hopefully, this can provide some further support regarding the s-LNvs being GABAergic neurons.

      Transcriptomic datasets are available for the components of the circadian network (e.g. PMID 33438579, and PMID 19966839). It would be of interest to determine if transcripts for GAD or other GABA synthesis/transport components were detected in sLNvs. Further, there are also more specific driver lines for GAD, and the lLNvs, sLNVs that could be used.

      Thank you for these wonderful suggestions. Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although here in our study Gad1GAL4 expression is observed only in the s-LNvs and not l-LNvs. We have commented on this in the 4th paragraph of Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” PMID 33438579 does not report expression of these genes in either s-LNvs or l-LNvs, likely due to insufficient sequencing depth. Furthermore, we have now used two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4) to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and K).

      (2) The authors' model posits that in short photoperiods, cry functions to suppress GABA secretion from sLNvs thereby disinhibiting the lNVs. In Fig 4I they find that activating the lLNvs (and other peptidergic cells) by c929>NaChBac in a cryb background reduces sleep compared to activating lLNVs in a wild-type background. It's not clear how this follows from the model. A similar trend is observable in Fig 4H with TRP-mediated activation of lNVs, although it is not clear from the figure if the difference b/w cryb vs wild-type background is significant.

      Thank you for bringing up this important point. This does appear to be counterintuitive. We suspect that in cry mutants, there is more inhibition occurring at the l-LNvs and thus the system may be particularly sensitive to their activation. Therefore, activating these neurons on the mutant background can result in a more prominent wake-promoting effect compared to that of WT.

      Recommendations for the authors:

      Our major concern centers around the claim that the sLNvs are GABAergic and secrete GABA onto the lLNVs. As it stands, this is not well supported by the data.

      The authors could substantiate these findings by using more specific driver lines for GAD / vGAT (MiMic based lines are available that should better recapitulate endogenous expression). Transcriptomic data for circadian neurons are available, and the FlyWire consortium also predicts neurotransmitter identities for specific neural circuits. These datasets could be mined for evidence to support the claim of sLNvs being GABAergic.

      Thank you for these wonderful suggestions. We have now used MiMic-based lines for Gad1 (BS52090, Mi{MIC}Gad1MI09277) and VGAT (BS23022, Mi{ET1}VGATMB01219) to knock down cry but unfortunately were not able to observe changes in sleep. Please see Author response image 3 for the results.

      Author response image 3.

      Daily sleep duration of male flies with cry knocked down in GABAergic neurons by Gad1GAL4 (A) (n = 30, 38, 50, 18, 31 flies) or VGATGAL4 (B) (n = 28, 38, 50, 18, 30 flies) monitored under 4L20D. One-way ANOVA with Bonferroni multiple comparison test: compared to UAS control, ###P < 0.001.

      Furthermore, we have now included another Gad1GAL4 line, which was generated by knocking a GAL4 transgene into the Gad1 locus. We are also able to observe increased sleep when using this GAL4 to knock down cry, and positive signals in the s-LNvs can be observed when using this GAL4 to drive nlsGFP (Figure 2B; Figure 5-figure supplement 1A).

      Based on PMID 19966839, both the s-LNvs and l-LNvs express Gad1 and VGAT at a relatively low level, although here in our study Gad1GAL4 expression is observed only in the s-LNvs and not l-LNvs. We have commented on this in the 4th paragraph of Discussion: “One study using cell-type specific gene expression profiling demonstrates Gad1 and VGAT expression in both s-LNvs and l-LNvs, although with relatively low signal (Nagoshi et al., 2010). Here we observed that Gad1GAL4 is expressed in the s-LNvs, and their GABA intensity is reduced when we use R6GAL4 to knock down VGAT in these cells.” FlyWire does not have a prediction for the particular circuit that we are interested in.

      Further, many of the immunostaining images have high background / low signal - so better confocal images would help, as would the use of more specific driver lines for the lNVs as it is sometimes hard to distinguish the lLNvs from sLNvs.

      We have now adjusted all images by changing the minimum lookup table (LUT) value in the green channel to 213 and that of the red channel to 279, which removes all pixels below 213 and 279, respectively. This can remove background without changing the gray values, so the analysis is not affected. We have modified all images the exact same way and hopefully this can improve the signal to noise ratio. We were not able to find a LexA line that is specifically expressed in the l-LNvs but we have found two l-LNv-specific GAL4 lines (R78G01GAL4 and R10H10GAL4). We used these lines to conduct some of the experiments that we previously used c929GAL4 for, and obtained comparable results (Figure 4I and 4K).
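
      The display adjustment described above can be sketched as a mapping from raw gray values to display values. This is a minimal illustration, not the authors' image-processing workflow: the function name `apply_display_range` is hypothetical, the pixel values are invented, and the display maximum of 4095 (a 12-bit range) is an assumption, since the response only states the minimum values (213 for green, 279 for red).

```python
def apply_display_range(pixels, lo, hi):
    """Map raw gray values to 0-255 display values for the range [lo, hi].

    Values at or below `lo` are shown as black and values at or above `hi`
    as white. A new image is returned; the raw values are left untouched.
    """
    span = hi - lo
    return [
        [round(255 * (min(max(v, lo), hi) - lo) / span) for v in row]
        for row in pixels
    ]

# Hypothetical raw intensities for a tiny two-channel image.
raw_green = [[100, 213, 300], [2154, 4095, 50]]
raw_red = [[250, 279, 500], [3000, 4095, 10]]

# Minimum LUT values from the response; the maximum (4095) is assumed.
display_green = apply_display_range(raw_green, lo=213, hi=4095)
display_red = apply_display_range(raw_red, lo=279, hi=4095)
```

      Because only the displayed mapping changes, any quantification performed on the raw arrays gives the same result before and after the adjustment, provided the same range is applied to every image being compared.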

      Additional specific comments are in the reviews above.

      Minor points:

      (1) Line 55: CRYPTOCHROME is misspelled.

      This has been fixed.

      (2) Line 140: The authors need to provide the appropriate references for the use of THIP and SKF-97541.

      This has been added.

      (3) Line 149: there are multiple GABA-A receptors in flies, the authors should acknowledge that. What about LccH3 or Grd?

      Thank you for bringing up this important point. Here we focused only on Rdl because it is the only GABA-A receptor known to be involved in sleep regulation. We have modified our description regarding this issue: “We tested for genetic interaction between cry and Resistant to dieldrin (Rdl), a gene that encodes GABA-A receptor in flies and has previously been shown to be involved in sleep regulation.”

    1. 1. Any at least part-time student is eligible to participate
      2. AU faculty, staff and their spouses/children are eligible to play
      3. Players may be added to rosters during the first week of the season, but not after that.
      4. All players must be recorded on the game sheet before they are allowed to participate.

      1. Teams:
         a. Teams consist of 2 players on the court at one time.

      START OF PLAY

      1. Before starting the game, the opponents toss a coin with the winner choosing:
         (a) to serve first/to receive first, or
         (b) the side
      2. In subsequent games, the winning side serves first.

      SCORING SYSTEM

      1. A match consists of the best of 3 games of 21 points (games cap at 30 points). Each game starts at 0-0. If the match goes to the third game that third game will be played to 15.
      2. Every time there is a serve – there is a point scored.
         a. A rally is won when a shuttle is hit over the net and onto the floor of the opponent's court.
         b. A rally is lost if the shuttle is hit into the net, or over the net but outside of the opponent's court. A rally is also lost if the shuttle touches the player's clothing or body, or if it is hit before it crosses over the net.
      3. The side winning a rally adds a point to its score.
      4. At 20 all, the side which gains a 2 point lead first, wins that game.
      5. At 29 all, the side scoring the 30th point, wins that game.
      6. The side winning a game serves first in the next game.

      INTERVAL AND CHANGE OF ENDS

      1. A 2 minute interval between each game is allowed.
      2. Teams switch sides at the end of each game.
      3. In the third game, teams switch sides when the leading score reaches 8 points.

      SERVING

      1. The first serve of the game is always made from the right side of the court to the opposite diagonal side. Only the player standing in the proper service court may return the serve.
      2. The server must obey laws designed to force underhand delivery of the serve, and the receiver must stand still until the service is struck.
      3. Following the serve, players may move anywhere on their side of the net.
      4. A player continues to serve (alternating service courts) as long as a player scores points.
      5. The service passes consecutively to the players as shown in the diagram (see below).
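
      The scoring rules above for a 21-point game (win at 21 with a 2 point lead, otherwise play on until one side leads by 2 or reaches 30) can be sketched as a small check. This is an illustration only: the function name `game_winner` and the labels "A" and "B" are hypothetical, and the sketch covers only the 21-point games, because the rules do not say how a tie near 15 is resolved in the third game.

```python
def game_winner(score_a, score_b, target=21, cap=30):
    """Return "A" or "B" if that side has won the game, else None."""
    high, low = max(score_a, score_b), min(score_a, score_b)
    if high == low:
        return None  # tied scores never end a game
    # A game ends at the cap, or at the target or beyond with a 2 point lead.
    finished = high >= cap or (high >= target and high - low >= 2)
    if not finished:
        return None
    return "A" if score_a > score_b else "B"

for a, b in [(21, 19), (21, 20), (22, 20), (29, 29), (30, 29)]:
    print(f"{a}-{b}: {game_winner(a, b)}")
```

      Keeping the target and cap as parameters makes the 20-all and 29-all rules two cases of the same condition instead of separate branches.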


    1. Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      (2) No mutants or inhibitors were used to test and challenge the proposed model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors developed a new autofocusing method, LUNA (Locking Under Nanoscale Accuracy), to address severe focus drift, a major challenge in time-lapse microscopy. Using this method, they tackle a fundamental question in bacterial cold shock response: whether cells halt growth and division following an abrupt temperature downshift. Overall, the experimental design, modeling, and data analysis are solid and well executed. However, several points require clarification or further support to fully substantiate the authors' conclusions.

      Strengths:

      (1) The LUNA method outperforms existing autofocusing systems with nanoscale precision over a large focusing range. The focusing time is reasonable for the presented experiments, and the authors note potential improvements by using faster motors and optimized control algorithms, suggesting broad applicability. The theoretical simulations and experimental validation provide solid support for the robustness of the method.

      (2) Using LUNA, the authors address a long-standing question in bacterial physiology: whether cells arrest growth and division after an abrupt cold shock. Single-cell analyses monitoring the entire course of cold adaptation and steady-state growth reveal features that are obscured in bulk-culture studies: cells continue to grow at reduced rates with smaller cell sizes, resulting in an apparently unchanged population-level OD. The experiments are well designed and analyses are generally solid and largely support the authors' conclusions.

      (3) The authors also propose a model describing how population-level OD measurements depend on cell dry mass density, volume, and concentration. This provides a valuable conceptual contribution to the interpretation of OD-based growth measurements, which remain a gold-standard method in microbiology.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) It is unclear whether the authors' model explaining the population-level OD during acclimation is broadly applicable. Most analyses focus on a shift from 37˚C to 14˚C, where the model agrees well with experimental data. However, in the 37˚C to 12˚C experiment, OD600 decreases after cold shock (Fig. 5e), and the computed OD does not match the experimental measurements (Fig. S16a). Although the authors attribute this discrepancy to a "complicated interplay," no further explanation is provided, which limits confidence in the model's general applicability.

      Thank you for this careful evaluation regarding the model generality. In the experiment with a temperature shift from 37°C to 12°C, the measured OD600 values were 0.243 at 0 hours and 0.242 at 5 hours. In comparison, our model-computed OD600 values were 0.243 at 0 hours and 0.271 at 5 hours. The absolute difference between the measured and computed values at 5 hours is therefore 0.028.

      Given the typical experimental variability in OD600 measurements and the limited linear range of the OD-to-biomass approximation (generally considered reliable below ~0.5), this deviation is quantitatively modest. We appreciate your valuable feedback and are happy to provide further clarification if needed.
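
      The size of the deviation discussed above can be checked with a few lines of arithmetic. The sketch below uses only the three-decimal OD600 values quoted in the response; the variable names are illustrative. With these rounded inputs the absolute difference evaluates to 0.029, so the 0.028 quoted above may reflect unrounded underlying values, which is an assumption here.

```python
measured_od_5h = 0.242  # measured OD600 at 5 h, as quoted in the response
computed_od_5h = 0.271  # model-computed OD600 at 5 h, as quoted in the response

# Absolute deviation of the model from the measurement.
abs_diff = computed_od_5h - measured_od_5h
# The same deviation expressed as a fraction of the measured value.
rel_diff = abs_diff / measured_od_5h

print(f"absolute difference: {abs_diff:.3f}")
print(f"relative difference: {rel_diff:.1%}")
```

      Expressing the deviation relative to the measured value (roughly 12% with these inputs) gives a scale-independent way to compare it against the typical run-to-run variability of OD600 readings.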

      (2) The manuscript proposes that cell-cycle progression becomes synchronized across the population after cold shock, but the supporting evidence is not fully convincing. If synchronization refers primarily to the uniform reduction in growth rate following cold shock, this could plausibly arise from global translation inhibition affecting all cells. However, the additional claim that "cells encountering a relatively late CSR will accelerate division to maintain synchronization" is not strongly supported by the presented data.

      We appreciate your critical reading, which has helped us identify ambiguities in our terminology and strengthen the clarity of our work. Regarding the term “synchronization”, we would like to clarify that it refers to two different scenarios: (i) the synchrony in the timing of growth rate changes after cold shock. The cells initiate the slowdown in growth almost simultaneously, suggesting a highly coordinated, non-stochastic population-level response to cold shock; (ii) the synchrony in division cycle progression.

      In the sentence you referenced, “cells encountering a relatively late CSR will accelerate divisions to maintain synchronization”, we intended to describe that cells maintain consistent progression of the division cycle after cold shock, meaning that after the same number of elapsed cycles, different cells are at a similar stage in their division timing (Figure 4f, 4g, Figure S14). The term “accelerate” refers to our observation that cells that complete a given cycle later than others tend to have shorter subsequent inter-division intervals, thereby “catching up” to maintain alignment in cycle number across the population. We acknowledge that using “synchronization” in this scenario may be ambiguous, and we will replace it with the more precise phrasing “progression of division cycle” to accurately convey this finding.
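
      One way to make the "catching up" statement testable is to correlate, across cells, the time at which a given cycle is completed with the length of the following inter-division interval; a negative correlation would be consistent with late cells dividing sooner afterwards. The sketch below assumes a hypothetical input format (one ascending list of division timestamps per cell) and uses made-up numbers; it is not the analysis behind Figure 4f, 4g or Figure S14.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def catch_up_correlation(division_times, cycle):
    """Correlate completion time of `cycle` with the next inter-division interval.

    division_times: one ascending list of division timestamps per cell
                    (hypothetical format, all cells tracked for cycle + 1 divisions).
    """
    completion = [t[cycle] for t in division_times]
    next_interval = [t[cycle + 1] - t[cycle] for t in division_times]
    return pearson(completion, next_interval)


# Made-up timestamps (hours after cold shock) for three cells.
cells = [
    [10.0, 20.0, 30.0],
    [12.0, 20.5, 30.2],   # completes the first cycle late, then divides sooner
    [8.0, 19.5, 29.8],    # completes the first cycle early, then divides later
]
print(catch_up_correlation(cells, cycle=0))
```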

      (3) Several technical terms used in the method development section are not clearly defined and may be unfamiliar to a broad readership, which makes it difficult to fully understand the methodology and evaluate its performance. Examples include depth of focus, focusing precision, focusing time, focusing frequency, and drift threshold value. In addition, the reported average focusing time per location (~0.6 s) lacks sufficient context, limiting the reader's ability to assess its significance relative to existing autofocusing methods.

      Thank you for your valuable comments and suggestions. In response, we have added more detailed descriptions in the Methods section of the revised version.

      The reviewer noted that the reported average focusing time (~0.6 s) lacks sufficient context, which may limit readers’ ability to assess its significance relative to existing autofocusing methods. We would like to clarify that the core innovation of this work lies in the proposed theoretical framework for autofocusing, which offers advantages over existing methods in terms of focusing precision and range. While focusing time is a practically relevant performance metric, it is primarily presented here as an implementation-dependent parameter rather than a central theoretical contribution of this study. In our experimental setup, an average focusing time of 0.6 s proved sufficient for routine timelapse imaging in microscopy, thereby demonstrating the practical usability of LUNA.

      Reviewer #2 (Public review):

      Summary:

      This study presents LUNA, an autofocus method that compensates for focus drift during rapid temperature changes. Using this approach, the authors show that E. coli cells continue to grow and divide during cold shock, revealing a coordinated, multi-phase adaptation process that could not be deduced from traditional population measurements. They propose a scattering-theory-based model that reconciles the paradox between growth differences of the bacteria at the single-cell level vs population level.

      Strengths:

      (1) The LUNA approach is pretty creative, turning coma aberration from what is normally a nuisance into an exploit. LUNA enabled long-term single-cell imaging during rapid temperature downshifts.

      (2) The authors show that the long-assumed growth arrest during cold shock from population-level measurements is misleading. At the single-cell level, bacteria do not stop growing or dividing but undergo a continuous, three-phase adaptation process. Importantly, this behavior is highly synchronized across the population and not based on bet-hedging.

      (3) Finally, the authors propose a model to resolve a long-standing paradox between single-cell vs population behavior: if cells keep growing, why does optical density (OD) of the culture stop increasing? Using light-scattering theory, they show that OD depends not only on cell number but also on cell volume, which decreases after cold shock. As a result, OD can remain flat, or even decrease, despite continued biomass accumulation. This demonstrates that OD is not a reliable proxy for growth under non-steady conditions.

      We thank the reviewer for acknowledging the strengths of our study.

      Weaknesses:

      (1) While the authors theoretically explain the advantages of LUNA over existing autofocus methods, it is unclear whether practical head-to-head comparisons have been performed, apart from the comparison to Nikon PFS shown in Video S1. As written, the manuscript gives the impression that only LUNA can solve this problem, but such a claim would require more systematic and rigorous benchmarking against alternative approaches.

      Thank you for your insightful comment regarding the comparison of LUNA with other autofocus methods.

      In our study, we primarily compared LUNA with the Nikon PFS system (as shown in Video S1) because Nikon PFS is one of the most widely used commercial autofocus systems in single-cell time-lapse imaging, and its manufacturer provides well-defined performance parameters (e.g., focusing precision within 1/3 depth-of-focus, response time <0.7 s), which facilitates a quantitative comparison. For other commercial systems, such as Olympus ZDC, Zeiss Definite Focus, Leica AFC, and ASI CRISP, the publicly available specifications are often less clearly defined, or are measured under inconsistent conditions, making a direct head-to-head comparison challenging and potentially misleading. Additionally, in our preliminary experiments, we also tested an Olympus microscope and observed severe focus drift during slow cooling processes. From a physical perspective, LUNA is specifically designed to meet the demanding requirements of single-cell experiments, including a wide focusing range and high precision, while existing commercial systems may not physically achieve the combination of range and accuracy needed for such extreme conditions.

      (2) No mutants/inhibitors used to test and challenge the proposed model.

      We agree that such approaches would provide valuable mechanistic insights and further strengthen the validation of the model presented in this study. In the current work, our primary goal was to introduce the LUNA autofocusing method and demonstrate its capability to resolve the bacterial cold shock response at the single-cell level with unprecedented precision. As such, we focused on characterizing the wild-type physiological dynamics under cold shock, which already revealed several previously unreported phenomena. We acknowledge that the use of genetic mutants or chemical inhibitors targeting specific cold shock proteins or regulatory pathways would be a logical and powerful next step to dissect the underlying molecular mechanisms and test the causality of the observed growth dynamics. We plan to address this in future work by incorporating such perturbations to further test and refine the model.

      (3) Cells display a high degree of synchronization, but they are grown in confined microfluidic channels under highly uniform conditions. It is unclear to what extent this synchrony reflects intrinsic biology versus effects imposed by the microfluidic environment.

      The reviewer raises a pertinent question regarding whether the observed high degree of cell synchronization represents an intrinsic biological phenomenon or an artifact induced by the microfluidic environment.

      Over the past decade, microfluidic chips, including the specific design used in our work, have become a widely accepted and powerful tool in microbial physiology research. A broad consensus has emerged within the community that the microenvironment within these microchannels does not significantly interfere with or perturb the natural physiological behavior of microorganisms (Dusny, C. & Grünberger, Curr Opin Biotechnol. 63, 26-33 (2020)). This understanding is also supported by the fact that key findings obtained with microfluidic single-cell technologies are reproducible by other methods. For example, the adder model of cell-size homeostasis in E. coli, first observed in microfluidic chips, has been repeatedly validated by different methods (Taheri-Araghi, S. et al. Curr. Biol. 25, 385-391 (2015)). Therefore, while we acknowledge the importance of considering environmental effects, we are confident that the synchronization we report reflects the genuine biological dynamics of E. coli cells.

      (4) To further test and generalize the model, it would be informative to also examine bacterial responses at intermediate temperatures rather than focusing primarily on a single cold-shock condition.

      We thank the reviewer for this thoughtful suggestion. In designing our experiments, we aimed to study the bacterial cold shock response at the single-cell level. A key feature of this response is that it is typically triggered only when the temperature drops below a certain threshold within a short time duration. We therefore chose to lower the temperature from 37 °C to 14 °C as rapidly as possible. This approach allowed us to leverage the unique capabilities of LUNA while also providing an opportunity to explore this biological process in greater detail.

      We agree that investigating bacterial responses across intermediate temperatures would be highly informative for understanding how temperature changes affect cellular physiology. However, this direction addresses a distinct scientific question that lies beyond the scope of the current work. We fully acknowledge its value and intend to explore it in future studies.

    1. Author response:

      (1) Claim regarding NNDSVD initialization

      Reviewer #1:

      The authors state that "MPS is the first implementation of Constrained Non-negative Matrix Factorization (CNMF) with Nonnegative Double Singular Value Decomposition (NNDSVD) initialization." However, NNDSVD initialization is the default method in scikit-learn's NMF implementation and is also used in CaImAn. I recommend rephrasing this claim in the abstract to more accurately reflect MPS's novelty, which appears to lie in the specific combination of constrained NMF with NNDSVD initialization, rather than being the first use of NNDSVD initialization itself.

      We agree that our original phrasing was too broad. NNDSVD-family initialization is widely used in NMF implementations (e.g., scikit-learn) and is available within some pipeline components. We revised the abstract and main text to clarify our intended contribution: MPS seeds CNMF directly with NNDSVD-derived nonnegative factors as the primary initialization strategy, rather than relying on heuristic or greedy ROI-based seeding, integrated within a memory-efficient, end-to-end workflow for long-duration miniscope recordings.
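
      For readers unfamiliar with NNDSVD, the sketch below shows how non-negative seed factors can be derived from a truncated SVD, following the basic variant described by Boutsidis and Gallopoulos. It is a NumPy illustration under stated assumptions (a toy random matrix standing in for a pixels-by-frames movie, a hypothetical function name `nndsvd`), not the MPS implementation; scikit-learn's `NMF` exposes the same family of initializers through its `init` parameter.

```python
import numpy as np


def nndsvd(X, k):
    """Basic NNDSVD seed for a non-negative matrix X (requires k <= min(X.shape)).

    Returns W (m x k) and H (k x n), both non-negative, such that W @ H
    approximates X and can seed an NMF/CNMF-style solver.
    """
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    W = np.zeros((X.shape[0], k))
    H = np.zeros((k, X.shape[1]))

    # The leading singular vectors of a non-negative matrix can be taken
    # non-negative, so the absolute value only removes a global sign.
    W[:, 0] = np.sqrt(S[0]) * np.abs(U[:, 0])
    H[0, :] = np.sqrt(S[0]) * np.abs(Vt[0, :])

    for j in range(1, k):
        x, y = U[:, j], Vt[j, :]
        xp, xn = np.maximum(x, 0), np.maximum(-x, 0)   # positive / negative parts
        yp, yn = np.maximum(y, 0), np.maximum(-y, 0)
        # Keep the sign-consistent pair that carries more of the component.
        mp = np.linalg.norm(xp) * np.linalg.norm(yp)
        mn = np.linalg.norm(xn) * np.linalg.norm(yn)
        u, v, sigma = (xp, yp, mp) if mp >= mn else (xn, yn, mn)
        if sigma == 0:
            continue  # degenerate component: leave this factor at zero
        scale = np.sqrt(S[j] * sigma)
        W[:, j] = scale * u / np.linalg.norm(u)
        H[j, :] = scale * v / np.linalg.norm(v)
    return W, H


# Toy data: 50 "pixels" by 30 "frames" of non-negative values.
rng = np.random.default_rng(0)
X = rng.random((50, 30))
W, H = nndsvd(X, k=5)
print(W.shape, H.shape, bool((W >= 0).all() and (H >= 0).all()))
```

      Because the seed is deterministic for a given input, repeated runs start from the same point, which is one practical difference from random or greedy ROI-based seeding.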

      (2) Installation issue on macOS

      Reviewer #1:

      At present, there are practical issues that limit the usability of the software. The link to the macOS installer on the documentation website is not functional. Furthermore, installation on a MacBook Pro was unsuccessful, producing the following error: "rsync(95755): error: ... Permission denied ...unexpected end of file."

      We thank the reviewer for identifying the broken installer link and the macOS installation error. We fixed the macOS installer link on the documentation website and updated installation instructions to explicitly address common macOS permission-related failures (including rsync "Permission denied" errors that arise when attempting to write into protected directories without appropriate privileges). We re-tested installation on clean macOS systems and confirmed successful installation under the revised instructions.

      (3) Validation, benchmarking, and cross-pipeline comparison

      Reviewer #2:

      A major limitation of this manuscript is that the authors don't validate the accuracy of their source extraction using ground-truth data or any benchmark against existing pipelines... Without this kind of validation, it is impossible to truly determine whether MPS produces biologically acceptable results... Considering one of the main benefits of MPS is its low memory demand and ability to run on unsophisticated hardware, the authors should include a figure that shows how processing times and memory usage scale with dataset sizes and differing pipelines... runtime comparisons on identical datasets processed through MPS, CaImAn, Minian, or CaliAli would be necessary to substantiate performance claims of MPS being "10-20X faster".

      We thank the reviewers for their careful reading and for raising the question of biological validity, which we agree is central to any calcium imaging analysis tool. We would like to clarify, however, that MPS does not introduce a novel source extraction algorithm, and therefore the question of biological validity is not one that MPS alone can answer - nor should it be expected to. MPS is built on CNMF, the same mathematical framework underlying CaImAn and Minian. The contribution of MPS lies in its initialization strategy and parallelization architecture, which allow this proven framework to operate in the multi-hour recording regime.

      To address the reviewers' request for a direct qualitative comparison, we will run MPS, CaImAn, Minian, and MIN1PIPE on a representative 10-minute real recording with clearly visible neurons. The figure will show the spatial components (ROI footprints) and representative temporal traces (ΔF/F) for all four pipelines on identical data. We anticipate that the spatial layouts and temporal dynamics will be highly concordant across pipelines, demonstrating that MPS produces biologically consistent output. We believe this side-by-side comparison will provide a clear demonstration that MPS output is comparable in quality to established tools on tractable recordings.

      Regarding runtime comparison across pipelines, we will provide a table showing approximate processing times at three recording durations (5, 20, and 180 minutes). On short recordings, all pipelines are expected to complete successfully at different rates, whereas on long-duration recordings, this pipeline behavior is expected to diverge. We acknowledge that any single runtime benchmark reflects specific hardware and dataset characteristics and may not generalize to all configurations. We will therefore present these data as illustrative rather than definitive and will direct readers to the MPS documentation for guidance on hardware-specific tuning.
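
      A runtime and memory table of this kind could be collected with a small harness like the one below. This is a sketch under assumptions: `run_pipeline` is a hypothetical callable wrapping one pipeline, the inputs are stand-ins for recordings of different durations, and `tracemalloc` only sees memory allocated through Python's allocator, so native buffers or worker processes may be undercounted.

```python
import time
import tracemalloc


def benchmark(run_pipeline, recordings):
    """Time `run_pipeline` on each recording and record peak traced memory.

    run_pipeline: hypothetical callable that processes one recording.
    recordings:   dict mapping a label (e.g. "5 min") to the pipeline input.
    """
    rows = []
    for label, data in recordings.items():
        tracemalloc.start()
        start = time.perf_counter()
        try:
            run_pipeline(data)
            elapsed = time.perf_counter() - start
            _, peak_bytes = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        rows.append({"recording": label,
                     "seconds": elapsed,
                     "peak_mib": peak_bytes / 2**20})
    return rows


# Stand-in workload: list sizes play the role of recording durations.
def fake_pipeline(n_frames):
    return sum(i * i for i in list(range(n_frames)))


for row in benchmark(fake_pipeline, {"5 min": 5_000, "20 min": 20_000, "180 min": 180_000}):
    print(f"{row['recording']:>8}: {row['seconds']:.4f} s, peak {row['peak_mib']:.2f} MiB")
```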

      (4) Dataset description and scope of generalizability

      Reviewer #2:

      The current datasets used for validating MPS are not described in the manuscript. The manuscript appears to have 28 sessions of calcium imaging, but it is unclear if this is a single cohort or even animal, or whether these data are all from the same brain region. Importantly, the generalizability of parameter choices and performance could vary for others based on brain region differences, use of alternative calcium indicators...

      We agree that the dataset description should be centralized and unambiguous. We added a dedicated Methods subsection stating that all results are based on a single, controlled experimental dataset consisting of 28 long-duration miniscope sessions acquired under consistent conditions (same brain region, calcium indicator, optical configuration, and acquisition parameters). This section explicitly specifies the number of animals, brain region, frame rate, field of view, session duration, and total data volume. We also clarified that conclusions are intended to evaluate MPS performance in this controlled long-duration setting rather than to claim universal parameter generalizability across brain regions, indicators, or optical systems.

      (5) Parameter guidance and documentation

      Reviewer #2:

      ...users should not be expected to blindly trust default or suggested parameter selections. Instead, users need guidance on what each modifiable parameter does to their data and how each step analysis output should be interpreted. Currently, the documentation and FAQ website linked to MPS installation does not do an adequate job of describing parameters or their optimization...

      We agree that users should not blindly trust default or suggested parameters. We substantially expanded and centralized documentation by adding a parameter-selection walkthrough that explains what each modifiable parameter does, how it affects intermediate and final outputs, and how diagnostic plots generated at each stage should be interpreted. Rather than prescribing dataset-specific parameter values, we explicitly framed parameter selection as an iterative, hypothesis-driven process informed by experimental factors such as calcium indicator kinetics, lens size and numerical aperture, field of view, recording duration, and expected neuronal density. We consolidated previously dispersed explanations from the GitHub repository into a single documentation site and expanded figure descriptions to guide interpretation by less experienced users. A representative sample dataset and accompanying analysis code were made publicly available at https://github.com/ariasarch/MPS_Sample_Code to support parameter exploration on tractable data.

      (6) Packaging and distribution

      Reviewer #1:

      ...current best practices in software development increasingly rely on continuous integration and continuous deployment (CI/CD) pipelines to ensure reproducibility, testing, and long-term maintenance. In this context, it has become standard for Python packages to be distributed via PyPI or Conda. Without dismissing the value of standalone installers, the overall quality and sustainability of MPS would be greatly enhanced by also supporting conventional environment-based installations.

      Regarding distribution more broadly: while our one-click installers are intended to reduce setup burden for non-programmers, we recognize the value of conventional environment-based distribution for long-term sustainability. We are exploring the feasibility of adding a standard PyPI and/or Conda installation pathway alongside the standalone installers. To ensure reproducibility across environments, all package dependencies are now explicitly version-pinned at installation time, eliminating environment drift as a source of irreproducibility.

      We would note, however, that PyPI distribution alone does not fully resolve the reproducibility challenges inherent to scientific Python software. Even with version-pinned dependencies, downstream changes in the Python interpreter itself, compiled extension modules, and platform-specific build toolchains can silently alter numerical behavior in ways that are difficult to anticipate or control. Our standalone installers address this by shipping a complete, fixed execution environment, and we believe this remains a meaningful architectural advantage for ensuring long-term reproducibility - particularly for non-developer users who may not be in a position to diagnose subtle environment-related failures. We see PyPI/Conda support and standalone installers as complementary rather than equivalent approaches, and will pursue both where feasible.
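
      Whether an installed environment still matches a set of pinned versions can be checked at start-up with the standard library alone. The sketch below is illustrative: the function name `check_pins` and the example pin are assumptions, not the mechanism used by the MPS installers, and it does not detect differences in compiled extensions or build toolchains.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Compare installed package versions with exact pins.

    pins: dict mapping a distribution name to the expected version string.
    Returns a list of (name, expected, found) for every mismatch;
    `found` is None when the package is not installed.
    """
    mismatches = []
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches.append((name, expected, found))
    return mismatches


# Hypothetical pin list; real pins would come from the project's lock file.
example_pins = {"numpy": "1.26.4"}
for name, expected, found in check_pins(example_pins):
    print(f"{name}: expected {expected}, found {found}")
```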

    1. Author response:

      The following is the authors’ response to the previous reviews

      We appreciate the authors' efforts in addressing the concerns raised, particularly including a variance partitioning approach to analyse their data. Detailed feedback on the revised manuscript is below, and we include a brief list of comments that we think the authors could address in the text:

      (1) Justify metric selection - Could you please include in the text an explanation for why only five behavioural metrics were highlighted out of the many you calculated?

      We have added explanations throughout the manuscript clarifying the rationale for selecting these behavioral parameters, including in lines 467ff. and 531ff. In short, the five highlighted metrics were chosen because they capture key aspects of the behavioral repertoire and, importantly, can be consistently measured across all experimental conditions. Other parameters were excluded as they were only applicable under specific contexts and thus not suitable for cross-condition comparisons.

      (2) Discuss ICC variation - We note that there is variation among the ICC scores for the different metrics you've studied. While this is expected, we ask that you acknowledge in the text that some traits show high repeatability and others low, and reflect this variation in the conclusions.

      We have added an additional paragraph in the Discussion (lines 743ff.) addressing the variation in ICC values among behavioral traits. This new section highlights that some metrics show high repeatability while others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions about individual behavioral stability across contexts.

      (3) Tone down general claims - Because of the above point, we recommend that you avoid overstating that individuality persists across all behaviours. Please clarify this in the Abstract and main text that it applies to some traits more than others.

      We carefully reviewed the entire manuscript and revised the phrasing wherever necessary to avoid overgeneralization. Statements about individuality have been adjusted to clarify that consistent individuality can be measured more strongly in some behavioral traits than in others, both in the Abstract and throughout the main text.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths: 

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal retesting, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R2 of 0.069 i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R2) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
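
      The arithmetic behind these two examples is the usual conversion from a correlation coefficient r to the coefficient of determination R², and can be checked in a few lines. The labels below are shorthand for the comparisons quoted above; the function name is an illustrative assumption.

```python
def variance_explained(r):
    """Coefficient of determination for a simple linear relation: R^2 = r^2."""
    return r ** 2


examples = [
    ("% time walked, 23°C vs 32°C", 0.263),
    ("vector strength, Buridan vs Y-maze", -0.197),
]
for label, r in examples:
    r2 = variance_explained(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```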

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
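
      The in-group rank transform suggested here amounts to ranking each fly within its own condition and then correlating the ranks, which is Spearman's rank correlation. A minimal standard-library sketch follows, using made-up values and assuming the same flies appear in the same order in both conditions.

```python
from math import sqrt


def ranks(values):
    """1-based ranks within one group; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def rank_correlation(test, retest):
    """Correlate in-group ranks (equivalent to Spearman's rho)."""
    return pearson(ranks(test), ranks(retest))


# Made-up values: the second condition shifts and stretches every fly's score,
# so group means differ strongly while the ordering of flies is preserved.
test = [0.2, 0.5, 0.3, 0.9]
retest = [2 * x + 1 for x in test]
print(rank_correlation(test, retest))
```

      Ranking removes differences in group mean and scale between conditions, at the cost of discarding information about how far apart individuals are.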

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from rewording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The Github link to the software was not available. To verify successful transfer of open-hardware and open-software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms. 

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible. 

      We thank the reviewer for the careful and thoughtful assessment of our work.

      We have added an additional paragraph in the Discussion (lines 743ff.) explicitly addressing the variation in ICC values among behavioral traits. This section emphasizes that while some metrics show high repeatability, others exhibit lower consistency, and we discuss how this heterogeneity informs our conclusions regarding individual behavioral stability across contexts.

      Regarding the reviewer’s concern about the analytical approach, we would like to clarify that the hierarchical linear mixed model (LMM) was applied in a univariate framework—each behavioral metric was analyzed separately to estimate its individual ICC value. This approach allows us to quantify repeatability for each trait across environmental contexts while accounting for individual identity as a random effect. Although this is not a multivariate model in the strict sense, it represents an improvement over the prior pairwise correlation approach because it explicitly partitions within- and between-individual variance.
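
      To illustrate what such a per-trait ICC measures, the sketch below estimates it for one trait from a balanced flies-by-contexts table. It deliberately swaps the mixed-model fit described above for a simpler method-of-moments (ANOVA) estimator, which is expected to agree closely with a mixed model for balanced data; the function name and the toy numbers are assumptions, not the authors' code or data.

```python
def icc_consistency(table):
    """ICC for one trait: between-fly variance / (between-fly + residual variance).

    table[i][j] is the trait value of fly i in context j (balanced, no gaps).
    Context means are removed, mirroring "context as a fixed effect";
    fly means play the role of the random intercept.
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    fly_means = [sum(row) / k for row in table]
    ctx_means = [sum(table[i][j] for i in range(n)) / n for j in range(k)]

    ms_fly = k * sum((m - grand) ** 2 for m in fly_means) / (n - 1)
    ss_res = sum((table[i][j] - fly_means[i] - ctx_means[j] + grand) ** 2
                 for i in range(n) for j in range(k))
    ms_res = ss_res / ((n - 1) * (k - 1))

    var_fly = max((ms_fly - ms_res) / k, 0.0)   # truncate negative estimates at 0
    total = var_fly + ms_res
    return var_fly / total if total > 0 else float("nan")


# Toy example: three flies, two contexts; the context shifts every fly by the
# same amount, so the ordering of flies is perfectly preserved.
print(icc_consistency([[1.0, 11.0], [2.0, 12.0], [3.0, 13.0]]))
```

      Running the function once per behavioral metric reproduces the univariate structure described above: each trait gets its own ICC, and no information is shared between traits.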

      As for the selection of behavioral metrics, the five parameters highlighted (% time walked, walking speed, vector strength, angular velocity, and centrophobicity) were chosen because they represent key, biologically interpretable dimensions of locomotor and spatial behavior and, importantly, could be measured reliably across all tested conditions. Several other parameters that we routinely analyze (e.g., Linneweber et al., 2020) could not be calculated in all contexts—for instance, under darkness or when visual cues were absent—and therefore were excluded to maintain consistency across assays.

      We agree that a truly holistic multivariate comparison across all extracted parameters would be valuable; however, given the contextual limitations of some metrics, such an analysis was not feasible in the present framework. We have clarified these points in the revised manuscript to avoid potential misunderstandings.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      We thank the reviewer for this helpful comment and agree that not all behavioral traits exhibit the same degree of inter-context consistency. We have clarified this point in the revised Abstract and ensured that it is also reflected in the main text. The Abstract now reads: 

      “We find that individuality is highly context-dependent, but even under the most extreme environmental alterations tested, consistency of behavioral individuality always persisted in at least one of the traits. Furthermore, our quantification reveals a hierarchical order of environmental features influencing individuality. We confirmed this hierarchy using a generalized linear model and a hierarchical linear mixed model. In summary, our work demonstrates that, similar to humans, fly individuality persists across different contexts (albeit worse than across time), and individual differences shape behavior across variable environments. The presence of consistency across situations in flies makes the underlying developmental and functional mechanisms amenable to genetic dissection.” 

      This revision clarifies that individuality is not uniformly expressed across all behavioral metrics, but rather in a subset of traits with higher repeatability, which are the most promising targets for future genetic analyses.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

      We thank the reviewer for drawing attention to this inconsistency in terminology. We apologize for the oversight and have corrected it throughout the manuscript to ensure uniform usage.

      Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations: 

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context. 

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anticonservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is much more streamlined, if the authors employed hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for, e.g., the time spent walking among the different stripe patterns (e.g., Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined, and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not change, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper, as they'll be using methods that are standard in the study of individual behavioral variation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      I am delighted to see the authors have included hierarchical models in their analysis. I really think this strengthens the paper and their conclusions while simultaneously making it more accessible to folks that typically use these types of methods to investigate these patterns of individual behavior. It's also cool, and completely jibes with my own experience measuring individual behavior, in that the activity metrics show the highest repeatability compared to the more flexible behaviors (such as "exploration"). I think it's quite striking and interesting to see such moderate repeatability estimates in these behaviors across what could be very different environmental scenarios. I think this is a very strong and meaty paper with a lot of information to digest, producing, however, a very elegant and convincing take-home message: individuals are unique in their behavior even across very different environments.

      We sincerely thank the reviewer for the positive and encouraging feedback, as well as for their valuable input throughout the review process. We are very pleased that the inclusion of hierarchical models and the resulting interpretations resonated with the reviewer’s own experience and perspective.

    1. Reviewer #1 (Public review):

      Summary:

      The manuscript examines the factors that restrict the induction of IL-17-producing T cells during Mycobacterium tuberculosis (Mtb) infection. The authors show that neither the infectious route nor the duration of infection is responsible. But they do show that mice that lack the Th1-defining transcription factor Tbet have increased IL-17-producing T cells, a finding consistent with prior reports in the field of immunology. They also show that 2 highly attenuated Mtb mutants in ESX-1 and PDIM, two well-known Mtb virulence factors, do induce IL-17-producing T cells. In contrast, Mtb mutants in mmpl4 are similarly attenuated but do not induce IL-17-producing T cells, suggesting that this property is not simply a result of attenuation but due to specific properties of ESX-1 and PDIM-deficient mutants.

      Strengths:

      (1) It is interesting that mice infected with ESX-1 and PDIM mutants have increased induction of Th17 cells.

      (2) The data are solid and convincing throughout.

      Weaknesses:

      There are two main criticisms:

      (1) It is not clear how much the factors uncovered here are true beyond B6 mice. B6 mice, compared to humans, are known to be very Th1-skewed, and Tbet is a strong inhibitor of Th17-specific T cells. Many people make IL-17-producing T cells in response to Mtb infection.

      (2) Very few novel insights are mechanistically revealed about how Th17 induction is restricted by Mtb. Tbet induction is known to restrict Th17 development, and this is a T-cell intrinsic mechanism. In contrast, the IL-23 association revealed seems to be extrinsic to T cells and to act on T cells. How, if at all, are these factors related to each other in restricting Th17 induction? Also, the conclusion that it is not a result of attenuation is not completely convincing.

      Other points:

      (1) The authors show that mice infected with a deficiency in ESX-1 have more IL-17-producing CD4 T cells in response to stimulation with an ESAT-6 peptide pool (Figure 3B). Because ESAT-6 is encoded by ESX-1, why do mice infected with this Mtb mutant have any ESAT-6-specific T cells? Is it an incomplete knockdown?

      (2) The manuscript states, "Under the conditions where Th17s are highly induced, mice infected with either ΔESX-1 or PDIM lacking Mtb, the Il17a-/- mice had ~3-5 fold higher CFU than WT mice (Figures 3F-G). These results indicate that the induction of Th17s is not dependent on the attenuation of Mtb in general, but instead Mtb utilizes ESX-1 and PDIM to suppress the induction of a Th17 response that enhances protection against Mtb infection." I don't think the last sentence is necessarily true. I can imagine a scenario in which the induction of the Th17s is, in fact, due to the attenuation, and the Th17 induction still contributes to protection.

      (3) ESX-1, PDIM, and mmpl4 mutants all have similarly reduced CFUs in the lung, but what about the LN? The bacterial burden in the LN may be more important for regulating T-bet, IL-23, and Th17 differentiation, since the LN is where T cell priming occurs, than the CFU in the lung. Perhaps ESX-1 and PDIM mutants have reduced CFU in the LN, but mmpl4 does not. This difference in LN burdens may be the primary driver of Th17 priming, as high avidity interactions are thought to be an important driver of T-bet induction.

      (4) Do LN cDC1s express high levels of IL-12 p35 in mice infected with the mmpl4 mutant? Likewise, do LN cDC2s express low levels of IL-23 p19 (akin to those infected with WT Mtb)? If these observations for ESX-1 and PDIM mutants are mechanistically linked to the increased numbers of Th17 cells, then you would expect mice infected with mmpl4 mutants to be more like those infected with WT Mtb than those infected with ESX-1 and PDIM mutants.

      (5) ESX-1 and PDIM are very different virulence factors: a protein secretory pathway and a cell wall lipid, respectively. Mechanistically, how would mutants in these pathways give very similar outcomes regarding Th17 cells unless it was simply an aspect of their attenuation? Perhaps mmpl4 mutants simply differ in some aspects of their attenuation, such as bacterial burdens in LNs, or their interaction with cDCs?

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors tackle an important question of why IL-17 production and TH17 responses are lower than expected during Mtb infection. The authors identify an axis of cross-regulation between TH1 and TH17 cells and provide data to support roles for Mtb virulence factors ESX1 and PDIM in promoting TH1 responses and/or suppressing TH17 responses.

      Strengths:

      The strengths include the significance of the work, the combination of host and Mtb genetic models to dissect the mechanistic basis for regulation of IL-17 production from T cells during infection, and the rigor of the experiments. There are a number of exciting findings from the work, including the cross-talk between T cell responses and the impact of ESX1 and PDIM on these responses.

      Weaknesses:

      The following conclusions and interpretations should be revisited, rephrased, and re-evaluated:

      (1) The manuscript neglects to analyze T cell responses in the dLN, which is the critical site where these responses are initiated (only DC cytokine production is measured in the dLN). The differences in the lungs could reflect trafficking of T cells to the lungs, local lung T cell responses, or durability of the T cell responses in the lungs. The authors state in the last results section that "These results indicate that the ESX-1 and PDIM virulence factors impact naïve T cell differentiation at the draining mediastinal lymph node..." but T cell responses are never measured in the dLN.

      (2) Figure 2: The authors state that "Importantly, IFN-γ deficient mice did not exhibit elevated levels of IL-17A producing CD4 T cells demonstrating that IFN-γ production is not the mechanism by which Th1 T cells limit a Th17 response during Mtb infection", but the difference is statistically significant and even more obvious in Panel B. In fact, if the Panel D y-axis was on a log scale, the Ifng-/- would likely look more like Tbet-/- than WT. Based on this data, it seems like IFNg is having an effect and should not be completely discounted. Does the deletion of Ifng affect the number of Tbet+ T cells?

      In addition, the deletion of Tbet results in an increased number of IFNg+IL-17+ double positive T cells (Figure 2B), in addition to a sizable IFNg single positive T cell population maintained in the Tbet-/- mice (10x the negative control of Ifng-/-). Is this why Tbet deletion is not as severe as Ifng deletion, because T cells are still making IFNg?

      Along these lines, the statement in the text that, "Tbet-/-Il17a-/- mice completely lacked both IFN-γ producing...." T cells is not supported by the data in Figure 2C. Tbet-/-Il17a-/- mice look to have more gamma-producing T cells than Tbet-/- mice (which is already 10x the negative control of Ifng-/- in panel 2B if one includes the gamma single positive and IFNg/IL-17 double positive).

      (3) In the Results sections describing Figures 3, 4, and 5, the authors equate IL-17 production by T cells with TH17 responses and IFNg expression with TH1, but Tbet and RORgt expression in the T cells should be measured to make conclusions about TH1 and TH17. Or the authors can rephrase their findings to specifically state the observations as IFNg or IL-17 expressing CD4+ T cells.

      (4) Conceptually, do the authors think that ESX1/PDIM promotes TH1 responses and this blocks TH17 or are ESX1/PDIM blocking TH17 responses directly, allowing for increased TH1 responses? It would be helpful to clarify the model in this regard, describe how the data supports one model or the other, and then make sure the language is consistent throughout. Can these effects on T cell responses be tested and recapitulated in vitro using infected APC and T cell co-cultures?

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript by Zilinskas et al seeks to understand the mechanisms underlying the ability of Mtb to suppress Th17 differentiation. As Th17 responses are needed for protective immunity against TB, this is an important topic of investigation. They use Mtb mutants that lack eccC1 (from the ESX-1 locus) and fadD28 (encoding PDIM) and implicate a Tbet-dependent pathway by which Mtb modulates Th17 differentiation. The mechanism by which ESX-1/PDIM function to impact Th17 differentiation is, however, unclear, which limits the novelty of the results.

      Strengths:

      Understanding how Mtb limits Th17 differentiation has implications for vaccine development. Comparative study of KO mice and Mtb mutants is a strength.

      Weaknesses:

      (1) The authors should acknowledge and reference key findings from the literature that have identified suppression of Th17 differentiation as an Mtb virulence mechanism, e.g., the role of the Hip1 protease and CD40 signaling (Madan-Lala JI 2014, Sia Plos Path 2017, Enriquez iScience 2022) and Khader JI 2005, showing the requirement of IL-23 for Th17 responses in vivo in a TB mouse model.

      (2) Addressing several questions related to the Tbet KO mouse experiments would strengthen the study. Do the Tbet KO mice have elevated IL-4/5/13 (which has been previously reported in non-TB studies) in addition to IL-17? The lack of Th17 cells in the IFNg KO compared to the Tbet KO may be due to a difference in timing, since only 3-week data are shown; earlier and later time points would provide better interpretation. The authors do not present any data on neutrophil infiltration in WT vs Tbet KO vs IFNg KO mice. Since IL-17 is known to be important for recruiting neutrophils to the lung, data on neutrophils are important for clarifying the mechanism for the CFU outcomes.

      (3) While IL-23 is important for sustaining IL-17 production, IL-6, TGF-b and/or IL-1β are necessary for Th17 polarization. What were the levels of these cytokines in DCs in the lung? (Figure 5). Additionally, Tbet-deficient DCs exhibit impaired activation of antigen-specific Th1 cells and have reduced IL-12 production. Given the data showing higher IL-17 levels in Tbet KO mice, the authors should provide information on the DC phenotype (IL-23, IL-6, etc.) in the Tbet KO experiments.

      (4) The mechanism by which ESX-1/PDIM function to impact Th17 differentiation is not clear. While data showing a role for ESX-1 and PDIMs in inhibiting Th17 responses is interesting, there is no insight into the potential mechanism of action. Figure 3 showing reduction in IFNg+ CD4 T cells after infection with eccC1 and fadD28 mutants suggests that this outcome is due to a lower bacterial load relative to WT Mtb at the 3-week time point. Since IFNg is known to suppress IL-17, the higher levels of Th17 cells could be due to the reduction in IFNg due to the attenuated growth of the mutants. Additionally, what was the level of Type I IFNs elicited by these mutants?

      (5) Since macrophages have been implicated in the reduced cytokines seen in the ESX-1 mutant, IL-23 and other cytokine data on lung macrophages would complement the DC data.

      (6) Figure 5. There are many fewer DCs overall in the eccC1 and fadD28 mutant groups, which could account for the increased % IL-23p19 in DCs (5D). What were the levels of IL-23 in DC1s?

    1. Reviewer #2 (Public review):

      Summary:

      This paper by Diaz et al. uses the zebrafish model to examine how early embryonic exposure to Chlorpyrifos (CPF), a widely used organophosphate pesticide, induces social behavior deficits later in life. This paper combined behavioral testing, pharmaceutical treatment, genetic manipulation, and multi-omics to test the hypothesis that early CPF increases the abundance of denitrifying bacteria, Pseudomonas, which, in turn, enhances nitric oxide production and induces selective inhibition of HDAC8 and abnormal gene expression in the brain.

      Strengths:

      (1) The observation that early embryonic CPF exposure causes behavior deficits in juvenile zebrafish is very intriguing. It is especially exciting to see that CPF-induced behavior deficits can be reversed by overnight treatment with butyrate or HDAC1 inhibitors in juvenile zebrafish. In humans, CPF exposure during pregnancy causes brain abnormalities and neurological disorders such as autism. Although the zebrafish experiments are a long way from human application, the experimental effects reported in the paper are still quite thought-provoking.

      (2) The authors performed RNA sequencing experiments on control zebrafish, CPF-exposed zebrafish, and CPF-exposed zebrafish that were treated with Butyrate. The data not only showed large-scale transcriptomic changes in the juvenile zebrafish brain in response to embryonic CPF exposure but also showed that many CPF-induced genetic alterations can be alleviated by butyrate exposure later in life.

      (3) The authors also performed untargeted metabolomics on zebrafish gut and metagenomic analysis of zebrafish fecal samples. The results are interesting and support the conclusion that increased intestinal nitric oxide metabolism and the abundance of denitrifying bacteria, such as Pseudomonas, are associated with CPF exposure.

      (4) The large datasets presented in the paper will be useful to other researchers interested in understanding how CPF or butyrate alters brain and gut function. It might be useful to generate new hypotheses to power other research lines.

      (5) The social preference assays, behavior testing, and experimental paradigm used in the paper may also be used by other researchers to investigate the interaction among genes, environmental factors, and brain function.

      Weaknesses:

      (1) The presented link between gut microbiome and CPF-induced behavior and genetic alteration is an association, but not causation. Although the research data align with the hypothesis, the hypothesis is not fully supported or tested by the data presented in the paper in the current state.

      (2) The authors performed several large omic studies. However, some of the presented analyses are relatively simple and incomplete. For example, the authors performed shotgun metagenomic analysis on zebrafish feces. However, the paper only displayed the bacterial taxa differences. Are there any differences in bacterial genetic pathways, especially the pathways associated with microbial nitrogen metabolism? What do alpha and beta diversity look like when comparing the different experimental groups?
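
      For readers less familiar with the two diversity measures requested above, a minimal sketch follows. It uses the Shannon index as one common alpha-diversity metric and Bray-Curtis dissimilarity as one common beta-diversity metric. The choice of these two metrics, the function names, and the taxon counts are all illustrative assumptions and are not taken from the manuscript.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon entropy (natural log) of taxon proportions."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Both lists must report counts for the same taxa in the same order.
    0 means identical composition, 1 means no shared taxa.
    """
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))


# Hypothetical read counts for four taxa in two fecal samples.
control = [40, 30, 20, 10]
exposed = [70, 10, 15, 5]

print(shannon_index(control), shannon_index(exposed))
print(bray_curtis(control, exposed))
```

      Alpha diversity is computed per sample and then compared between groups, whereas beta diversity is computed per pair of samples and is typically summarized with an ordination or a permutation-based test.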

    1. The Dead End of Punishment at School: Toward an Educational and Restorative Approach

      Executive Summary

      This document analyzes the limits of the traditional punitive system in schools and explores alternatives based on positive discipline and reparative actions.

      The central finding is that punishment, historically defined as "making someone pay", often fails to change student behavior in a lasting way.

      On the contrary, it frequently generates feelings of injustice, rebellion, or resentment (the "4 Rs").

      The proposed approach rests on a paradigm shift: moving from a culture of domination to a culture of cooperation, characterized by a framework that is both firm and kind.

      By focusing on building the missing psychosocial skills rather than on blame, schools can halve repeat-offense rates, improve school climate, and support academic success.

      The key lies in the adult's self-regulation and in involving the student in the search for solutions.

      --------------------------------------------------------------------------------

      1. Analysis of the Current Punitive System

      Definitions and Legal Framework

      The French education system distinguishes two levels of intervention, although their ultimate goals (restating the rule, stopping a behavior, ensuring safety) are similar:

      | Term | Scope | Competent authority |
      | --- | --- | --- |
      | School punishment (punition scolaire) | Minor breaches (lateness, lack of work, chatting). A tool used close to the classroom. | Teacher or educational staff. |
      | School sanction (sanction scolaire) | Serious or repeated breaches, harm to property or persons. Recorded in the student's file. | Head of school or disciplinary board. |

      The Failure of the "Internal Logic"

      Punishment rests on the idea that, to encourage a student to behave better, the student must first be made to feel bad.

      Yet studies and clinical accounts show that punishment triggers a counterproductive "internal logic" in the student.

      The "4 Rs" of punishment (according to Jane Nelsen):

      1. Resentment: A feeling of injustice ("It's not fair", "The adult doesn't like me").

      2. Revenge: A desire to hit back ("She crushed my ball, so I won't do anything in her class").

      3. Rebellion: Head-on opposition to prove one's autonomy.

      4. Retreat: Apparent submission that hides lowered self-esteem or avoidance (dishonesty to avoid getting caught).

      --------------------------------------------------------------------------------

      2. The Paradigm Shift: From Domination to Cooperation

      The document notes that 33% of teachers continue to use traditional punishment, often for lack of time, training, or alternatives when facing difficult groups.

      Moving to an "educational and reparative action" requires changing how the child is viewed.

      The Iceberg Metaphor

      Inappropriate behavior is only the visible tip of the iceberg. Beneath the surface lie emotions, perceptions, and, above all, missing skills.

      Punitive approach: Focuses on the tip (the behavior) in order to suppress it by force.

      Educational approach: Looks for the missing skill (organization, anger management, attention) in order to teach it.

      The Pillars of the Alternative

      The alternative does not mean the absence of a framework, but the adoption of a firm and kind stance:

      Firmness: Respect for the rule and for the safety of the group. The framework is non-negotiable.

      Kindness: Respect for the student's dignity. The student's physical or moral integrity is never undermined.

      --------------------------------------------------------------------------------

      3. Practical Tools for Positive Discipline

      The approach is organized along three axes: prevention, intervention, and reparation.

      A. Prevention

      Make self-regulation a ritual: Offer moments of silence or breathing (e.g., the "settling-in minute") to stabilize the emotional level of the class.

      Make the framework explicit: Co-construct the guidelines for conduct with the students to foster their sense of belonging and responsibility.

      Work on contributions: Give each student a role or responsibility within the group.

      B. Intervention (The Time-Out)

      Unlike exclusion as a sanction, the time-out is a tool for managing emotions.

      Goal: To come back down physiologically into a calm zone (leaving the "brain in the hand", or reactive, mode).

      Process: The student goes to a dedicated space (the school life office, a quiet office) not to "think about what they did" under duress, but to regain their capacity for rational thought.

      C. Reparation (The Search for a Solution)

      Reparation must be related to the incident, respectful, proportionate, and useful.

      Example: A student who forgets their materials should not simply copy lines, but should identify three concrete strategies for no longer forgetting them and test one.

      Key question for the adult: "Which skill was this student missing, and how can I help them acquire it?"

      --------------------------------------------------------------------------------

      4. Observed Benefits and Conclusions

      Moving from a punitive system to an educational network of practices (including Nonviolent Communication, restorative justice, or positive discipline) produces tangible results:

      1. Fewer repeat offenses: One cited example shows a 50% reduction in serious incidents within one year of adopting a restorative approach.

      2. Improved academic results: A student who feels emotionally safe (within their "window of tolerance") is better able to learn and concentrate.

      3. Restored relationships: The approach prioritizes maintaining the teacher-student relationship, avoiding the rupture that often leads to violence against the institution.

      4. A calmer adult: By stepping out of a stance of permanent domination, the teacher reduces the fatigue that comes from making students comply through coercion.

      Summary conclusion: Effective education rests on Jane Nelsen's principle: "We do better when we feel better".

      Traditional punishment, by degrading the student's emotional state, obstructs the very learning it claims to serve.

    1. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate how the anterior claustrum may integrate temporally separated task-relevant signals to guide behavior in a delayed escape paradigm. Because in vivo neural recordings from claustrum during this task are extremely limited - comprising single-trial data with small neuronal samples - the authors adopt a modeling-driven approach. They train recurrent neural networks (RNNs) using only behavioral data (escape latency) to reproduce task performance and then analyze the internal dynamics of the trained networks. Within these networks, they identify a subset of units whose activity exhibits persistent responses and strong correlations with behavior, which the authors label as "claustrum-like." Using dimensionality reduction, decoding, and information-theoretic analyses, they argue that these units dynamically integrate conditioned stimulus (CS) and door-opening signals via nonlinear, trajectory-based population dynamics rather than fixed-point attractor states.

      To bridge model predictions and biology, the authors complement the modeling with in vitro slice experiments demonstrating recurrent excitatory connectivity and prolonged activity in the anterior claustrum that depends on glutamatergic transmission. They further compare latent neural trajectories derived from previously published in vivo claustrum recordings to those observed in the RNN, reporting qualitative similarities. Based on these results, the authors propose that the claustrum implements temporal signal integration through recurrent excitatory circuitry and dynamic population trajectories, potentially supporting broader theories of integrative brain function.

      Strengths:

      This study addresses an important and challenging problem: how to infer population-level computation in a brain structure for which in vivo data are sparse and experimentally constrained. The authors are commendably transparent about these limitations and seek to overcome them through a principled modeling framework. The integration of behavioral modeling, RNN analysis, and slice electrophysiology is ambitious and technically sophisticated.

      Several aspects stand out as strengths. First, the behavioral RNN is carefully trained and interrogated using a rich set of modern analytical tools, including cross-temporal decoding, trajectory analysis, and partial information decomposition, providing multiple complementary views of network dynamics. Second, the slice experiments convincingly demonstrate recurrent excitatory connectivity in the anterior claustrum, lending biological plausibility to the model's reliance on recurrent dynamics. Third, the manuscript is clearly written, logically organized, and conceptually engaging, and it offers a coherent mechanistic hypothesis that could guide future large-scale recording experiments.

      Importantly, the work has significant heuristic value: rather than merely fitting data, it attempts to generate testable computational ideas about claustral function in a regime where direct empirical access is currently limited.

      Weaknesses:

      Despite these strengths, the manuscript suffers from a recurring and substantial conceptual issue: systematic over-interpretation of model-data correspondence. While the modeling results are potentially insightful, the extent to which they are presented as recapitulating real claustral neural mechanisms goes beyond what the available data can support.

      A fundamental limitation is that the RNN is trained solely on behavioral output, without being constrained by neural data at either single-unit or population levels. As a result, the internal network dynamics are underdetermined and non-unique. Many distinct internal solutions could plausibly generate identical behavior. However, the manuscript frequently treats the specific internal solution discovered in the RNN as if it were a close approximation of the actual claustrum circuit.

      This issue is compounded by the sparse nature of the in vivo data used for comparison. The GPFA-based trajectory analyses rely on pseudo-populations and single-trial recordings, yet are interpreted as evidence for robust population-level dynamics. Because neurons were not recorded simultaneously, the inferred trajectories necessarily lack true population covariance and shared trial-to-trial variability, limiting their interpretability as genuine population dynamics. Similarly, conclusions about trajectory-based versus attractor-based computation are drawn almost exclusively from model analyses and then generalized to the biological system.

      Overall, while the modeling framework is appropriate as a hypothesis-generating tool, the manuscript repeatedly crosses the line from proposing plausible mechanisms to asserting explanatory or even causal equivalence between the model and the brain. This undermines the otherwise strong contributions of the work.

      Below are several specific points that warrant further clarification or revision:

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      Overall Assessment:

      This manuscript presents an interesting and potentially valuable modeling-based framework for thinking about temporal integration in the claustrum, supported by solid slice physiology. However, in its current form, it overstates the degree to which the proposed RNN dynamics reflect actual claustral neural mechanisms. With substantial revision - especially a more cautious interpretation of model-data similarity and a clearer articulation of modeling limitations - the study could make a meaningful contribution as a hypothesis-generating work rather than a definitive mechanistic account.

    2. Reviewer #2 (Public review):

      This manuscript reports the behavior of a computational model of rat claustral neurons during the performance of a behavioral task known as the delayed escape task (in this reviewer's understanding, this behavioral task was created and implemented by this group only). These authors have argued in a prior manuscript (Han et al.) that a group of neurons located "rostral to striatum" is part of the claustrum. The group names the region the "rostral to striatum claustrum." Additionally, in the Han et al. paper, the authors argue that these cells are responsible for maintaining a signal that lasts through the delay period.

      The main findings of the current paper are:

      (1) The authors have built a model network that was trained to show firing similar to what was reported for rats in their prior paper.

      (2) The authors' analysis of model behavior is used to suggest that the model network recapitulates biological activity, including the existence of a cluster of cells mainly responsible for the delay period firing.

      (3) The authors offer evidence from patch clamp recordings for excitatory interconnections among claustral neurons that are an essential feature of the model network.

      A major value of the computational network is that "trials" of the network can be performed. In experiments on animals, only single trials can be used.

      Concerns:

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

    3. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Our goal was to propose a possible computational mechanism underlying information integration in the claustrum, not to claim structural or causal equivalence between the model and the biological circuit. We acknowledge that some expressions in the original manuscript may have been interpreted as exceeding this intention, and we will revise the text to explicitly soften such statements.

      It is well established that behavior-trained RNNs can admit multiple internal solutions capable of producing the same behavioral output, and we fully agree with this point. Among the many possible solutions, we focused on networks that exhibited dynamical properties consistent with independently obtained behavioral and physiological findings. Thus, in our view, biological plausibility in this study is not grounded in structural isomorphism, but rather in whether the core population-level dynamical properties observed in the model are reproducible in actual claustral population activity.

      We also agree with the reviewer that our original qualitative comparison of GPFA-based latent trajectories did not provide sufficient quantitative support. In the revised manuscript, we have therefore added an eigenvalue-based quantitative analysis of the dimensional structure of population trajectories. This analysis does not depend on the identity of the dimensionality-reduction method itself, but instead focuses on quantifying the geometric structure of population-state trajectories as they evolve over time. Applying the same metric to both the RNN and biological claustrum data revealed consistent condition-specific differences in population dynamics.

      This quantitative addition strengthens the previous qualitative trajectory comparison and clarifies that the model implements a specific computational dynamical regime that directionally corresponds to claustral population activity. While this does not imply uniqueness of the model, we believe it suggests that the proposed computational principle represents a biologically realizable candidate mechanism.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s concern. Expressions such as “closely mimicked,” “nearly identical,” and “recapitulate” will be replaced with more moderate language.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer noted, behavior-trained RNNs can yield multiple internal solutions that generate the same behavioral output, and we acknowledge this non-uniqueness. However, we do not interpret the relatively low success rate (5/100 networks) as evidence of fragility. Rather, we interpret it as suggesting that the emergence of this particular dynamical regime requires stringent structural constraints.

      The computational demands of the task—specifically, the integration of temporally separated signals—drive convergence toward networks capable of sustaining persistent activity through recurrent excitatory connectivity. Indeed, all networks exhibiting a claustrum-like cluster shared a strong recurrent excitatory structure within Cluster 1, a structural feature consistent with our slice electrophysiology findings.

      Our criterion for selecting RNNs was their ability to reproduce behavioral and physiological observations from the delayed escape experiment. Excluded RNNs may reflect alternative information-processing strategies characteristic of other brain regions or artificial logical solutions. Importantly, claustrum-like dynamics were not explicitly enforced during training; they emerged spontaneously under behavioral constraints, suggesting that this solution is not arbitrary.

      Furthermore, the computational principles derived from the RNN were quantitatively consistent with in vivo single-neuron activity. Using an eigenvalue-based metric (λ<sub>3</sub>/Σλ), both the RNN and biological claustrum data showed effects in the same direction. Leave-one-neuron-out analyses further demonstrated that this pattern was broadly distributed across neurons in the claustrum. These convergent results suggest that the identified network captures a computational regime that is consistent with claustral population dynamics, rather than representing an arbitrary solution unrelated to the biological observations.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      We agree that the original GPFA trajectory comparison in the biological claustrum data remained qualitative and did not sufficiently establish robustness or population-level structure. We have therefore added quantitative analyses in the revised manuscript.

      Before presenting these analyses, we clarify methodological limitations inherent in pseudopopulation and single-trial data. GPFA estimates latent trajectories based on covariance structure and temporal smoothness assumptions. In pseudopopulations, true simultaneously recorded covariance cannot be fully reconstructed. Although our dataset is based on single trials rather than trial-to-trial variability, we acknowledge that latent-space estimation depends on covariance structure.

      Therefore, the additional quantitative metric is not independent of the GPFA estimation stage; rather, it evaluates the geometric structure of single-trial latent trajectories estimated by GPFA.

      Specifically, for biological data, we reanalyzed GPFA-estimated latent trajectories in PCA space and computed an eigenvalue-based metric (λ<sub>3</sub>/Σλ). Across 20 time bins, a sliding window of 10 bins was applied. For each window, we computed the covariance matrix and extracted eigenvalues for PC1, PC2, and PC3. The third eigenvalue (λ<sub>3</sub>) was normalized by total variance (Σλ = λ<sub>1</sub> + λ<sub>2</sub> + λ<sub>3</sub>). This metric quantifies the extent to which trajectories deviate from a planar (two-dimensional) structure into a third dimension. An increase in λ<sub>3</sub>/Σλ indicates the formation of a higher-dimensional geometric structure.
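
      The sliding-window metric described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `lambda3_ratio`, the stride of one bin, and the helix test trajectory are assumptions introduced here, and the input is assumed to be a trajectory already projected onto PC1 to PC3.

```python
import numpy as np

def lambda3_ratio(traj, window=10):
    """Sliding-window lambda_3 / sum(lambda) for a 3-D trajectory.

    traj   : array of shape (n_bins, 3), coordinates on PC1..PC3 (assumed input).
    window : number of consecutive time bins per window (10 in the text).
    A stride of one bin is an assumption; the source does not state it.
    """
    traj = np.asarray(traj, dtype=float)
    n_bins = traj.shape[0]
    ratios = []
    for start in range(n_bins - window + 1):
        segment = traj[start:start + window]
        cov = np.cov(segment, rowvar=False)        # 3 x 3 covariance of the window
        eig = np.linalg.eigvalsh(cov)              # ascending eigenvalues
        eig = np.clip(eig, 0.0, None)              # guard against tiny negative values
        total = eig.sum()
        # eig[0] is the smallest eigenvalue, i.e. lambda_3
        ratios.append(eig[0] / total if total > 0 else 0.0)
    return np.array(ratios)

# Hypothetical trajectories over 20 time bins: a planar circle and a helix.
t = np.linspace(0.0, 2.0 * np.pi, 20)
planar = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
helix = np.column_stack([np.cos(t), np.sin(t), t])
planar_ratio = lambda3_ratio(planar)   # stays at zero: no third dimension
helix_ratio = lambda3_ratio(helix)     # positive: trajectory leaves the plane
```

      With 20 bins and a 10-bin window at stride one, the function returns 11 values per trajectory. A trajectory confined to a plane yields a ratio of zero, so larger values indicate excursion into a third dimension.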

      For RNN data, since all unit activities were simultaneously observed and sufficient trials were available, we directly applied PCA to population activity without GPFA. Mean trajectories across trials were computed, and the same λ<sub>3</sub>/Σλ metric was applied. Although the initial dimensionality-reduction steps differ, the final metric definition and computation are identical. Thus, the comparison focuses on geometric dimensional structure rather than the dimensionality-reduction method itself.

      Importantly, within the biological dataset, GPFA estimation, preprocessing, pseudopopulation construction, subsampling strategy, temporal alignment, and smoothing were applied identically across the CS and Neutral conditions. Under this common analysis framework, λ<sub>3</sub>/Σλ values were consistently higher in the CS condition than in the Neutral condition.

      For the RNN data, an identical analysis pipeline was applied across the CS+Open and Open-only conditions. In this case as well, λ<sub>3</sub>/Σλ values were significantly higher in the CS+Open condition than in the Open-only condition.

      If structural bias arose from covariance estimation or dimensionality reduction, it would be expected to affect conditions similarly within each dataset. The observation that λ<sub>3</sub>/Σλ increases selectively in the CS condition in biological data and in the CS+Open condition in the RNN therefore supports the interpretation that the effect reflects a condition-specific dynamical difference rather than an artifact of dimensionality reduction.

      To further examine whether the effect was driven by a small subset of neurons, we performed leave-one-neuron-out analyses in the biological dataset. In the CS group, most neurons contributed relatively evenly to the metric, whereas such distributed contribution was not observed in the Neutral group. This suggests that the three-dimensional structure reflects an organized population-level phenomenon rather than covariance dominated by a small number of outlier neurons.
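
      A leave-one-neuron-out check of this kind might look like the sketch below. It is illustrative only: plain PCA stands in for the GPFA step used on the biological data, and the function names, the random test matrix, and the choice to average the metric across windows are all assumptions introduced here.

```python
import numpy as np

def top3_pca(activity):
    """Project (n_bins, n_neurons) activity onto its first three PCs.

    PCA is a stand-in here; the source uses GPFA for the biological data.
    """
    centered = activity - activity.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T

def mean_lambda3_ratio(traj, window=10):
    """Average lambda_3 / sum(lambda) over sliding windows (stride one, assumed)."""
    ratios = []
    for start in range(traj.shape[0] - window + 1):
        cov = np.cov(traj[start:start + window], rowvar=False)
        eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
        total = eig.sum()
        ratios.append(eig.min() / total if total > 0 else 0.0)
    return float(np.mean(ratios))

def leave_one_neuron_out(activity, window=10):
    """Return the full-population metric and its change when each neuron is dropped."""
    full = mean_lambda3_ratio(top3_pca(activity), window)
    deltas = np.empty(activity.shape[1])
    for i in range(activity.shape[1]):
        reduced = np.delete(activity, i, axis=1)   # remove neuron i
        deltas[i] = full - mean_lambda3_ratio(top3_pca(reduced), window)
    return full, deltas

# Hypothetical pseudopopulation: 20 time bins x 8 neurons of random activity.
rng = np.random.default_rng(0)
activity = rng.normal(size=(20, 8))
full_metric, deltas = leave_one_neuron_out(activity)
```

      Under these assumptions, a population-level effect would show up as small, similar-sized entries in `deltas`, whereas an effect driven by a few outlier neurons would show a few large entries.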

      These results indicate that the consistent elevation of λ<sub>3</sub>/Σλ in the CS condition (biological data) and in the CS+Open condition (RNN) reflects a genuine dynamical feature rather than an artifact arising from pseudopopulation construction or dimensionality reduction.

      Taken together, the three-dimensional geometric structure observed in GPFA-based latent trajectories is unlikely to reflect random noise. The replication of the same quantitative metric in the RNN, using an independent dimensionality-reduction procedure, strengthens the correspondence between the two systems. We appreciate the reviewer’s suggestion for quantitative reinforcement, which has substantially strengthened the manuscript.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer and will clearly indicate that references to broader theoretical interpretations are speculative. We will substantially reduce their strength and emphasis.

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      We agree with the reviewer’s concern. We will describe the delayed escape task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals” and remove inference-related terminology throughout the manuscript.

      Reviewer #2 (Public review):

      We appreciate the reviewer’s constructive and well-balanced comments. We regret that some of our wording and the scope of our introduction and discussion may not have appropriately reflected the contributions of prior studies. We will revise the manuscript accordingly to ensure that previous literature is more accurately and fairly acknowledged. In addition, we will reorganize the figures to more clearly present the hypotheses being tested and will provide additional details regarding both the modeling framework and the experimental procedures.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We will clarify more explicitly which data and methods originate from Han et al. (2024). In the original manuscript, Figure 1 panels A, D, E, F, and L (left) were indicated in the legend as originating from Han et al. (2024). We will further clarify this distinction in the main text. Additionally, we will briefly describe the behavioral experiments and in vivo electrophysiology performed in Han et al. in the Methods section, with appropriate citation.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      As requested, we will provide additional details regarding model training procedures, weight matrices and their evolution during training, equations (2) and (3), the origin of constants used in the equations, and detailed methods for ChrimsonR injection (anesthesia, stereotaxic coordinates, injection parameters, and clarification of “sparse expression”).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We will reorganize the figures to emphasize core results and clarify that the primary goal is to test and validate the computational model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We will cite Orman (2015) as suggested and note that persistent activity has been observed in slices cut at specific angles, consistent with our findings.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We will remove wording implying “limited” prior work and appropriately acknowledge contributions from the Mathur and Citri groups.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      Across all whole-cell recordings, optogenetic responses were observed in 38 out of 43 patched cells (~90%), suggesting that a high proportion of claustral neurons receive intra-claustral excitatory input. However, precise connectivity frequency and strength cannot be determined from the current dataset.
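
      To give a sense of the uncertainty around a proportion such as 38 of 43 (about 0.88), one option is a Wilson score interval. The sketch below is illustrative and not an analysis from the manuscript; the function name and the 95% level (z = 1.96) are assumptions introduced here.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1.0 + z ** 2 / n
    center = (p + z ** 2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z ** 2 / (4.0 * n ** 2)) / denom
    return center - half_width, center + half_width

# Responsive cells reported in the text: 38 of 43 patched cells.
low, high = wilson_interval(38, 43)
print(f"proportion = {38 / 43:.3f}, interval = ({low:.3f}, {high:.3f})")
```

      The interval describes only how often a patched cell showed an optogenetic response; it does not estimate pairwise connection probability or strength, which the response notes cannot be determined from this dataset.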

      As the reviewer noted, our RNN is specialized for the delayed escape task, and we do not claim direct generalization to other proposed claustral functions such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism observed in this specific task.

      While our model is specific to the delayed escape task, the computational principle identified here—nonlinear trajectory-based temporal integration supported by recurrent excitatory connectivity—may represent a more general mechanism for integrating temporally separated signals. However, testing such generality lies beyond the scope of the present study and will be framed as a future direction in the revised Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.

      We thank Reviewer 1 for their kind and constructive comments. While we have thoroughly addressed all specific recommendations below, in brief, we have added new analysis of the variance inflation factor in Supplementary Tables 2 and 3 to reassure readers that the chosen parameter sets exhibit low levels of collinearity, and provided more explanation for why the relative positional parameters were chosen to avoid this issue. We have added explanatory figures for all positional and orientational parameters to improve understanding of the technical details, and improved clarity of existing figures as detailed below. We welcome the suggestion to add QT interval to the manuscript – whilst this was only available in the UK Biobank for a single lead, we have included an analysis of both QT and QTc intervals in this lead on Page 10, and added some discussion of this to the second full paragraph of Page 14.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comment 1: “Collinearity and Regression Analysis: It would be valuable to assess the collinearity among the regressed parameters (e.g., cardiac size, torso volume, heart center positions [x, y, z], and cardiac orientation angles) and evaluate whether alternative regression methods (e.g., ridge regression) might improve robustness. Additionally, cardiac digital twinning with electrophysiological models could help isolate the exact contribution of electrophysiology while enabling sensitivity analysis. Nonlinear regression or machine learning approaches might also enhance the predictive power of the analysis.”

      We thank the reviewer for drawing attention to the important issue of collinearity in the parameter sets used in the regression analysis. To address this, we have added Supplementary Tables 2 and 3, which detail the variance inflation factors for each of the parameter sets used. Collinearity was considered in the selection of anatomical parameters, e.g. by using relative positions rather than absolute distances between landmarks, which would be more collinear. As the variance inflation factors are all below a value of 3.4, we believe that the effect of collinearity is limited. We have therefore retained our linear regression analysis, which avoids the subjectivity of parameter selection in more complex methods and encourages interpretability. In addition, we have added an explanation to the second full paragraph on Page 6 of how we calculated the relative, rather than absolute, position of the cardiac centre, partially to avoid the problem of collinearity when using multiple absolute distances. We concur that modelling and simulation techniques are well suited to explore the electrophysiological component further; as this is out of the scope of this work, we have addressed the role of these methods in future work in the final paragraph of Page 16.
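
      As a reminder of what a variance inflation factor measures, the following is a minimal sketch using the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The function name and the synthetic data are assumptions for illustration; they are not the authors' parameters or code.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X.

    R_j^2 is obtained by regressing column j on all other columns
    plus an intercept, using ordinary least squares.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs


# Synthetic data (purely illustrative): c is nearly collinear with a,
# while b is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
```

      In this toy example the independent predictor has a VIF close to 1, whereas the two nearly collinear predictors have much larger values. That contrast is what the tabulated factors are meant to rule out.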

      Comment 2: “Figure Clarity (Bar Plots): The superimposed bar plots in Figures 2-4 are difficult to interpret; separating the bars for each coefficient would improve readability.”

      We accept that the stacked bar plots could be improved in their clarity. Whilst plotting each anatomical parameter separately multiplies the number of plots by a factor of nine, and makes comparison between parameters more difficult, we have added clear horizontal grid lines in order to make values easier to read and interpret.

      Comment 3: “Feature Extraction Visualization: A schematic figure illustrating the steps for measuring heart positional parameters (e.g., with example annotations) would help readers better understand the feature extraction methodology.”

      We agree with the reviewer that the calculation of positional and orientational parameters is crucial to illustrate clearly. We have included additional Supplementary Figures 2 and 3 to better convey these parameters.

      Reviewer #2 (Public review):

      Summary:

      Missed diagnosis of myocardial ischemia (MI) is more common in women, and treatment is typically less aggressive. This diagnosis stems from the fact that women's ECGs commonly exhibit 12 lead ECG biomarkers that are less likely to fall within the traditional diagnostic criteria. Namely, women have shorter QRS durations and lower ST junction and T wave amplitudes, but longer QT intervals, than men. To study the impact, this study aims to quantify sex differences in heart-torso anatomy and ECG biomarkers, as well as their relative associations, in both pre- and post-MI populations. A novel computational pipeline was constructed to generate torso-ventricular geometries from cardiac magnetic resonance imaging. The pipeline was used to build models for 425 post-myocardial infarction subjects and 1051 healthy controls from UK Biobank clinical images to generate the population.

      Strengths:

      This study has a strength in that it utilizes a large patient population from the UK Biobank (425 post-MI and 1051 healthy controls) to analyze sex-based differences. The computational pipeline is state-of-the-art for constructing torso-ventricular geometries from cardiac MR and is clinically viable. It draws on novel machine learning techniques for segmentation, contour extraction, and shape modeling. This pipeline is publicly available and can help in the large-scale generation of anatomies for other studies. This allows computation of various anatomical factors (torso volume, cavity volume, etc), and subsequent regression analysis on how these factors are altered before and after MI from the 12-lead ECG.

      Weaknesses:

      Major weaknesses stem from the fact that, while electrophysiological factors appear to play a role across many leads, both post-MI and healthy, the electrophysiological factors are not stated or discussed. The computational modeling pipeline is validated for reconstructing torso contours; however, potential registration errors stemming from ventricular-torso construction are not addressed within the context of anatomical factors, such as the tilt and rotation of the heart. This should be discussed as the paper's claims are based on these results. Further analysis and explanation are needed to understand how these sex-specific results impact the ECG-based diagnosis of MI in men and women, as stated as the primary reason for the study at the beginning of the paper. This would provide a broader impact within the clinical community. Claims about demographics do not appear to be supported within the main manuscript but are provided in the supplements. Reformatting the paper's structure is required to efficiently and effectively present and support the findings and outcomes of this work.

      We thank Reviewer 2 for their considered and detailed feedback. We greatly appreciate the invitation to elaborate on the electrophysiological factors, and we have added discussion of this matter to the second and third full paragraphs on Page 14 (extending to Page 15) and the first full paragraph on Page 15, and highlighted the role of modelling and simulation in future work in the third full paragraph of Page 16. We agree that registration errors are one reason behind remaining reconstruction errors. We feel that a strength of our study is that the large number of subjects used helped to reduce the effect of this noise, and we have updated the second full paragraph of Page 16 to reflect this. We are wary of moving too many supplemental figures and tables describing demographic trends to the main manuscript for fear of diluting the specific answers to our research questions. We have, however, actioned the suggestions as detailed below to reformat the paper, including redressing the balance of supplemental versus main methodological sections, and thank the reviewer for their guidance in increasing our clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) Please detail what "chosen to be representative of the underlying dataset" means in terms of a validation dataset.

      We thank the reviewer for addressing the lack of clarity in this matter. We have added a reference in the third full paragraph on Page 6 to Supplementary Appendix 1.1, where we have included full details of the selection criteria.

      (2) “Current guidelines ... further research [16]." The paragraph should begin with a broader statement that is relevant to the fact that the entire body of work focuses on ECG-based diagnosis differences in women, rather than LVEF through echocardiography.

      We have revised the introduction to Paragraph 3 on Page 3 to clarify our motivation for focusing on the ECG in order to shape proposals for novel ECG-based risk stratification tools.

      (3) The last paragraph of the introduction should more clearly state what was performed and how you aim to prove your hypothesis. There is no mention of the data, the regression model, or other key aspects important to the reader.

      We have added methodological details to Paragraph 5 on Page 3 in order to clarify our approach in testing our hypothesis.

      (4) An overview paragraph should be included in the Methods at the beginning.

      We thank the reviewer for this valuable suggestion – we have added an overview paragraph to the start of the methodology section on Page 5.

      (5) The computational pipeline portion of the methods should be written in full paragraphs instead of almost a bulleted list. In general, more details from the supplement should be provided in the methods.

      We thank the reviewer for raising important points concerning the balance of methodological description in the main manuscript and the supplementary materials. We have added detailed description of the reconstruction pipeline to Pages 5 and 6. We feel that the ordered format of the methods section adds to the reproducibility and transparency of our methodology.

      (6) The torso reconstruction method was already validated in Smith et al. [29]. What value does your additional validation bring to this methodology? Furthermore, how does the construction of the ventricular-torso reconstructions using the cardiac axes (not just the torso contours) influence ECG metrics?

      We apologise that this was not clear – we have clarified in Paragraph 4 on Page 5 that while Smith et al. 2022 provided a detailed validation to the contour extraction networks, it did not validate the torso reconstruction pipeline, as it only presents the reconstruction of two cases as a proof of concept. We have also expanded the second full paragraph on Page 6 to explain that the sparse (but not dense) cardiac anatomies were constructed in order to calculate the cardiac size, which we found was a key factor moderating many ECG biomarkers. We also specified that the cardiac position and orientation were necessary in order to relate these to the torso axes and positions of the ECG electrodes.

      (7) Include the details of the regression analysis in the main body of the methods for the readers. This is crucial to the claims and outcomes of the paper. Only a sentence is included in the results and one in the figure: "Each factor's contribution is calculated from the product of the regression coefficients and anatomical sex differences (Supplementary Appendix 1.5)." What specific contributions can I expect to see in the results figures? The results are filled with methodological aspects that should be in the methods.

      We thank the reviewer again for this important comment regarding the balance of the main text methodology and supplementary methodology sections. We have added detail to the statistical analysis section of the main text on Pages 7 and 8 in order for the reader to understand the following results section without consulting the supplemental methods. We have also removed these details from the results section.

      (8) What is "the remaining estimated effect of electrophysiology"? Did you do simulations on the electrophysiology, or how is this computed from the clinical data of patients? More explanation is needed, as without this, the paper is just focusing on anatomy.

      We have clarified this important point by moving the explanation of the methodology underpinning our estimation of the electrophysiological contributions using the clinical ECGs from the supplementary methods to the main manuscript, in the second full paragraph on Page 7, continuing to Page 8. We have also specified the role of simulation studies in future work in the final paragraph on Page 16.
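
      The decomposition discussed in points (7) and (8) can be sketched as follows, taking the quoted statement at face value: each anatomical factor's contribution is the product of its regression coefficient and the sex difference in that factor, and whatever part of the observed ECG difference is not explained by anatomy is the remaining estimated effect. The function name, the sign convention (female minus male) and all numbers below are hypothetical. The authors' actual procedure is described in their Supplementary Appendix 1.5 and is not reproduced here.

```python
import numpy as np


def decompose_sex_difference(coefficients, mean_female, mean_male,
                             total_difference):
    """Split an ECG biomarker sex difference into anatomical terms.

    contribution_i = coefficient_i * (mean_female_i - mean_male_i)
    remainder      = total_difference - sum(contributions)

    Assumption: a linear model and a female-minus-male convention.
    """
    coefficients = np.asarray(coefficients, dtype=float)
    anatomical_difference = (np.asarray(mean_female, dtype=float)
                             - np.asarray(mean_male, dtype=float))
    contributions = coefficients * anatomical_difference
    remainder = total_difference - contributions.sum()
    return contributions, remainder


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
contributions, remainder = decompose_sex_difference(
    coefficients=[0.05, 2.0],    # biomarker units per unit of each factor
    mean_female=[80.0, 1.0],     # made-up factor means
    mean_male=[100.0, 2.0],
    total_difference=-6.0,       # made-up observed female-minus-male gap
)
```

      With these made-up inputs the two anatomical terms are -1.0 and -2.0, leaving a remainder of -3.0 that anatomy does not account for.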

      (9) Include an overview paragraph of the methods to create more structure.

      We thank the reviewer again for the further attention to this issue – as previously, we have added an overview paragraph to the methodology section on Page 5.

      (10) Only 19.8% of the patients were female, which is probably due to females having a more severe presentation of the disease. How does this impact, bias, or skew your results?

      This comment raises a very interesting point. The origin of this imbalance is of course multifactorial: women likely do have lower rates of MI events due to the cardioprotective role of estrogen and different health-promoting behaviours, and our sex imbalance was reflective of wider trends in MI diagnosis. However, as mentioned in Paragraph 2 on Page 3 of the text, there are more missed MI diagnoses in women, and we agree that this may lead to a more severe presentation of female MI pathophysiology. We have expanded the first full paragraph on Page 16 to specify the ECG and demographic impacts that this has on our results, and to note that a strength of this work is that it may contribute to future adjustment of the diagnostic criteria, such that future investigations do not have this bias and clinical outcomes are improved.

      (11) A lot of extra information is provided in Tables 1 and 2. Include additional information in the supplements that is not directly relevant to your findings.

      We agree that Table 2 is supplementary, rather than critical information, and have moved it accordingly to the Supplementary Materials on Page 38. We do believe that Table 1 is central for understanding the extracted dataset.

      (12) Combine paragraphs 3 and 4 into a single paragraph. "Current guidelines..." and "T wave amplitude...". They are part of a single coherent concept.

      We have removed the paragraph break on Page 3 Paragraph 3.

      (13) Check all acronyms throughout the paper. The abbreviation for sudden cardiac death (SCD) is only used once in the same paragraph. Remove the acronym and type it out. T-wave amplitude (TWA) is introduced twice in a Figure caption and not introduced until the methods.

      Many thanks for this suggestion – we have reviewed all acronyms in the manuscript.

      (14) "Figure 1B showcases the capability of the computational pipeline to extract torso contours and reconstruct them into 3D meshes". Isn't this Figure 1A?

      We apologise that this was unclear, and have updated the sentence on the first full paragraph of Page 8 to clarify the purpose of Figure 1B.

      (15) No need to state: "Female y-axis limits have been adjusted by the difference in healthy QRS duration between sexes for ease of comparison" in the Figure 2 caption.

      We have removed this statement on all relevant captions.

      (16) The paragraph "For lead V6, 15.9% of healthy subjects..." can be combined with the previous section.

      We have removed this paragraph break on Page 9 to improve readability.

      (17) The only demographics I could find were age and BMI. State which demographics you used explicitly. This is especially true when the discussion makes claims like "Our findings suggest that corrected QRS duration taking into consideration demographics...". How did you take them into account?

      We accept that our previous description of the demographic adjustment to QRS duration in the discussion did not adequately reflect the comprehensiveness of our approach, and have adjusted the second paragraph on Page 14 to rectify this.

      (18) The results section is also almost a bulleted list that should be written and reformatted into paragraphs.

      The ordered style of our results section was designed to compare how our obtained data answers our research question differently for ECG intervals, amplitudes, and axis angles. Whilst we have adjusted paragraph breaks and moved methodological details to more appropriate sections, we have retained this stylistic choice.

      (19) The following sentence should be in the introduction: "Alterations to the polarity and amplitude of the T wave are used in the diagnosis of acute MI [42] and TWA affects proposed risk stratification tools, particularly markers of repolarization abnormalities [9, 43]."

      We thank the reviewer for this suggestion. We have included the discussion of how TWA is separately used in proposed risk stratification and current diagnostic tools in Paragraph 3 of Page 3.

    2. Reviewer #1 (Public review):

      Summary:

      The electrocardiogram (ECG) is routinely used to diagnose and assess cardiovascular risk. However, its interpretation can be complicated by sex-based and anatomical variations in heart and torso structure. To quantify these relationships, Dr. Smith and colleagues developed computational tools to automatically reconstruct 3D heart and torso anatomies from UK Biobank data. Their regression analysis identified key sex differences in anatomical parameters and their associations with ECG features, particularly post-myocardial infarction (MI). This work provides valuable quantitative insights into how sex and anatomy influence ECG metrics, potentially improving future ECG interpretation protocols by accounting for these factors.

      Strengths:

      (1) The study introduces an automated pipeline to reconstruct heart and torso anatomies from a large cohort (1,476 subjects, including healthy and post-MI individuals).

      (2) The 3-stage reconstruction achieved high accuracy (validated via Dice coefficient and error distances).

      (3) Extracted anatomical features enabled novel analyses of disease-dependent relationships between sex, anatomy, and ECG metrics.

      (4) Open-source code for the pipeline and analyses enhances reproducibility.
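
      The Dice coefficient mentioned in strength (2) measures the overlap between a predicted and a reference segmentation as 2|A ∩ B| / (|A| + |B|). A minimal sketch follows. The function name, the toy masks and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the authors' pipeline.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b) -> float:
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Toy masks: 3 foreground pixels each, 2 of them shared.
predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
dice = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
```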

      Weaknesses:

      (1) The linear regression approach, while useful, may not fully address collinearity among parameters (e.g., cardiac size, torso volume, heart position). Although left ventricular mass or cavity volume was selected to mitigate collinearity, other parameters (e.g., heart center coordinates) could still introduce bias.

      (2) The study attributes residual ECG differences to sex/MI status after controlling for anatomical variables. However, regression model errors could distort these estimates. A rigorous evaluation of potential deviations (e.g., variance inflation factors or alternative methods like ridge regression) would strengthen the conclusions.

      (3) The manuscript's highly quantitative presentation may hinder readability. Simplifying technical descriptions and improving figure clarity (e.g., separating superimposed bar plots in Figures 2-4) would aid comprehension.

      (4) Given established sex differences in QTc intervals, applying the same analytical framework to explore QTc's dependence on sex and anatomy could have provided additional clinically relevant insights.

    1. Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-2<sup>+</sup>; Sequence #2: 3<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-4<sup>+</sup>) --forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4) --indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      - Elegant behavioral design that affords the detection of hidden-state representations.
      - Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      - The number of subjects is small, so idiosyncratic, animal-specific effects can't be fully ruled out.

      Comments on revisions:

      The authors have thoroughly addressed all of my previous comments. Congratulations on an excellent paper!

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors trained rats on a "figure 8" go/no-go odor discrimination task. Six odor cues (3 rewarded and 3 non-rewarded) were presented in a fixed temporal order and arranged into two alternating sequences that partially overlap (Sequence #1: 5<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-2<sup>+</sup>; Sequence #2: 3<sup>+</sup>-0<sup>-</sup>-1<sup>-</sup>-4<sup>+</sup>) - forming an abstract figure-8 structure of looping odor cues.

      This task is particularly well-suited for probing representations of hidden states, defined here as the animal's position within the task structure beyond superficial sensory features. Although the task can be solved without explicit sequence tracking, it affords the opportunity to generalize across functionally equivalent trials (or "positions") in different sequences, allowing the authors to examine how OFC representations collapse across latent task structure.

      Rats were first trained to criterion on the task and then underwent 15 days of self-administration of either intravenous cocaine (3 h/day) or sucrose. Following self-administration, electrodes were implanted in lateral OFC, and single-unit activity was recorded while rats performed the figure-8 task.

      Across a series of complementary analyses, the authors report several notable findings. In control animals, lOFC neurons exhibit representational compression across corresponding positions in the two sequences. This compression is observed not only in trial/positions involving overlapping odor (e.g., Position 3 = odor 1 in sequence 1 vs sequence 2), but also in trials/positions involving distinct, sequence-specific odors (e.g., Position 4: odor 2 vs odor 4) - indicating generalization across functionally equivalent task states. Ensemble decoding confirms that sequence identity is weakly decodable at these positions, consistent with the idea that OFC representations collapse incidental differences in sensory information into a common latent or hidden state representation. In contrast, cocaine-experienced rats show persistently stronger differentiation between sequences, including at overlapping odor positions.

      Strengths:

      Elegant behavioral design that affords the detection of hidden-state representations.

      Sophisticated and complementary analytical approaches (single-unit activity, population decoding, and tensor component analysis).

      Weaknesses:

      The number of subjects is small - can't fully rule out idiosyncratic, animal-specific effects.

      Comments

      (1) Emergence of sequence-dependent OFC representations across learning.

      A conceptual point that would benefit from further discussion concerns the emergence of sequence-dependent OFC activity at overlapping positions (e.g., position P3, odor 1). This implies knowledge of the broader task structure. Such representations are presumably absent early in learning, before rats have learned the sequence structure. While recordings were conducted only after rats were well trained, it would be informative if the authors could comment on how they envision these representations developing over learning. For example, does sequence differentiation initially emerge as animals learn the overall task structure, followed by progressive compression once animals learn that certain states are functionally equivalent? Clarifying this learning-stage interpretation would strengthen the theoretical framing of the results.

      We agree that the emergence of sequence-dependent OFC activity at overlapping positions (e.g., P3) implies knowledge of the broader task structure and therefore must depend on learning. Although we did not record during early acquisition in the current study, we can outline a learning-stage framework consistent with both prior work and the comparative analyses included here, and we have included it in the discussion.

      We think the development of OFC representations is a multi-stage process. Early in learning, before animals have acquired the sequential structure of the task, OFC activity is likely dominated by local sensory features and immediate reinforcement history, with little differentiation between sequences at overlapping positions. As animals learn that odors are embedded within extended sequences that have utility for predicting future outcomes, OFC representations would begin to differentiate identical sensory cues based on their sequence context, giving rise to sequence-dependent activity at positions such as P3. This stage reflects acquisition of the broader task structure and the recognition that current cues carry information about future states.

      With continued training, however, OFC representations normally undergo a further refinement: positions that differ in sensory identity but are functionally equivalent become compressed, while distinctions that are irrelevant for guiding behavior are suppressed. Evidence for this later stage comes from our over-trained control animals, in which discrimination between overlapping positions is near chance across most trial epochs, and from prior work using the same task in less-trained animals, where sequence-dependent discrimination is more strongly preserved. Thus, sequence differentiation appears to emerge during structure learning but is subsequently down-weighted as animals learn which distinctions are behaviorally irrelevant.

      Within this framework, prior cocaine exposure appears to interfere specifically with this later refinement stage. Cocaine-experienced rats exhibit OFC representations resembling those seen earlier in learning—retaining sequence-dependent discrimination at overlapping and functionally equivalent positions—despite extensive training. This suggests not a failure to acquire task structure per se, but rather an impairment in the ability to collapse across states that share common underlying causes.

      (2) Reference to the 24-odor position task

      The reference to the previously published 24-odor position task is not well integrated into the current manuscript. Given that this task has already been published and is not central to the main analyses presented here, the authors may wish to a) better motivate its relevance to the current study or b) consider removing this supplemental figure entirely to maintain focus.

      Thanks for your suggestion. We have removed this supplemental figure.

      (3) Missing behavioral comparison

      Line 117: the authors state that absolute differences between sequences differ between cocaine and sucrose groups across all three behavioral measures. However, Figure 1 includes only two corresponding comparisons (Fig. 1I-J). Please add the third measure (% correct) to Figure 1, and arrange these panels in an order consistent with Figure 1F-H (% correct, reaction time, poke latency).

      Thanks for your suggestion. We have added the corresponding panel to Figure 1.

      (4) Description of the TCA component

      Line 220: authors wrote that the first TCA component exhibits low amplitude at positions P1 and P4 and high amplitude at positions P2 and P3. However, Figure 3 appears to show the opposite pattern (higher magnitude at P1 and P4 and lower magnitude at P2 and P3). Please check and clarify this apparent discrepancy. Alternatively, a clearer explanation of how to interpret the temporal dynamics and scaling of this component in the figure would help readers correctly understand the result.

      Thanks for your suggestion. We appreciate this point and agree that clearer guidance on how to interpret the temporal and scaling properties of the tensor components would help readers. In the TCA framework, each component is defined by three separable factors: a neuron factor, a temporal factor, and a trial (position) factor. The temporal factor reflects the shape of the activity pattern within a trial, indicating when during the trial that component is expressed, whereas the trial factor reflects how strongly that temporal pattern is expressed at each position and across trials.

      Importantly, the absolute scaling of these factors is not independently meaningful. Because TCA components are scale-indeterminate, the magnitude of the temporal factor and the trial factor should be interpreted relative to one another within a component, not across components. Thus, a large value in the trial factor does not imply stronger neural activity per se, but rather greater expression of that component’s characteristic temporal pattern at that position or trial.

      Accordingly, when a component shows similar temporal dynamics across groups but differs in its trial factor structure—as observed here—the interpretation is that the same within-trial dynamics are being differentially recruited across task positions, rather than that the timing of neural responses has changed.

      We have added a brief discussion of this point to the corresponding section of the Results.
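
      The scale indeterminacy described above can be illustrated with a single rank-one component built as the outer product of a neuron factor, a temporal factor and a trial factor. Multiplying one factor by a constant and dividing another by the same constant leaves the reconstructed tensor unchanged. The factor sizes and random values are arbitrary assumptions. This sketch does not use the authors' data or any particular TCA implementation.

```python
import numpy as np


def rank_one_component(neuron, temporal, trial):
    """Outer product of three factors: shape (neurons, time, trials)."""
    return np.einsum("i,j,k->ijk", neuron, temporal, trial)


rng = np.random.default_rng(0)
neuron = rng.random(5)      # loading of each neuron on the component
temporal = rng.random(20)   # within-trial time course
trial = rng.random(8)       # expression at each trial/position

original = rank_one_component(neuron, temporal, trial)

# Rescale two factors in opposite directions: the absolute magnitudes
# of the factors change, but the component they describe does not.
scale = 10.0
rescaled = rank_one_component(neuron, temporal * scale, trial / scale)
same_reconstruction = np.allclose(original, rescaled)
```

      Because the two parameterisations give the same tensor, only the relative profile of a factor across positions or time bins carries meaning, which matches the interpretation given in the response.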

      (5) Sucrose control

      Sucrose self-administration is a reasonable control for instrumental experience and reward exposure, but it means that this group also acquired an additional task involving the same reinforcer. This experience may itself influence OFC representations and could contribute to the generalization observed in control animals. A brief discussion of this possibility would help contextualize the interpretation of cocaine-related effects.

      We agree that sucrose self-administration is not a perfect neutral manipulation and that this experience could, in principle, influence OFC representations. In particular, sucrose self-administration involves instrumental responding for the same primary reinforcer used in the odor task, and thus may promote additional learning about reward predictability, action–outcome contingencies, or contextual structure that could facilitate generalization.

      Several considerations, however, suggest that the generalization observed in control animals primarily reflects learning-dependent refinement of task representations rather than a specific consequence of sucrose self-administration per se. First, the amount of sucrose administered during this phase was minimal (50 µl × 60 presses at most per session for 14 sessions) compared with the total sucrose reward obtained during task recording (100 µl × 160 trials per session for several dozen sessions). Second, all rats were extensively trained on the odor sequence task prior to any self-administration, and the key signatures of compression and generalization we report—near-chance discrimination between functionally equivalent positions—are consistent with prior studies using the same task in animals that did not undergo sucrose self-administration. Finally, comparisons to less-trained animals in earlier work show that OFC representations evolve toward greater abstraction with increasing task experience, indicating that generalization is a property of advanced learning rather than a unique outcome of sucrose exposure.
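
      The volumes quoted in the first consideration can be worked through directly. The sketch below uses only the figures stated above. The number of task sessions is given only as "several dozen", so it is left as a function argument rather than guessed.

```python
UL_PER_ML = 1000.0

# Self-administration phase: at most 50 µl x 60 presses per session.
self_admin_max_per_session_ml = 50 * 60 / UL_PER_ML        # 3.0 ml
self_admin_max_total_ml = self_admin_max_per_session_ml * 14  # 42.0 ml

# Task recording: 100 µl x 160 trials per session, as stated above.
task_per_session_ml = 100 * 160 / UL_PER_ML                # 16.0 ml

# Per-session ratio of task reward to the self-administration maximum.
per_session_ratio = task_per_session_ml / self_admin_max_per_session_ml


def task_total_ml(n_sessions: int) -> float:
    """Total task sucrose for a given number of recording sessions.

    The source says only "several dozen" sessions, so the caller
    must supply the count; no value is assumed here.
    """
    return task_per_session_ml * n_sessions
```

      On these figures a single task session delivers more than five times the per-session maximum of the self-administration phase, and three task sessions (48 ml) already exceed the 42 ml upper bound for the entire self-administration phase.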

      Importantly, even if sucrose self-administration were to enhance generalization in OFC, this would not account for the primary finding that cocaine-experienced rats fail to show these signatures despite identical task training and parallel instrumental experience. Thus, the critical comparison is not between sucrose-trained animals and naive controls, but between two groups matched for self-administration experience, differing only in the pharmacological consequences of the reinforcer. Within this framework, the absence of position-general representations in cocaine-experienced rats reflects a disruption of normal learning-dependent abstraction rather than an artifact of the control condition.

      We have added a brief discussion acknowledging that sucrose self-administration may bias OFC toward abstraction, while emphasizing that cocaine exposure prevents the emergence or maintenance of these representations under otherwise comparable experiential conditions.

      (6) Acknowledge low N

      The number of rats per group is relatively low. Although the effects appear consistent across animals within each group, this sample size does not fully rule out idiosyncratic, animal-specific effects. This limitation should be explicitly acknowledged in the manuscript.

      We acknowledge that the number of animals per group is relatively small and therefore cannot fully rule out animal-specific effects. However, the key neural and behavioral signatures reported here were consistent across individual animals within each group and across multiple levels of analysis, and no outliers were observed. In addition, sample sizes of this scale are common in cocaine self-administration studies due to their technical and logistical constraints. We did not attempt to obscure this limitation and have now explicitly acknowledged it in the manuscript discussion.

      (7) Figure 3E-F: The task positions here are ordered differently (P1, P4, P2, P3) than elsewhere in the paper. Please reorder them to match the rest of the paper.

      Thank you for pointing this out. We agree that the ordering of task positions in Figures 3E–F should be consistent with the rest of the manuscript. We have reordered the positions to match the standard sequence order used elsewhere in the paper (P1, P2, P3, P4) to improve clarity and avoid confusion.

      Reviewer #2 (Public review):

      In the current study, the authors use an odor-guided sequence learning task described as a "figure 8" task to probe neuronal differences in latent state encoding within the orbitofrontal cortex after cocaine (n = 3) vs sucrose (n = 3) self-administration. The task uses six unique odors which are divided into two sequences that run in series. For both sequences, the 2nd and 3rd odors are the same and predict that reward is not available at the reward port. The 1st and 4th odors are unique, and are followed by reward. Animals are well-trained before undergoing electrode implant and catheterization, and then retrained for two weeks prior to recording. The hypothesis under test is that cocaine-experienced animals will be less able to use the latent task structure to perform the task, and instead encode information about each unique sequence that is largely irrelevant. Behaviorally, both cocaine and sucrose-experienced rats show high levels of accuracy on task, with some group differences noted. When comparing reaction times and poke latencies between sequences, more variability was observed in the cocaine-treated group, implying animals treated these sequences somewhat differently. Analyses done at the single unit and ensemble level suggest that cocaine self-administration had increased the encoding of sequence-specific information, but decreased generalization across sequences. For example, the ability to decode odor position and sequence from neuronal firing in cocaine-treated animals was greater than controls. This pattern resembles that observed within the OFC of animals that had fewer training sessions. The authors then conducted tensor component analysis (TCA) to enable a more "hypothesis agnostic" evaluation of their data.

      Overall, the paper is well written and the authors do a good job of explaining quite complicated analyses so that the reader can follow their reasoning. I have the following comments.

      While well-written, the introduction mainly summarises the experimental design and results, rather than providing a summary of relevant literature that informed the experimental design. More details regarding the published effects of cocaine self-administration on OFC firing, and on tests of behavioral flexibility across species, would ground the paper more thoroughly in the literature and explain the need for the current experiment.

      We appreciate this suggestion and have tried to expand the Introduction to more explicitly situate the study within the existing literature on cocaine-induced changes in OFC function. In particular, prior work has shown that cocaine self-administration alters OFC firing properties and disrupts behavioral flexibility across species, including impairments in reversal learning, outcome devaluation, and sensory preconditioning. We have revised the Introduction to expand this literature review and more clearly articulate how these established findings motivated our focus on OFC representations of hidden task structure and generalization.

      For Fig 1F, it is hard to see the magnitude of the group difference with the graph showing 0-100%- can the y axis be adjusted to make this difference more obvious? It looks like the cocaine-treated animals were more accurate at P3- is that right?

      The concluding section is quite brief. The authors suggest that the failure to generalize across sequences observed in the current study could explain why people who are addicted to cocaine do not use information learned e.g. in classrooms or treatment programs to curtail their drug use. They do not acknowledge the limitations of their study e.g. use of male rats exclusively, or discuss alternative explanations of their data.

      We agree that the current 0–100% scale can make small differences difficult to discern. We will adjust the y-axis to a narrower range to better highlight group differences, and we will note this change in the figure caption. At P3, cocaine-experienced rats were indeed more accurate than controls.

      We appreciate the suggestion to expand the discussion. We have revised the concluding section to acknowledge key limitations, including the use of only male rats, the number of subjects, and to note that alternative explanations—such as differences in motivational state or attention—could also contribute to the observed effects. These revisions provide a more balanced interpretation while retaining the focus on OFC-mediated generalization as a potential mechanism for persistent, context-specific drug-seeking.

      Is it a problem that neuronal encoding of the "positions" i.e. the specific odors was at or near chance throughout in controls? Could they be using a simpler strategy based on the fact that two successive trials are rewarded, then two successive trials are not rewarded, such that the odors are irrelevant?

      We thank the reviewer for this point. While neuronal encoding of individual positions (specific odors) in control animals was comparatively lower, this does not indicate that the rats were using a simpler strategy based solely on reward patterns. First, rats were extensively trained on the odor sequence task prior to recordings, demonstrating accurate discrimination across all positions, and their trial-by-trial behavior reflects sensitivity to specific odors rather than only reward alternation. Second, the task design—with overlapping sequences and positions that differ in reward contingency across sequences—requires tracking odor-specific context to maximize reward; a purely “two rewarded, two non-rewarded” strategy would fail at overlapping positions and would not account for the compression of functionally equivalent positions observed in the OFC. Third, in the less-trained rats shown in Figure 3C, decoding accuracy was higher than in the sucrose group, indicating that these animals still differentiated negative positions. With additional training, decoding patterns suggested improved generalization across positions. Thus, the near-chance neural selectivity in controls reflects representation of latent task states rather than external sensory cues, consistent with the idea that OFC abstracts task-relevant structure and ignores irrelevant sensory differences.

      When looking at the RT and poke latency graphs, it seems the cocaine-experienced rats were faster to respond to rewarded odors, and also faster to poke after P3. Does this mean they were more motivated by the reward?

      At present, the basis of these response-time differences remains unclear, in part because motivation is difficult to define operationally. If motivation is indexed solely by reaction time or poke latency, then the data are consistent with increased response vigor in cocaine-experienced rats. Indeed, RT and poke-latency measures indicate that cocaine-experienced rats responded more quickly on some rewarded trials, including after P3. However, overall task performance was high in both groups, suggesting that these differences cannot be attributed simply to superior learning or engagement. Faster responses may also reflect differences in deliberation or strategy, with cocaine-experienced rats relying more on rapid, stimulus-driven responding and sucrose-trained rats engaging in more careful evaluation. In addition, altered reward sensitivity or persistent effects of cocaine exposure may contribute to these behavioral differences. Thus, the faster responses observed in cocaine-experienced rats likely reflect a combination of heightened reward responsivity and altered encoding of task structure, rather than a straightforward increase in motivation alone.

      Recommendations for the authors:

      The reviewers were very positive about the manuscript and emphasized the rigor and state of the art analyses. Two points that came up were the very small n (6 total and 3 per condition) and the exclusive use of males. Adding more subjects is not recommended. However, more discussion and acknowledgement of this issue is recommended. The main concern is that idiosyncratic differences between individuals (not differences in cocaine history) are responsible for the differences observed in OFC encoding.

      We acknowledge that the sample size (n = 3 per group) and use of only male rats limit generalizability and do not fully rule out idiosyncratic, individual-specific effects. However, the key neural and behavioral signatures we report were consistent across all animals within each group and across multiple analyses (single-unit, ensemble decoding, and TCA). We now explicitly note these limitations in the Discussion, emphasizing that while individual variability cannot be fully excluded, the convergence of results across multiple levels of analysis supports the interpretation that the observed differences reflect effects of prior cocaine exposure rather than idiosyncratic differences.

      Reviewer #2 (Recommendations for the authors):

      In the legend to figure 2, the authors state "Notably, rats could discriminate between the two sequences (S1 vs. S2) based solely on current sensory information at two task epochs ["Odor" at P3 and P4; black bars]. At all other task epochs, indicated by gray bars, the discrimination relied on an internal memory of events". I'm confused by this statement- how does the odor at P3 help to discriminate the sequences? Surely P1 and P4 are the times when the odor sampling indicates which sequence they are in?

      We thank the reviewer for pointing out this source of confusion. The statement in the original figure legend was imprecise, and we have removed the figure and revised the figure legends because the results in the left panel substantially overlapped with those shown in the right panel. In this task, odors at positions P1 and P4 are the only cues that directly signal sequence identity, whereas the odors presented at P2 and P3 are identical across sequences. Accordingly, discrimination observed during the “Odor” epoch at P3 does not reflect sensory differences but instead depends on the animal’s use of internal memory or sequence context to infer sequence identity.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1:

      Yet I think that important aspects of my critique of the first statement of the manuscript about the flaws of [SR] model remain unanswered.

      I believe that I have fully addressed the points in the earlier review. The reviewer had doubted that my results were correct, attributing them to “a poor setup of the model” on my part. The reviewer stated that if I were correct about the factor of >10<sup>43</sup> change in cmax, this would “naturally break down all the estimates and conclusions made in Siljestam and Rueffler” (S&R).

      It appears that the reviewer is now convinced that my results represent a faithful analysis of the models on which S&R based their claims. The reviewer now contends that these results, including the factor of >10<sup>43</sup>, present no difficulties for the claims of S&R after all. In fact, this enormous factor of >10<sup>43</sup> is now claimed to support the conclusions of S&R by invalidating my conclusions. I respond to these new and very different arguments in what follows.

      As I stated in the first round of review, the issue is not the enormity of this factor per se, but the fact that the compensatory adjustment of cmax conceals the true effects of changes in other parameters. These effects are large; small changes to the parameter values mostly eliminate the diversity that the model is claimed to explain.

      The model in [SR] is not phenomenological as none of the parameters or functional forms were derived empirically. Instead, it is a proof of principle demonstration that inevitably grossly simplifies the actual immune response.

      The hidden sensitivity of the results of S&R to parameter values is sufficient to invalidate them as a proof of principle. The manuscript goes further and explains how the problem "is not specific to the details of the models of Siljestam and Rueffler, but is inherent in the phenomenon invoked to allow high diversity" because "any change that affects condition by as much as the difference between MHC heterozygotes and homozygotes will eliminate high equilibrium diversity". This general principle addresses all of the reviewer's points.

      In reality, a new pathogen cannot reduce the "survival" by such a factor as it would wipe out any resident population. So to compensate for such an artifact, the additional factor cmax was introduced to buffer such an excess. There is no reason to fix cmax once for an arbitrary number of pathogens, because varying cmax basically reflects the observation that a well-adapted individual must have a reasonable survival probability.

      This is not a legitimate reason for making compensatory, diversity-promoting adjustments to cmax when evaluating sensitivity to other parameters. If the number of pathogens or their virulence changes, cmax obviously does not automatically change along with it. If the population or species consequently goes extinct, then it goes extinct. If it persists, it does so with the same value of cmax.

      The possibility of extinction arguably puts a minimum value on cmax, but it does not restrict it to a range of values that conveniently leads to high MHC diversity. In the examples that I analyzed, slightly decreasing the number of pathogens or their virulence, which increases survivability, eliminates diversity. This phenomenon obviously cannot be dismissed on the grounds that survivability would be too low for the species to exist.

      S&R in effect assume that the condition of the most fit homozygote remains fixed, regardless of the number of pathogens, their virulence, and myriad other differences between species. It is this assumption that is without justification.

      At the same time, there are many ways in which the numerical simulation may break down when the survival rates become of the order of 10^(-43) instead of one

      I am not sure what is meant by “the numerical simulation may break down”. Numerical error is not a tenable explanation of the lack of diversity observed in that simulation. The outcome is exactly what is expected from purely theoretical considerations: conditions of all genotypes fall on the steep part of the curve, making the mechanism proposed by S&R largely inoperative, so a pair of alleles forming a fit heterozygote comes to predominate. The numerical simulation is actually superfluous.

      Low survival rates are completely irrelevant to the effect of decreasing the number of pathogens or their virulence, which does not lower survival rates, but does eliminate diversity.

      so it comes to no surprise that the diversification, predicted by the adaptive dynamics, does not readily occur in the scenario with an addition or removal of the 8th pathogen with a very high virulence \nu=20.

      Whether or not it is surprising, the lack of diversity is a problem for the claims of S&R, as there is no reason to expect the number of pathogens to have just the right value to produce high diversity. Furthermore, for many combinations of values of the other parameters (e.g., my v=19.5 and 20.5 examples), no number of pathogens leads to high diversity.

      Again, the general principle mentioned above makes the details that the reviewer refers to irrelevant. Nonetheless, some additional remarks are in order:

      (1) This comment ignores the fact that removal of a pathogen, or a slight decrease in “virulence”, eliminates diversity without lowering survival rates.

      (2) Small increases or decreases in v (virulence) eliminate diversity without having such large effects on condition.

      (3) In the example emphasized by the reviewer, mean survival rates are nowhere near as low as 10<sup>-43</sup>. Only homozygotes have such low fitness.

      (4) The adaptive dynamics predict the low diversity seen in the simulations, contrary to what the reviewer seems to suggest. Elimination of diversity is not an artifact of the simulation.

      (5) v=20 was chosen because it is most favorable to the model of S&R in that it yields the highest diversity. Indeed, S&R only observed realistically high diversity with the narrow gaussians that the reviewer objects to. With lower values of v, diversity is much lower, but even this meager diversity is eliminated by small changes in parameter values (see below). If narrow gaussians and large effects of pathogens somehow invalidate results, then they invalidate the high-diversity results of S&R.

      I have doubts that the reported breakdown of the [SR] model with fixed cmax remains observable with less extreme values of m and \nu (say, for \nu=7 and m=3 plus or minus 1 used in Fig. 3 in the manuscript).

      These doubts are unwarranted. With the suggested parameter values, for example, increasing or decreasing m by 1 reduces the effective number of alleles to around 1 or 2. This can easily be checked using the simulation code of S&R, as detailed in my initial response and now in a Supplementary Text. Even without this result, the general principle mentioned above tells us that considering other regions of parameter space cannot rescue the conclusions of S&R.

      So I still find the claim that " the phenomenon that leads to high diversity in the simulations of Siljestam and Rueffler depends on finely tuned parameter values" is not well substantiated.

      What is unsubstantiated is the claim of S&R that “For a large part of the parameter space, more than 100 and up to over 200 alleles can emerge and coexist”. As my manuscript illustrates, this is an illusion created by the adjustment of one parameter to compensate for changes in others.

      The reviewer even acknowledges that “the choice of constants and functions...works in a limited range of parameter values”. Furthermore, the manuscript explains why this problem is inherent to the general phenomenon, not specific to the details of the model or parameter values.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      It appears obvious that with no or a little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc, could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth. Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number. It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [RS]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler. I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c<sub>max</sub> conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct. The factor of 2.7∙10<sup>43</sup> was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values. A simple way to determine this number is to have the simulation code print the value to which c<sub>max</sub> is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values. I have in this way confirmed this factor using the simulation code written and used by Siljestam and Rueffler. A procedure for doing so is described in the new Supplementary Text S1. In addition, I now give a theoretical derivation of this factor in Supplementary Text S2.

      This begs the conclusion that the branching remains robust to changes in cmax that span 4 decades as well.

      That shows at most that the results are not extremely sensitive to c<sub>max</sub> or K. They are, nonetheless, exquisitely sensitive to m and v. This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c<sub>max</sub>. It is evident from Fig. 4 of Siljestam and Rueffler that the level of diversity is not robust to these very large changes in c<sub>max</sub>, which include, as noted above, a change of over 43 orders of magnitude.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v=20. As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v. This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions. Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.
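
      The geometry described here can be illustrated with a short sketch. It builds the vertices of a regular simplex with unit edge length, centred at the origin, and compares the pairwise distance with 1/v. This is an assumption-laden illustration and not the simulation code of Siljestam and Rueffler: the function names are hypothetical, the choice of m = 8 pathogens is only an example, and 1/v is treated as the Gaussian width because that is how the reviewer describes it.

```python
import itertools
import math


def regular_simplex(m):
    """Return m points in R^m forming a regular simplex with unit edge length, centred at the origin."""
    # Scaled basis vectors e_i / sqrt(2) are pairwise at distance exactly 1.
    scale = 1 / math.sqrt(2)
    points = [[scale if i == j else 0.0 for j in range(m)] for i in range(m)]
    # Subtracting the centroid moves the centre to the origin without changing distances.
    centroid = [sum(p[j] for p in points) / m for j in range(m)]
    return [[p[j] - centroid[j] for j in range(m)] for p in points]


def pairwise_distances(points):
    """Euclidean distances between all pairs of points."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]


m = 8   # illustrative number of pathogens (assumption)
v = 20  # value of v discussed in the response

vertices = regular_simplex(m)
distances = pairwise_distances(vertices)
width = 1 / v  # width scale as described by the reviewer (assumption)
ratio = min(distances) / width

print(f"pairwise distance range: {min(distances):.6f} to {max(distances):.6f}")
print(f"distance / (1/v): {ratio:.1f}")
```

      With these values every pair of pathogens is separated by roughly twenty times 1/v, which is the condition that the response says holds throughout the relevant region of parameter space.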

      ...the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      It is clear from Fig. 4 of Siljestam and Rueffler that the branching condition is far from sufficient for high MHC diversity.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SD].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable. I have addressed the reasons for this suggestion above. Furthermore, I have confirmed the main conclusion—the extreme sensitivity of the results of Siljestam and Rueffler to parameter values--using the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”. I now describe, in Supplementary Text S1, how anybody can verify my conclusions in this way.

      Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem. However, as I understand it, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>. Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”. Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>). In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I have expanded the end of the Discussion in the hope of clarifying the point expressed by the title.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to suggest to the author that they provide essential details about their simulations that would justify their claims, and to communicate with Mattias Siljestam and Claus Rueffler whether claims of the lack of robustness could be confirmed.

      The models simulated were modified versions of those of Siljestam and Rueffler. Thus, only the modifications were described in my manuscript. I have added a more detailed description of how c<sub>max</sub> was set in the simulations concerned with sensitivity to parameter values. In addition, the new Supplementary Text S1, which describes confirmation of the lack of robustness using the code of Siljestam and Rueffler, should remove any doubt about this conclusion.

      Reviewer #2 (Recommendations for the authors):

      I have no further recommendations. The manuscript is well written and clear.

      Thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Since this is a full report and not just a letter to the editor, it would benefit from a bit more introduction of what the MHC actually is and what the current understanding of its evolution is. Currently, it assumes a lot of knowledge about these genes that might not be available to every reader of eLife.

      I have added some more information to the opening paragraph. I would also note that this report was submitted as a “Research Advance”, which may only need “minimal introductory material”.

      (2) Some more recent literature on MHC evolution should be added, e.g., the review by Radwan et al. 2020 TiG, a concrete case of MHC heterozygote advantage by Arora et al. 2020 MolBiolEvol, and a simulation of MHC CNV evolution by Bentkowski et al. 2019 PLOSCompBiol.

      I have cited some additional literature.

      (3) Since much of the criticism hinges on the cmax parameter, its biological meaning or role (or the lack thereof) could be discussed more.

      I am not sure what I can add to what is in the first paragraph of the Discussion.

      (4) I find it difficult to grasp how the v parameter, which is intended to define pathogen virulence, if I understand it correctly, can be used to amend the breadth of peptide presentation. Maybe this could be illustrated better.

      I have attempted to make this clearer. The parameter v actually controls the breadth of peptide detection conferred by an allele, which, if not identical to the breadth of presentation, is certainly affected by it. The basis of the “virulence” interpretation seems to be that narrower detection breadth can, according to the model, only decrease peptide detection probability, which increases the damage done by pathogens.

      (5) Please check sentences in lines 279ff on peptide detection and cost of . There seem to be words missing.

      There was an extraneous word, which I have removed. Thank you for pointing this out.

    1. Reviewer #1 (Public review):

      Summary:

      This study examined whether infraslow fluctuations in noradrenaline and in heart rate are coupled and how they are affected by sleep transitions. The authors used the fluorescent NA biosensor GRAB-NE2m in the medial prefrontal cortex of mice to record extracellular NA while also recording EEG and EMG during sleep-wake episodes. They also analyzed previously published human data to reproduce relationships they found between sigma power and RR intervals in mice.

      Strengths:

      This is an impressive study with significant strengths, as it involves a rich set of data that includes not only observations of associations between heart rate and noradrenergic dynamics but also optogenetic manipulation of the locus coeruleus. Human data is presented to show parallels in the association between sigma power during sleep and phasic heart-rate bursts.

      Weaknesses:

      (1) Language could be clearer and more precise. As detailed below, in both the introduction and the discussion, the way the hypotheses and study objectives are described could use some revision to be more precise and accurate.

      1A) In the introduction on p. 4: The overarching question is framed as "could the peripheral autonomous systems be a read-out of the central LC-NE system and thus be a biomarker of memory consolidation and LC dysfunction?" This gives the impression that the LC function would be the main influence on peripheral autonomous systems. There are, of course, many influences on peripheral autonomous systems, so it would be advisable for the authors to be more specific here about what signal(s) in particular would be predicted to be sensitive markers of LC function.

      1B) In the discussion on p. 12: "In this study, we leveraged real-time measurements of mPFC NE levels and HR measurements from EMG recordings in mice to investigate the causal link between the two variables with high temporal resolution in freely moving sleeping mice, with similar inspection in humans." To test the causal link between mPFC NA levels and HR measures, the study would manipulate NA levels just in the mPFC and not elsewhere in the brain. However, in this study, the manipulation occurred in the LC, and so there would be broad cortical changes in NA levels. Thus, it could be that LC activity causes HR changes via a non-PFC pathway.

      (2) Comparisons with the control condition need further development.

      2A) While the authors did include a key YFP control condition, in the main text no direct statistical comparison between the closed-loop optogenetic stimulation (ChR2) condition and the YFP control condition was reported. (It was reported in Supplementary Figure 2c-d.) Instead, in the main text, the authors only reported that the effects of stimulation were significant in the closed-loop condition and not in the control. However, that is not the same as demonstrating that the two conditions significantly differed from each other, and it is the direct test that is important for the conclusions, so it seems important to include this result in the main presentation.

      2B) In addition, the authors should address the issue that the pre-stimulation NE was consistently significantly lower in the YFP condition than in the ChR2 condition (see Supplementary Figure 2c), which is a potential confound.

      2C) Direct comparison of the strengths of correlations shown in Figure 2h vs. Supplementary Figure 2f should be included. Currently, we see relatively weak correlations in both ChR2 and YFP conditions, and it is not clear if the relationships differ in the control. It seems they are still present in the control condition but weaker, which would contradict the apparently broad claim on p. 7 that "No such effects were present in the control condition" (it is not entirely clear whether this claim refers to all effects discussed in the figure or just a subset - this language should be clarified).

      2D) Did the YFP controls vs. ChR2 animals show any differences in the number of NA states that triggered stimulation in the closed-loop system? With ChR2 animals, stimulation changes NA, which could change future triggering. In YFP animals, nothing changes NA (other than natural fluctuations), so the dynamics of stimulation timing could diverge between groups in a way that complicates interpretation. Specifically, if ChR2 stimulation raises NA and prevents future threshold crossings, ChR2 animals may end up receiving fewer subsequent stimulations than YFP animals (or a different temporal clustering). If the number or pattern of stimulation differed in two groups, it would be important to have a yoked control where matched animals get the same stimulation pattern but not triggered by their own NA.

      (3) Some more discussion/explanation of the rationale for the closed-loop approach and how it influences how we should interpret the results could be useful. For instance, currently, it is not clear whether LC stimulation needs to be timed after an NA dip to yield the effects seen.

      (4) The section on heart rate decelerations is hard to follow. In particular, I was not sure how to interpret Figure 3f-j. For Figure 3f, what does the middle line represent? The laser onset or the max RR value after laser onset? What is the baseline that is used to correct the values to obtain amplitudes? If it is the whole period before the maximal RR value or the laser onset, wouldn't baseline values differ significantly across conditions and so potentially account for differences seen between conditions in the reported HR decelerations? Larger HR decelerations may be seen in conditions with higher HR simply as a regression to the mean phenomenon.

      (5) The findings regarding LC suppression could be further clarified.

      5A) Page 8: "observed a response in NE decline" - please be more precise. Did NE decline more or less?

      5B) It would be helpful to also show the correlation between NE and RR in the control (YFP) condition and whether there were any differences between YFP and Arch conditions (Figure 4e).

      5C) This sentence took me multiple readings to understand - it would be helpful to rewrite to make it clearer: "indicating that, while HR generally did not respond strongly to LC suppression, the variability in RR responses was dependent on NE changes to the suppression (Figure 4e)."

      5D) The two colors in Figure 4 are similar and hard to distinguish.

      5E) The correlations shown in Figure 4j seem to be driven by just two of the cases. Are the effects significant when outliers are removed?

      5F) Page 10: Were there any differences in memory performance between the Arch and YFP conditions?

      5G) Page 10: "We found a correlation between RR responses to LC suppression and sigma power, suggesting that a stronger HR reduction response is linked to higher spindle power." It should be noted in the text that the correlation was not specific to sigma (it was also seen for theta and beta, Figure 4i).

      (6) It is not clear which of the sigma power and RR interval findings do/do not exactly line up between the mice and humans. It could be helpful to have a table comparing them. For instance, was the finding in humans that pre-HRB sigma power was positively associated with slowing in heart rate after the HRB also seen in mice? Was there evidence in mice (as seen in the human sample) that sleep-dependent memory improvement was associated with pre-HRB sigma power?

      (7) Page 18: It is not clear if the sex of mice was balanced across controls and optogenetics groups.

    1. Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      (2) For some methods, more detailed information is needed.

      (3) There are grammar issues in the text that need to be fixed.

      (4) Some text in the figures is not labeled well.

    2. Author response:

      eLife Assessment

      This important study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The detailed genetic analysis of two cancer genes (BRCA1 and BRCA2) demonstrated new roles for these genes in causing the tumor microenvironment in lung cancer. Further experimental explorations of the immune-related changes may still be required. The solid findings of this study provide a foundation for further developing drugs targeting BRCA1/2 in lung cancer therapy.

      We would like to express our sincere gratitude for your thoughtful and constructive comments on our manuscript. We will carefully consider each comment from these two reviewers and will revise the manuscript accordingly. Below, we provide a point-by-point response to each comment.

      Reviewer #1 (Public review):

      Summary:

      Liao et al. performed a large-scale integrative analysis to explore the function of two cancer genes (BRCA1 and BRCA2) in lung cancer, which is one of the cancers with an extremely high mortality rate. The detailed genetic analysis demonstrated new roles of BRCA1/2 in causing the tumor microenvironment in lung cancer. In particular, the discovery of different mechanisms of BRCA1 and BRCA2 provides an essential foundation for developing drugs that target BRCA1 or BRCA2 in lung cancer therapy.

      Strengths:

      (1) This study leveraged large-scale genomic and transcriptomic datasets to investigate the prognostic implications of BRCA1/2 mutations in LUAD patients (~2,000 samples). The datasets range from genomics to single-cell RNA-seq to scTCR-seq.

      (2) In particular, the scTCR-seq offers a powerful approach for understanding T cell diversity, clonal expansion, and antigen-specific immune responses. Leveraging these data, this study found that BRCA1 mutations were associated with CD8+ Trm expansion, whereas BRCA2 mutations were linked to tumor CD4+ Trm expansion and peripheral T/NK cell cytotoxicity.

      (3) This study also performed a comprehensive analysis of genomic variation, gene expression, and clinical data from the TCGA program, which provides an independent validation of the findings from LUAD patients newly collected in this study.

      (4) This study provides an exemplary integration analysis using both computational biology and wet bench experiments. The experimental testing in the A549 cell line further supports the robustness of the computational analysis.

      (5) The findings of this study offer a comprehensive view of the molecular mechanisms underlying BRCA1 and BRCA2 mutations in LUAD. BRCA1 and BRCA2 are two well-known cancer-related genes in multiple cancers. However, their role in shaping the tumor microenvironment, particularly in lung cancer, is largely unknown.

      (6) By focusing on PD-L1-negative LUAD patients, this study demonstrated the molecular mechanisms underlying resistance to immune therapy. These new insights highlight new opportunities for personalized therapeutic strategies to BRCA-driven tumors. For example, they found histone deacetylase (HDAC) inhibitors consistently downregulated 4-R genes in A549 cells.

      (7) The deposition of raw single-cell sequencing (including scRNA-seq and scTCR-seq) data will provide an essential data resource for further discovery in this field.

      Weaknesses:

      (1) The finding of histone deacetylase (HDAC) inhibitors suggests the potential roles of epigenetic regulation in lung cancer. It would be interesting to explore epigenetic changes in LUAD patients in the future.

      Thank you for your insightful comment. We fully agree that epigenetic dysregulation in LUAD warrants further exploration. We believe that future investigations utilizing clinical specimens and animal models to map histone acetylation patterns and DNA methylation profiles will be crucial for identifying novel biomarkers and therapeutic targets unique to LUAD.

      (2) For some methods, more detailed information is needed.

      This is a valid point. We agree that additional methodological details are necessary for clarity and reproducibility. We will expand these method details in the revised manuscript.

      (3) There are grammar issues in the text that need to be fixed.

      We apologize for the grammatical errors. In the revised manuscript, we will carefully check the grammar and make corrections.

      (4) Some text in the figures is not labeled well.

      We appreciate the reviewers' comments. We will add labels to the revised version of the figures.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma using multi-omics approaches. The work highlights distinct roles of BRCA1 and BRCA2 mutations in shaping immune-related processes, and is logically structured with clearly presented analyses. However, the conclusions rely primarily on descriptive computational analyses and would benefit from additional immunological validation.

      Strengths:

      By integrating public datasets with in-house data, this study examines the impact of BRCA1/2 mutations on immunotherapy in lung adenocarcinoma from multiple perspectives using multi-omics approaches. The analyses are diverse in scope, with a clear overall logic and a well-organized structure.

      Weaknesses:

      The study is largely descriptive and would benefit from additional immunological experiments or validation using in vivo models. The fact that the BRCA1 and BRCA2 samples were each derived from a single patient also limits the robustness of the conclusions.

      Thank you for this excellent suggestion. In the revised manuscript, we will add immunological experiments or validation using in vivo models. In addition, we will elaborate on the limitations of our study in the Discussion section and provide reasonable explanations.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We sincerely thank all the reviewers for their thoughtful and constructive comments.

      In our revision, we have addressed the reviewers' specific criticisms with additional experiments and text edits as described below. We believe the constructive feedback from peer review helped us to significantly extend our mechanistic findings and strengthen the manuscript.

      Point-by-point description of the revisions

      Reviewer #1:

      Summary:

      The study by Zatulovskiy et al. examined how cell size influences cell susceptibility to ferroptosis. The authors found a size dependence specifically for the ferroptosis-inducing drug Era2, but not for other drugs. Using various human cell lines (HMEC, HT-1080, RPE-1), the authors generated populations of small and large G1 cells by FACS, CDK4/6 inhibition (palbociclib), or inducible cyclin D1 knockdown, and measured cell susceptibility to ferroptosis. Larger cells were more resistant than smaller cells. Mechanistically, larger cells showed reduced plasma membrane lipid peroxidation, higher glutathione concentrations, and changes in relevant cellular protein levels, as analyzed using previously published data. Deleting ACSL4, which is involved in ferroptosis, partly eliminated the size dependence of ferroptosis. The work concludes that cell size is a key determinant of ferroptosis susceptibility.

      My major concerns about this work focus on whether many of the results reflect cell size or cell cycle effects, and whether the FACS-based size-scaling analyses have some misleading features to their design & presentation. If these concerns can be addressed with new experiments, then the conclusions of this paper are justified. If these concerns cannot be addressed, then the authors should more directly acknowledge the alternative hypothesis that cell cycle effects may explain many of their results.

      The experiments seem to be replicated sufficiently, and most conclusions rely on data from multiple cell lines. My minor comments focus on needs to provide statistics and method details, and on suggestions on how to improve text clarity, but these edits are easily done and don't require new experiments. Overall, this is an interesting study, and it should be published once the concerns below are addressed.

      Major comments:

      • In experiments reported in Fig 1 and 2A, the authors sort small and large cells in G1, plate them, and later start the drug treatments & cell monitoring. Are these cells actively cycling (progressing in the cell cycle), and how fast? The large cells are likely to enter S phase earlier than the small cells, so by the time that the authors start their drug treatments, they may be comparing cells in different cell cycle stages, which could influence drug sensitivity more than cell size (as the authors also suggest later in Fig 2). This needs to be controlled for. Furthermore, even if the cells remain in G1 after sorting until the drug treatments are started, the authors should address the fact that the drugs are present for a long time, thus targeting the cells in various cell cycle stages.

      We agree with the reviewer that the cell cycle stage could affect ferroptosis susceptibility and could be a confounding effect in asynchronous cells. One of us (Dixon) reported the cell cycle effects on ferroptosis previously, and we observe them in this manuscript too (Fig. 2B,C,E). We now state this more clearly both in the Results and in the Discussion sections, where we write:

      Line 159: "We note that non-arrested cells had a lower susceptibility to Era2-induced ferroptosis compared to cells that were arrested in G1 for 2-3 days, despite being smaller in size. This is likely due to the difference in the fraction of cells in different cell cycle phases between arrested and non-arrested conditions since cells in S/G2/M phases are known to be more resistant to ferroptosis than cells in G0/G1 phases (Rodencal et al, 2024; Kuganesan et al, 2023)"

      Line 533: "Cells in G1 phase of the cell cycle were reported to be more susceptible to ferroptosis (Rodencal et al, 2024; Kuganesan et al, 2023), which suggested that ferroptosis inducers could be used in combination with cancer drugs, like the CDK4/6 inhibitor palbociclib, that arrest cells in G1 phase of the cell cycle (Herrera-Abreu et al, 2024). However, while CDK4/6 inhibitors arrest cells in G1, they do not inhibit cell growth, such that the longer they are arrested, the larger the cells grow (Lanz et al, 2022; Crozier et al, 2023; Manohar et al, 2023). This results in complex, non-monotonic ferroptotic response dynamics in cells treated with CDK4/6 inhibitors (Fig. 2B,E). Just following CDK4/6 inhibitor treatment, as more and more cells are arrested in G1 phase, cells become more sensitive to both RSL3- and erastin-induced ferroptosis (Kuganesan et al, 2023; Rodencal et al, 2024). However, the longer the cells are arrested, the larger they become, which further promotes their susceptibility to RSL3 (Fig. S1B) but reduces their susceptibility to Era2-induced ferroptosis (Fig. 2B). The fact that the cell cycle arrest and cell size increase have opposing effects on Era2-induced ferroptosis susceptibility could explain why different studies reported seemingly contradictory results, where sometimes an increased and sometimes a decreased or unchanged sensitivity to system xc- inhibitors was observed depending on the cell type, duration and type of cell cycle arrest (Lee et al, 2024; Kuganesan et al, 2023; Rodencal et al, 2024). Such complex interplay between the cell cycle and cell size effects on ferroptosis suggests that combination therapies utilizing CDK4/6 inhibitors and ferroptosis inducers would have to carefully choose a dosage schedule."

      Given the potentially confounding effects of the cell cycle in cycling cells sorted by size, we performed an additional experiment, in which RPE-1 cells were pre-treated with the CDK4/6 inhibitor palbociclib to synchronize them in G1 phase prior to treatment. These cells were then continuously exposed to palbociclib during the Era2 treatment (Fig. 2C-E). RPE-1 cells pre-treated with palbociclib for 2 and 4 days had the same cell cycle distribution, with 94% of cells arrested in G1, but differed in size. Cells treated with palbociclib for 4 days were significantly larger and more resistant to Era2, as shown in Fig. 2C-E.

      Additionally, in the experiment shown in Fig. 5E,F, where we FACS-sorted WT and ACSL4 KO HMEC cells by cell size, and then measured Era2 susceptibility, we pre-treated the cells with palbociclib for 24 h to synchronize them in G1 prior to the sorting. We then cultured the cells in the presence of palbociclib during the Era2 treatment to avoid the cell cycle effects observed in Fig. 2. In this case, we still observe that larger cells are more resistant to Era2, consistent with our conclusion that cell size protects against Era2-induced ferroptosis.

      Reviewer #1: "Can the G1 arrest-driven changes in drug susceptibility (Fig 2 C-D) be attributed to cell size? Can the authors rescue the palbociclib treatment with rapamycin or other growth inhibitors that allow size to remain small during G1 arrest?"

      We have attempted to perform these experiments, but when we co-treated the cells with palbociclib and mTORC inhibitors, we observed variable results. This is likely because prolonged mTORC inhibition itself rewires cellular metabolism and reduces cell susceptibility to ferroptosis, as one of us (Dixon) found previously (Armenta et al. (2022), Ferroptosis inhibition by lysosome-dependent catabolism of extracellular protein. Cell Chemical Biology 29: 1588-1600.e7). Our results were consistent with this previous report and are now included in a new supporting figure panel (Fig. S3C).

      Thus, upon palbociclib+rapamycin co-treatment there seems to be a competition between cell-size-mediated and metabolism-mediated effects of mTORC inhibition on ferroptosis, which leads to variable outcomes.

      Reviewer #1: "In Fig 2E-F, is the cell cycle distribution of the samples influenced by CCND1 shRNA induction? Are the drug sensitivity effects due to cell size or cell cycle changes?"

      The CCND1 manipulation model is extensively characterized in our recent work cited in this manuscript (You et al. (2025), Cell size-dependent mRNA transcription drives proteome remodeling. 2025.10.30.685141 doi:10.1101/2025.10.30.685141). Indeed, CCND1 shRNA cells have a slightly elongated G1 phase due to a ~30% reduction in Cyclin D1 concentration: the G1 fraction changes from ~70% in wild-type to ~80% in CCND1 shRNA cells, which could potentially affect the ferroptosis susceptibility, but the additional results obtained on synchronized RPE-1 cells, described above (Fig. 2C-E), support the conclusion that the primary effect on Era2 sensitivity is due to cell size.

      Reviewer #1: "Can the authors address the meaningfulness of the FACS-based size-scaling results in cases where cell-to-cell variability is very large? For example, in Fig 4D&G, the results are so variable even in identically sized cells that the importance of the size-scaling pattern seems questionable."

      We do observe variability in fluorescent probe-based measurements of GSH and lipid oxidation, which could be due to biological (natural cell heterogeneity) and/or technical (low sensitivity of the probes) reasons. However, when we look at binned data and compare the mean values ± s.e.m. for each bin, we observe a robust and reproducible trend (black line with dark-grey shaded area), even though the SD is quite broad (lighter shaded area). We believe such trends are meaningful when describing cell death in probabilistic terms as we do. That is, the GSH measurement might not be precise enough to predict cell death for a given individual cell, but the statistical trend is clear and these measurements help predict cell death probabilities for cells of different sizes.

      Reviewer #1: "In Figs 4B-D, the cell size axis seems to have over 4-fold size variability, but when the authors show the analysis of this data (Figs 4E-G) the variability is only 2-fold. What was excluded and on what basis?"

      To address this point, we have now clarified in the Methods section how the data were processed and what data points we excluded from this analysis:

      Line 671: "For all binned flow cytometry data plots, the cells below the 2nd and above the 98th cell size percentiles were excluded to remove the extreme outliers. Then, the remaining data were binned by size and plotted as background-corrected average fluorescence intensity for each bin against the bin's average cell size. Bins with fewer than 200 cells were excluded from the analysis to reduce noise."

      Typically, such pre-processing reduces the size range, mostly from the large-cell end, because of the long right tail of the size distribution containing a few very large cells.
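      The pre-processing described in the quoted Methods text can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `bin_by_size`, the use of equal-width size bins, the number of bins, and the treatment of background as a single scalar subtracted from every cell are all assumptions made for illustration. Only the 2nd/98th percentile trimming, the per-bin averaging, and the 200-cell minimum come from the text above.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no data")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bin_by_size(sizes, signals, background=0.0, n_bins=10,
                low_pct=2.0, high_pct=98.0, min_cells=200):
    """Trim size outliers, bin cells by size, and summarise each bin.

    Assumptions (not stated in the source): bins are equal-width in size,
    and `background` is one scalar subtracted from every cell's signal.
    """
    ordered = sorted(sizes)
    lo_cut = percentile(ordered, low_pct)
    hi_cut = percentile(ordered, high_pct)

    # Keep only cells between the low and high size percentiles.
    kept = [(s, f - background) for s, f in zip(sizes, signals)
            if lo_cut <= s <= hi_cut]
    if not kept:
        return []

    width = (hi_cut - lo_cut) / n_bins or 1.0  # avoid zero-width bins
    bins = [[] for _ in range(n_bins)]
    for s, f in kept:
        idx = min(int((s - lo_cut) / width), n_bins - 1)
        bins[idx].append((s, f))

    summary = []
    for cells in bins:
        # Drop sparse bins; at least 2 cells are needed for an SD.
        if len(cells) < max(min_cells, 2):
            continue
        bin_sizes = [s for s, _ in cells]
        bin_signal = [f for _, f in cells]
        sd = statistics.stdev(bin_signal)
        summary.append({
            "n": len(cells),
            "mean_size": statistics.fmean(bin_sizes),
            "mean_signal": statistics.fmean(bin_signal),
            "sd": sd,                            # spread of single cells
            "sem": sd / math.sqrt(len(cells)),   # uncertainty of the bin mean
        })
    return summary
```

      Returning both the SD and the s.e.m. for each bin mirrors the distinction drawn in the response: the SD reflects how variable individual cells are, while the s.e.m. reflects how well the bin mean, and hence the size-scaling trend, is determined.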

      Reviewer #1: "Based on the methods section & figure legends of Fig 4B-I, the RPE cells were not pre-sorted to include only G1 cells, nor did the assay account for cell cycle differences. How can these data be used to explain results from earlier figures, where analyses were exclusively focused on size differences in G1?"

      This is a valid point: Cells in the GSH measurement experiment were not gated by Hoechst signal for G1 phase because the channel normally used for Hoechst staining was in this case occupied by the MCB probe. However, given the data in Fig. 4A,B showing that the GSH production machinery is superscaling when measured specifically in G1-phase cells, we believe the flow cytometry data in Fig. 4C-J showing GSH concentration increasing with cell size across the whole cell cycle is very likely true for G1 cells as well.

      Reviewer #1: "Minor comments:

      I recommend clarifying in the early introduction that all size changes discussed are in the absence of DNA content increase."

      We have now clarified this in the introduction (Line 41 and Line 81).

      Reviewer #1: "The introduction seems to cite primary research and review paper in the same sentences, which is a bit misleading as the reviews don't seem to add new evidence."

      We have removed review citations where they did not provide additional context.

      Reviewer #1: "*OPTIONAL* In the second introduction paragraph, consider the classification/description of the three different mechanisms. Currently, it seems that these mechanisms are not independent of each other, and the details provided about each mechanism are inconsistent."

      We have now modified this paragraph to make the description more consistent.

      Reviewer #1: "Please provide statistics for the IC50 values reported based on Fig 1C. Were small and large cells statistically different? Are the IC50 values reported as +/- standard deviation or some other metric?"

      This has now been clarified in the text as follows:

      "For example, at the 72 h time point, the Era2 IC50 was 28 ± 11 µM (mean ± SD) for large cells versus 2.0 ± 1.4 µM for small cells (Student's t-test: p = 0.039) (Fig. 1C)."

      Reviewer #1: "*OPTIONAL* Providing more insight into why Era2 and RSL3 treatments yield more opposite responses would be of great interest to the field."

      We agree this is an important point that should be discussed in more detail. In the field of ferroptosis, context-dependent (i.e., cell type-specific) effects are common, and multiple groups including our own (Dixon) have published extensively on genes and mechanisms that can lead to differences between erastin2 and RSL3 sensitivity. For example, there are studies showing that the mTOR pathway or the p53 pathway can either prevent or promote ferroptosis, depending on the cell type and/or other currently unknown variables. To address more specifically the differences between Era2 and RSL3 in the context of our observed cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B and the following text:

      Line 359: "While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2."

      Additionally, in the Discussion we added the following:

      Line 578: "We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc-, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2."

      Reviewer #1: "Is the BODIPY-C11 labeling specific to plasma membrane, as suggested by the writing of the authors, or do the results shown integrate signals over all cell membranes?"

      We thank the reviewer for pointing this out. BODIPY-C11 581/591 stains many membranes in the cell, not just the plasma membrane. We have changed the wording in the manuscript to reflect this.

      Reviewer #1: "How exactly is gating done for the flow cytometry samples? Especially when analyzing size-scaling, the results are likely to be sensitive to outliers, such as those seen in Fig 4C (a subpopulation of very low CFSE stained cells). Can the authors clarify their methods and/or display supplementary figures with gating examples?"

      We have now specified our gating strategy in the Methods section (Line 663) and added a corresponding Supplementary Figure S5:

      "Single cells were gated based on FSC-A vs SSC-A, then FSC-A vs FSC-H, then SSC-A vs SSC-W plots. From this population of single cells, G1 cells were selected using Hoechst-A vs FSC-A plot for subsequent scaling analysis"

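      The sequential gating quoted above can be sketched as a chain of boolean masks. This is only a minimal illustration, assuming simple threshold gates: the function name, the channel keys, and every threshold are hypothetical, real gates are drawn on the plots for each sample, and the Hoechst-A vs FSC-A gate is reduced here to a one-dimensional Hoechst window.

```python
import numpy as np


def in_range(values, bounds):
    """Boolean mask for values inside the closed interval bounds = (low, high)."""
    low, high = bounds
    return (values >= low) & (values <= high)


def gate_g1_singlets(events, fsc_a_bounds, ssc_a_bounds,
                     max_area_to_height, max_ssc_w, hoechst_g1_bounds):
    """Return a boolean mask selecting single G1 cells.

    events maps channel names to equal-length 1-D arrays. All thresholds are
    hypothetical placeholders, not values taken from the manuscript.
    """
    fsc_a = np.asarray(events["FSC-A"], dtype=float)
    fsc_h = np.asarray(events["FSC-H"], dtype=float)
    ssc_a = np.asarray(events["SSC-A"], dtype=float)
    ssc_w = np.asarray(events["SSC-W"], dtype=float)
    hoechst = np.asarray(events["Hoechst-A"], dtype=float)

    # Gate 1 (FSC-A vs SSC-A): keep cell-sized events, drop debris.
    cells = in_range(fsc_a, fsc_a_bounds) & in_range(ssc_a, ssc_a_bounds)
    # Gate 2 (FSC-A vs FSC-H): doublets have a large pulse area relative to height.
    singlets_fsc = fsc_a <= max_area_to_height * fsc_h
    # Gate 3 (SSC-A vs SSC-W): doublets have a wide side-scatter pulse.
    singlets_ssc = ssc_w <= max_ssc_w
    # Gate 4 (simplified Hoechst-A window): keep the G1 DNA-content peak.
    g1 = in_range(hoechst, hoechst_g1_bounds)
    return cells & singlets_fsc & singlets_ssc & g1
```

      Applying the gates as successive masks makes it easy to report how many events survive each step, which is one way to check whether a low-signal subpopulation is retained in the scaling analysis.
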
      Reviewer #1: "In Fig 4, total protein staining was used as a control, whereas Fig 5B b-actin was used as a control. Why did the authors rely on different controls approaches for essentially the same measurements? Are these controls comparable?"

      In our flow cytometry experiments, we consistently use live-cell total protein stain (CFSE) for live cells, and anti-Tubulin immunofluorescent staining for fixed cells, both of which scale in proportion to cell volume and act as a read-out for total cellular protein content (Lanz and Zatulovskiy et al., Mol Cell 2022; Berenson et al. MBoC 2019), which we use to calculate concentrations of other cellular components (analogous to loading controls). In Fig. 5B, beta-Actin is used as a reference - a protein whose concentration does not change with cell size, as opposed to ACSL4 whose concentration decreases with cell size. In this plot, both ACSL4 and beta-Actin amounts were normalized to alpha-Tubulin, which is analogous to a concentration calculation using loading control. This is now explained in more detail in the Figure legend.

      Reviewer #2:

      "Zatulovskiy et al. demonstrate that cell size modulates susceptibility to ferroptosis, a form of iron-dependent cell death driven by lipid peroxidation. Using human cell lines (HMEC, HT-1080, RPE-1), the authors examined cell size through FACS sorting, CDK4/6 inhibition and inducible cyclin D1 knockdown. They found that larger cells are more resistant to ferroptosis induced by system xc⁻ inhibition (erastin2), but more sensitive to GPX4 inhibition (RSL3), highlighting pathway-specific size dependencies.

      Mechanistically, larger cells exhibited:

      • Higher glutathione levels, supporting lipid peroxide detoxification
      • Increased ferritin expression, promoting iron sequestration
      • Lower ACSL4 levels, reducing incorporation of peroxidation-prone lipids

      These findings were supported by high-throughput microscopy, flow cytometry (BODIPY-C11 lipid peroxidation assays), and proteomic analyses. The study concludes that cell size influences proteome composition and metabolic capacity, thereby shaping cell death decisions, an insight with implications for aging, cancer, and ferroptosis-based therapies.

      Major Comments

      1. Direct evaluation of SLC7A11 abundance and function is needed

      The opposite size-dependent effects of erastin2 and RSL3 strongly suggest a role for SLC7A11/system xc⁻ activity in size-dependent ferroptosis resistance. However, SLC7A11 levels were not quantified due to insufficient peptide detection in the proteomic data.

      o Direct measurement of SLC7A11 protein levels (immunoblotting or flow cytometry) in small vs large cells would test whether its expression scales with size.

      o Functional perturbation (siRNA/CRISPR knockdown) followed by erastin2 treatment would provide mechanistic validation.

      o Use of additional SLC7A11 inhibitors (e.g., sulfasalazine, sorafenib) could further test whether the size resistance phenotype is xc⁻-specific."

      We agree that the difference in size-dependent responses to RSL3 and Era2 is an important point that needs further investigation and discussion, as other reviewers also pointed out. To address more specifically the differences between Era2 and RSL3 in the context of cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B measuring SLC7A11 and Cathepsin B scaling with cell size and the following text:

      Line 359: "While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2."

      Additionally, in the Discussion we added the following:

      Line 578: "We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc-, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2."

      Reviewer #2: "2. Functional tests of ferritin contribution to resistance are needed

      Although elevated ferritin (FTH1/FTL) levels in larger cells represent a strong correlational signal, definitive experimental evidence establishing causality is currently lacking.

      o Measuring the labile iron pool directly in size-stratified populations would strengthen the link.

      o Knockdown of FTH1 or FTL could reveal whether ferritin upregulation is necessary for the resistance of large cells to ferroptosis."

      We thank the reviewer for raising this point. We have now completed additional experiments, as suggested by the reviewer, and found that iron chelation is unlikely to mediate the size-dependent response to Era2. We have modified the manuscript accordingly and added the following data and discussion to address this point:

      Line 296: "The observed increase in ferritin concentration with cell size could therefore lead to additional Fe2+ ion chelation, which in turn would protect large cells from iron-dependent lipid peroxidation and ferroptosis. However, when we measured the concentration of labile intracellular Fe2+ using a fluorescent probe FerroOrange (Hirayama et al, 2020), we did not observe any size-dependent decrease in labile iron concentration (Fig. S2A). Previous work suggests a link between increased sequestration of ferrous iron in lysosomes and resistance to ferroptosis. It was reported that senescent cells, which are also large (Fig. S3A,B), gain resistance to ferroptosis through lysosomal alkalinization and sequestration of ferrous iron in lysosomes (Loo et al, 2025). We therefore tested whether the superscaling of lysosomes observed in large cells (Lanz et al, 2022; You et al, 2025) promotes Era2 resistance through lysosomal iron sequestration. To do this, we stained the cells with the lysosomal iron detection probe Lyso-FerroRed (Saimoto et al, 2025) and measured its scaling using flow cytometry (Fig. S2B). We observed that the amount of Lyso-FerroRed, and therefore, the amount of lysosomal iron, scaled in direct proportion to cell size, just like the total cellular protein content (Fig. S2B). These results indicate that iron chelation by ferritin and its sequestration in lysosomes are unlikely to play a crucial role in size-dependent decrease in Era2 sensitivity."

      Reviewer #2: "3. Relevance to senescence should be addressed experimentally or explicitly discussed

      Given that senescent cells are enlarged and accumulate in aged and tumour tissues, testing senescent models for erastin2 resistance would greatly strengthen the physiological significance."

      We agree that an increase in cell size contributing to the resistance of senescent cells to ferroptosis is intriguing. We have now added a Supplementary Figure S3 and discussion of this point in the manuscript as follows:

      Discussion line 552: "our data suggest that previously reported resistance of senescent cells to ferroptosis can at least partially be due to the increased cell size, a well-established hallmark of senescence."

      Reviewer #2: "Minor Comments

      1. Mechanistic nuance regarding RSL3 should be included

      RSL3 has been reported to induce ferroptosis independently of GPX4 (PMID: 37087975, PMID: 40392234) and may target other selenoproteins such as TXNRD1. This nuance would help explain the observed divergence between RSL3 and erastin2 sensitivity across sizes."

      We have now added this in the Discussion as suggested by the reviewer (line 583):

      "In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2."

      Reviewer #2: "2. Dynamic range of BODIPY-C11 assays needs commentary

      Despite high erastin2 doses, the oxidized BODIPY signal remains close to DMSO levels. The authors should comment on whether this reflects high GSH buffering capacity, probe limitations, or other factors."

      We believe there are both technical (narrow dynamic range of the probe) and biological reasons for the relatively small (2-3 fold) difference in Oxidized-to-Non-oxidized BODIPY-C11 ratios between DMSO and Era2-treated cells. The biological reason is that the cells continue producing GSH until they fully deplete the cystine pool, which happens ~20-24 h after Era2 addition. Once the cystine pool is depleted, the cells very rapidly deplete GSH and initiate cell death. Therefore, there is only a short time window where cells are strongly depleted of GSH before dying. We see this small fraction of cells with a high Oxidized BODIPY-C11 signal in our flow cytometry experiments and in previous microscopy analysis of BODIPY-C11 (Murray et al., Protocol for detection of ferroptosis in cultured cells. STAR Protoc. 2023), but at our chosen time point (20 h Era2) most cells are not as bright because we aimed to analyze the population before the onset of widespread cell death.

      Reviewer #2: "3. Western blot for shCycD1 depletion should be included

      CycD1 depletion usually causes cells to stop proliferating, which is not the case here. Therefore, depletion must be partial. The level of depletion should be shown by immunoblotting."

      The CCND1 manipulation model is extensively characterized in our recent work cited in this manuscript (You et al. (2025), Cell size-dependent mRNA transcription drives proteome remodeling. 2025.10.30.685141 doi:10.1101/2025.10.30.685141). CCND1 shRNA cells do not fully arrest in G0/G1 because the concentration of Cyclin D1 protein in this system is only partially decreased, as the reviewer noted. As a result, the cells have a slightly elongated G1 phase due to a ~30% reduction in Cyclin D1 concentration, but continue to proliferate. The G1 fraction changes from ~70% in wild-type to ~80% in CCND1 shRNA cells.

      Reviewer #3:

      "In this manuscript, Zatulovskiy and colleagues elaborate on their previous work describing cell size-dependent changes in the proteome by investigating whether these changes can be correlated in differences in cell physiology. Using a cleverly-designed high throughput screen, they searched for compounds that differently-sized cells display differential sensitivity towards. Their primary hit, Era2, is involved in the ferroptosis pathway and serves as the starting point for a detailed study of how excess cell size protects cells from ferroptosis-induced cell death via: 1) lower concentrations of ACSL4 (which produces peroxidation-prone PUFAs), 2) increased ferritin concentrations, and 3) increased GSH concentrations.

      Overall, the experiments in this manuscript are well-designed and interpreted. It is an extremely well-written manuscript with a clear trajectory of logic. I have only a few major concerns that should be addressed before publication:"

      We thank Reviewer #3 for their careful reading of the manuscript and for the clear summary of our study and its central findings. We appreciate their positive assessment of the experimental design, interpretation, and overall clarity of the writing and logical flow. We are also grateful for their constructive feedback and take their major concerns seriously; we have addressed each point in detail below.

      Reviewer #3: "Major concerns:

      1) In Figure 3E, the authors gate their flow cytometry data using SYTOX so that they are only analyzing live cells. Based on their gating scheme, it seems like there are really a lot of dead cells. Presumably the cells that died were the most sensitive to Era2, so it seems an oversight to discard these cells. Of course, it is not appropriate to analyze dead cells, but this could potentially be solved by using a shorter treatment duration than 24 hours wherein fewer cells die."

      This is a good point. To address it, we have now replaced this panel with a time point where most cells are still alive (20 h, 0.2 µM Era2), as suggested by the reviewer (Fig. 3E,F). This did not change the conclusion that BODIPY-C11 oxidation decreases with cell size.

      Reviewer #3: "2) In Figure 5, are the small, medium, and large bins for ACSL4 KO cells the same as for WT cells? If the ACSL4 KO cells are just bigger to begin with, this could explain why the "small" bin has greater cell survival than the WT small bin. Moreover, is the overlap between the three bins the same in the WT and KO cells?"

      This is an important point. We have now added Supplementary Figure S4B, which shows the relative size of small, medium, and large WT and ACSL4 KO HMEC cells. As seen from this graph, the ACSL4 KO cells are not bigger than WT cells. Importantly, the fold-range between the small and large FACS-sorted cells is similar for WT and KO cells (~1.9 to 2-fold).

      Reviewer #3: "3) Loo, et al. Nat Comms 2025 similarly found that senescent cells (which are enlarged) are resistant to ferroptosis using the same inhibitor as the authors. In contrast to the authors, they show that this is due to lysosomal alkalinization and sequestration of ferrous iron in lysosomes. Given that Lanz et al. 2022 found that lysosomal components super-scale with cell size, it seems like this would be an important hypothesis to address. Free lysosomal iron can be easily measured with the LysoRhoNox stain. Loo et al. was able to restore ferroptosis sensitivity in senescent cells using the V-ATPase activator EN6, so it would be important for the authors to address whether this (or similar) treatment would have the same effect in enlarged cells."

      This is an excellent point. We have now performed this experiment and added it to the manuscript, as suggested by the reviewer. Based on the Lyso-FerroRed staining (another brand name for the LysoRhoNox probe), we do not see an increase in lysosomal iron sequestration in large cells (Fig. S2B) - see the graph and the corresponding discussion below:

      Line 301: "Previous work suggests a link between increased sequestration of ferrous iron in lysosomes and resistance to ferroptosis. It was reported that senescent cells, which are also large (Fig. S3A,B), gain resistance to ferroptosis through lysosomal alkalinization and sequestration of ferrous iron in lysosomes (Loo et al, 2025). We therefore tested whether the superscaling of lysosomes observed in large cells (Lanz et al, 2022; You et al, 2025) promotes Era2 resistance through lysosomal iron sequestration. To do this, we stained the cells with the lysosomal iron detection probe Lyso-FerroRed (Saimoto et al, 2025) and measured its scaling using flow cytometry (Fig. S2B). We observed that the amount of Lyso-FerroRed, and therefore, the amount of lysosomal iron, scaled in direct proportion to cell size, just like the total cellular protein content (Fig. S2B). These results indicate that iron chelation by ferritin and its sequestration in lysosomes are unlikely to play a crucial role in size-dependent decrease in Era2 sensitivity."

      Reviewer #3: "Minor concerns:

      1) It would be helpful if this manuscript were re-submitted with line numbers to more easily reference the text."

      We have added line numbers for convenience.

      Reviewer #3: "2) In Figure 5A and other figures that reproduce data from Lanz et al. 2022, it would be helpful to have a summary curve for the overall abundance of each protein rather than only the individual peptide curves. These plots (particularly Figure 5A) are difficult to interpret since some peptides were presumably more abundant / measured with higher confidence than others."

      We have added the average ACSL4 protein slope line to Fig. 5A.

      Reviewer #3: "3) In Figure 5, the authors show the validation of the ACSL4 KO HT-1080 cell line but not HMEC, even though both are used in this figure. It would be useful to show both. Additionally, the authors switch back and forth between the two cell lines for this figure, and it is not clear why."

      We have added the HMEC ACSL4 KO validation Western blot in Fig. S4A (see below).

      For the BODIPY oxidation experiment (Fig. 5D), we used HT-1080 instead of HMEC because HT-1080 cells are sensitive to lower concentrations of Era2, and therefore, we could better optimize the Era2 concentrations and treatment durations to measure BODIPY oxidation at the time point when most cells are still alive but demonstrate a pronounced oxidized BODIPY signal.

      Reviewer #3: "4) In Figure 5B, the authors use antibody-based staining of ACSL4 and flow cytometry to correlate a loss of ACSL4 expression with increased cell size, validating the proteomics data in Figure 5A. This does not seem like a good way to do this. Firstly, fixing cells with formaldehyde alters their size (is this proportional across differently sized cells? It's impossible to know), which makes it inappropriate to use SSC as a proxy for size in this particular situation. Secondly, the normalization scheme here doesn't make sense. If actin was used as a reference protein, why was tubulin used to normalize ACSL4 abundance? Overall, this seems like a very round-about experiment that could have just been addressed by doing a simple western blot with the four size bins sorted from live cells (as it was in the proteomics). If the issue is that ACSL4 is not detectable by western in the HMEC cells, another solution would be plating the live, sorted bins on coverslips and measuring by IF (or using the HT-1080 cells)."

      We prefer IF flow cytometry to Western blotting for protein scaling analysis because it is more quantitative and provides cell size and protein content information for each individual cell. While in principle, different-sized cells might change their size differently during fixation, the cells that were larger or smaller prior to the fixation remain larger or smaller after fixation as well. Therefore, the SSC measurement after fixation still provides reliable information on size ranking, even if SSC does not perfectly linearly scale with cell volume. We do not use the SSC information to calculate protein concentrations here. Instead, we divide the amount of our protein of interest in the cell by the amount of constitutively-expressed Tubulin, which acts as an analogue of a loading control in this experiment. In Fig. 5B, both ACSL4 and Actin were normalized to Tubulin to estimate their concentrations. Actin is used just as a reference protein to show how the concentration of a perfectly scaling protein remains constant across cell size, as opposed to the sub-scaling ACSL4. Tubulin in this case was used as a proxy for total cellular protein content, which scales linearly in proportion to cell volume. This approach for determining the scaling behaviors of different proteins was previously validated in Lanz et al., Mol Cell 2022.
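
      The distinction drawn above between a perfectly scaling protein and a sub-scaling one can be illustrated numerically. The sketch below is a hedged example, not the analysis used in the manuscript: it assumes that scaling is summarized as the slope of log2(concentration) against log2(cell size), with concentration taken as the amount of the protein divided by the amount of the reference (here Tubulin). The function and argument names are hypothetical.

```python
import numpy as np


def scaling_slope(cell_size, target_amount, reference_amount):
    """Slope of log2(concentration) vs log2(cell size).

    Concentration is estimated as target_amount / reference_amount, with the
    reference acting as a loading control. Under this (assumed) convention a
    slope near 0 indicates constant concentration (perfect scaling) and a
    negative slope indicates sub-scaling.
    """
    size = np.asarray(cell_size, dtype=float)
    conc = (np.asarray(target_amount, dtype=float)
            / np.asarray(reference_amount, dtype=float))
    slope, _intercept = np.polyfit(np.log2(size), np.log2(conc), 1)
    return slope
```

      With synthetic data in which the reference amount is proportional to size, a protein whose amount is also proportional to size gives a slope of about 0, whereas a protein whose amount grows as the square root of size gives a slope of about -0.5. Because only the ratio to the reference enters the concentration, the size measurement is used here to order and space the cells rather than to compute concentrations.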

      Reviewer #3: "5) In Figure 5E/5F, the authors pre-arrest the cells in G1 with palbociclib before size-sorting them. The pre-arrest is not done in other experiments using this cell line for size-sorting, so it would be important for the authors to comment on why this was done for this experiment but not others."

      As we found in Fig. 2B-E, the cell cycle has confounding effects on size-dependent ferroptosis susceptibility measurements (as discussed in detail in our response to the first major point of Reviewer #1 above). Briefly, to avoid these confounding effects and isolate the effects of cell size from the effects of the cell cycle, we pre-synchronized the cells with 24 h treatment with palbociclib in Fig. 5E,F. This is now better clarified in the text, as follows:

      Line 456: "In this experiment, we synchronized cells in G1 phase using palbociclib prior to cell sorting and also incubated the sorted cells in the presence of palbociclib during Era2 treatment to isolate cell size effects from the previously observed confounding effects of the cell cycle on ferroptosis (Fig. 2B,E)."

      Reviewer #3: "6) Conceptually, it is difficult for me to understand why large cell size sensitizes cells to GPX4 inhibition but confers resistance to Era2 treatment. Particularly given the pathway described in Figure 3A, I am having trouble understanding why these would convey such opposing phenotypes. Shouldn't the extra ferritin in the bigger cells also help them cope with GPX4 inhibition if, as the authors state in the discussion, the increased sensitivity to the GPX4 inhibitor is reported to be mediated by (among other things) iron accumulation? A deeper discussion of this seeming-incongruity would be helpful for contextualizing the broader role of cell size in determining ferroptosis sensitivity."

      We agree this is an important point, which was also raised by the other reviewers. We note that context-dependent (i.e., cell type-specific) effects are common in the ferroptosis field, and multiple groups including our own (Dixon) have published extensively on genes and mechanisms that can lead to differences between erastin2 and RSL3. For example, there are studies showing that the mTOR pathway or the p53 pathway can both prevent and promote ferroptosis, depending on the cell type or some other hidden variable.

      To better address the differences between Era2 and RSL3 in the context of the cell-size-dependent response, we have now added more data and discussion. In the Results section we added panel 4B and the following text:

      Line 359: "While the upregulation of GSH biosynthesis may promote the resistance of larger cells to ferroptosis, such an upregulation alone cannot explain why larger cells become more resistant to ferroptosis induced by the cystine import inhibitor Era2, but not, for example, by the GPX4 inhibitor RSL3 (Chan et al, 2025) (Figs. 2B, S1B). We found previously that upon mTORC1 inhibition cells can evade cystine deprivation-induced ferroptosis by uptake and catabolism of cysteine-rich extracellular proteins, mostly albumin (Armenta et al, 2022) (Fig. S3C). This process involves albumin degradation in lysosomes, predominantly by cathepsin B (CatB), and subsequent export of cystine from lysosomes to fuel the synthesis of glutathione. Large cells undergo proteome rearrangements similar to those occurring upon mTORC1 inhibition (Zatulovskiy et al, 2022). This suggests that large cells may upregulate CatB expression to bypass the Era2-induced cystine import inhibition via system xc-. To test this hypothesis, we used flow cytometry to measure how the expression of cathepsin B and the system xc- cystine/glutamate transporter SLC7A11 (xCT) scales with cell size (Fig. 4B). We found that SLC7A11 concentration modestly decreases, while CatB concentration significantly increases with cell size (Fig. 4B). This shift in the ratio between SLC7A11 and CatB supports the hypothesis that larger cells may rely less on cystine import via system xc- and thus become more resistant to system xc- inhibition by Era2."

      Figure 4. (B) Flow-cytometry-based measurement of cystine/glutamate transporter SLC7A11 (xCT) and cathepsin B (CatB) concentrations in G1-phase RPE-1 cells demonstrates a modest decrease in SLC7A11 and a significant increase in Cathepsin B concentrations with cell size. To calculate the concentrations of SLC7A11 and CatB, their amounts were measured with flow cytometry using immunofluorescence and normalized to the amounts of α-Tubulin. The data were binned by cell size, and mean values for each bin were plotted against normalized cell size (solid blue line for SLC7A11 and red line for CatB). Shaded areas denote the s.e.m. for each bin.
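
      The procedure described in this legend (normalize to α-Tubulin, bin by cell size, plot the per-bin mean with s.e.m.) can be sketched as follows. This is a minimal illustration under stated assumptions: equal-width bins, cell size normalized to the population mean, and bins with fewer than two cells skipped. None of these choices, nor the function and argument names, are specified in the text.

```python
import numpy as np


def binned_concentration(cell_size, target_amount, tubulin_amount, n_bins=10):
    """Per-bin mean and s.e.m. of a Tubulin-normalized protein concentration.

    Returns (bin_centers, means, sems), where bin_centers are the mean
    normalized cell sizes of the cells in each retained bin.
    """
    size = np.asarray(cell_size, dtype=float)
    conc = (np.asarray(target_amount, dtype=float)
            / np.asarray(tubulin_amount, dtype=float))
    norm_size = size / size.mean()  # assumption: normalize to the mean size

    # Equal-width bin edges; the largest cell is clipped into the last bin.
    edges = np.linspace(norm_size.min(), norm_size.max(), n_bins + 1)
    bin_index = np.clip(np.digitize(norm_size, edges) - 1, 0, n_bins - 1)

    centers, means, sems = [], [], []
    for b in range(n_bins):
        values = conc[bin_index == b]
        if values.size < 2:  # s.e.m. is undefined for fewer than two cells
            continue
        centers.append(norm_size[bin_index == b].mean())
        means.append(values.mean())
        sems.append(values.std(ddof=1) / np.sqrt(values.size))
    return np.array(centers), np.array(means), np.array(sems)
```

      The returned arrays correspond to the solid line (means) and the shaded band (means ± s.e.m.) in a plot of this kind.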

      Additionally, in the Discussion we added the following:

      Line 578: "We show that large cells may become resistant specifically to Era2 but not RSL3 through the upregulation of lysosomal function, particularly cathepsin B expression, which enables the uptake and catabolism of cysteine-rich extracellular proteins. A size-dependent shift in the ratio between SLC7A11 and cathepsin B makes large cells less dependent on cystine import via system xc-, and thus, more resistant to Era2. In addition to this, it was reported that RSL3 can induce ferroptosis independently of GPX4 and may target other selenoproteins (DeAngelo et al, 2025; Cheff et al, 2023), which could also contribute to the difference in size-dependent responses to RSL3 and Era2."

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this manuscript, Zatulovskiy and colleagues elaborate on their previous work describing cell size-dependent changes in the proteome by investigating whether these changes can be correlated in differences in cell physiology. Using a cleverly-designed high throughput screen, they searched for compounds that differently-sized cells display differential sensitivity towards. Their primary hit, Era2, is involved in the ferroptosis pathway and serves as the starting point for a detailed study of how excess cell size protects cells from ferroptosis-induced cell death via: 1) lower concentrations of ACSL4 (which produces peroxidation-prone PUFAs), 2) increased ferritin concentrations, and 3) increased GSH concentrations.

      Overall, the experiments in this manuscript are well-designed and interpreted. It is an extremely well-written manuscript with a clear trajectory of logic. I have only a few major concerns that should be addressed before publication:

      Major concerns:

      1) In Figure 3E, the authors gate their flow cytometry data using SYTOX so that they are only analyzing live cells. Based on their gating scheme, it seems like there are really a lot of dead cells. Presumably the cells that died were the most sensitive to Era2, so it seems an oversight to discard these cells. Of course, it is not appropriate to analyze dead cells, but this could potentially be solved by using a shorter treatment duration than 24 hours wherein fewer cells die.

      2) In Figure 5, are the small, medium, and large bins for ACSL4 KO cells the same as for WT cells? If the ACSL4 KO cells are just bigger to begin with, this could explain why the "small" bin has greater cell survival than the WT small bin. Moreover, is the overlap between the three bins the same in the WT and KO cells?

      3) Loo, et al. Nat Comms 2025 similarly found that senescent cells (which are enlarged) are resistant to ferroptosis using the same inhibitor as the authors. In contrast to the authors, they show that this is due to lysosomal alkalinization and sequestration of ferrous iron in lysosomes. Given that Lanz et al. 2022 found that lysosomal components super-scale with cell size, it seems like this would be an important hypothesis to address. Free lysosomal iron can be easily measured with the LysoRhoNox stain. Loo et al. was able to restore ferroptosis sensitivity in senescent cells using the V-ATPase activator EN6, so it would be important for the authors to address whether this (or similar) treatment would have the same effect in enlarged cells.

      Minor concerns:

      1) It would be helpful if this manuscript were re-submitted with line numbers to more easily reference the text.

      2) In Figure 5A and other figures that reproduce data from Lanz et al. 2022, it would be helpful to have a summary curve for the overall abundance of each protein rather than only the individual peptide curves. These plots (particularly Figure 5A) are difficult to interpret since some peptides were presumably more abundant / measured with higher confidence than others.

      3) In Figure 5, the authors show the validation of the ACSL4 KO HT-1080 cell line but not HMEC, even though both are used in this figure. It would be useful to show both. Additionally, the authors switch back and forth between the two cell lines for this figure, and it is not clear why.

      4) In Figure 5B, the authors use antibody-based staining of ACSL4 and flow cytometry to correlate a loss of ACSL4 expression with increased cell size, validating the proteomics data in Figure 5A. This does not seem like a good way to do this. Firstly, fixing cells with formaldehyde alters their size (is this proportional across differently sized cells? It's impossible to know), which makes it inappropriate to use SSC as a proxy for size in this particular situation. Secondly, the normalization scheme here doesn't make sense. If actin was used as a reference protein, why was tubulin used to normalize ACSL4 abundance? Overall, this seems like a very round-about experiment that could have just been addressed by doing a simple western blot with the four size bins sorted from live cells (as it was in the proteomics). If the issue is that ACSL4 is not detectable by western in the HMEC cells, another solution would be plating the live, sorted bins on coverslips and measuring by IF (or using the HT-1080 cells).

      5) In Figure 5E/5F, the authors pre-arrest the cells in G1 with palbociclib before size-sorting them. The pre-arrest is not done in other experiments using this cell line for size-sorting, so it would be important for the authors to comment on why this was done for this experiment but not others.

      6) Conceptually, it is difficult for me to understand why large cell size sensitizes cells to GPX4 inhibition but confers resistance to Era2 treatment. Particularly given the pathway described in Figure 3A, I am having trouble understanding why these would convey such opposing phenotypes. Shouldn't the extra ferritin in the bigger cells also help them cope with GPX4 inhibition if, as the authors state in the discussion, the increased sensitivity to the GPX4 inhibitor is reported to be mediated by (among other things) iron accumulation? A deeper discussion of this seeming-incongruity would be helpful for contextualizing the broader role of cell size in determining ferroptosis sensitivity.

      Significance

      Strengths: well-designed study, clear results that are broadly in line with other recent studies on the same topic.

      Limitations: opposing results between Era2 and GPX4i (which should in theory cause similar phenotypes) call into question how broadly cell size impacts ferroptosis susceptibility / indicate that this may occur in a limited context

      Advances: first study to show that enlarged cells might have higher antioxidant capacity

      Audiences: cell size researchers, ferroptosis/cell death researchers

      My expertise: cell cycle checkpoints, cell size control, proteomics

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The study by Zatulovskiy et al. examined how cell size influences cell susceptibility to ferroptosis. The authors found a size dependence specifically for the ferroptosis-inducing drug Era2, but not for other drugs. Using various human cell lines (HMEC, HT-1080, RPE-1), the authors generated populations of small and large G1 cells by FACS, CDK4/6 inhibition (palbociclib), or inducible cyclin D1 knockdown, and measured cell susceptibility to ferroptosis. Larger cells were more resistant than smaller cells. Mechanistically, larger cells showed reduced plasma membrane lipid peroxidation, higher glutathione concentrations, and changes in relevant cellular protein levels, as analyzed using previously published data. Deleting ACSL4, which is involved in ferroptosis, partly eliminated the size dependence of ferroptosis. The work concludes that cell size is a key determinant of ferroptosis susceptibility.

      My major concerns about this work focus on whether many of the results reflect cell size or cell cycle effects, and whether the FACS-based size-scaling analyses have some misleading features to their design & presentation. If these concerns can be addressed with new experiments, then the conclusions of this paper are justified. If these concerns cannot be addressed, then the authors should more directly acknowledge the alternative hypothesis that cell cycle effects may explain many of their results.

      The experiments seem to be replicated sufficiently, and most conclusions rely on data from multiple cell lines. My minor comments focus on needs to provide statistics and method details, and on suggestions on how to improve text clarity, but these edits are easily done and don't require new experiments. Overall, this is an interesting study, and it should be published once the concerns below are addressed.

      Major comments:

      • In experiments reported in Fig 1 and 2A, the authors sort small and large cells in G1, plate them, and later start the drug treatments & cell monitoring. Are these cells actively cycling (progressing in the cell cycle), and how fast? The large cells are likely to enter S phase earlier than the small cells, so by the time that the authors start their drug treatments, they may be comparing cells in different cell cycle stages, which could influence drug sensitivity more than cell size (as the authors also suggest later in Fig 2). This needs to be controlled for. Furthermore, even if the cells remain in G1 after sorting until the drug treatments are started, the authors should address the fact that the drugs are present for a long time, thus targeting the cells in various cell cycle stages.
      • Can the G1 arrest-driven changes in drug susceptibility (Fig 2 C-D) be attributed to cell size? Can the authors rescue the palbociclib treatment with rapamycin or other growth inhibitors that allow size to remain small during G1 arrest?
      • In Fig 2E-F, is the cell cycle distribution of the samples influenced by CCND1 shRNA induction? Are the drug sensitivity effects due to cell size or cell cycle changes?
      • Can the authors address the meaningfulness of the FACS-based size-scaling results in cases where cell-to-cell variability is very large? For example, in Fig 4D&G, the results are so variable even in identically sized cells that the importance of the size-scaling pattern seems questionable.
      • In Figs 4B-D, the cell size axis seems to have over 4-fold size variability, but when the authors show the analysis of this data (Figs 4E-G) the variability is only 2-fold. What was excluded and on what basis?
      • Based on the methods section & figure legends of Fig 4B-I, the RPE cells were not pre-sorted to include only G1 cells, nor did the assay account for cell cycle differences. How can these data be used to explain results from earlier figures, where analyses were exclusively focused on size differences in G1?

      Minor comments:

      • I recommend clarifying in the early introduction that all size changes discussed are in the absence of DNA content increase.
      • The introduction seems to cite primary research and review papers in the same sentences, which is a bit misleading as the reviews don't seem to add new evidence.
      • OPTIONAL In the second introduction paragraph, consider the classification/description of the three different mechanisms. Currently, it seems that these mechanisms are not independent of each other, and the details provided about each mechanism are inconsistent.
      • Please provide statistics for the IC50 values reported based on Fig 1C. Were small and large cells statistically different? Are the IC50 values reported as +/- standard deviation or some other metric?
      • OPTIONAL Providing more insight into why Era2 and RSL3 treatments yield opposite responses would be of great interest to the field.
      • Is the BODIPY-C11 labeling specific to plasma membrane, as suggested by the writing of the authors, or do the results shown integrate signals over all cell membranes?
      • How exactly is gating done for the flow cytometry samples? Especially when analyzing size-scaling, the results are likely to be sensitive to outliers, such as those seen in Fig 4C (a subpopulation of very low CFSE stained cells). Can the authors clarify their methods and/or display supplementary figures with gating examples?
      • In Fig 4, total protein staining was used as a control, whereas in Fig 5B β-actin was used as a control. Why did the authors rely on different control approaches for essentially the same measurements? Are these controls comparable?

      Significance

      I work in the cell size research field, and I am familiar with other related works in this field. My evaluation reflects a specialist's view of this study. Overall, this study will be of great interest to a small group of specialists, and specific aspects of the work will also gain some interest from broader basic research audiences studying mechanisms of drug responses and ferroptosis in general. However, I do not see this work gaining very broad interest across larger audiences, simply because the field of cell size research is not of broad interest, and this is not a landmark study for the field.

      The field of cell size research has long searched for size-dependent functions, as these could help explain why cell size matters. This study is a nice addition to our field, helping establish ferroptosis as a size-dependent function. However, the significance of this work relies on how clearly the authors can establish that their results are cell size rather than cell cycle effects (see major comments above). Should the authors address these concerns, then this study will provide some conceptual and mechanistic insight.

      Regarding mechanistic insights, this work is in stark contrast to a recent study about size-dependency of ferroptosis (https://doi.org/10.1016/j.isci.2025.112363), where increased cell size heightened sensitivity to the GPX4 inhibitor RSL3, thus suggesting the opposite conclusion to what the authors observed with the drug Era2. The authors examined this contradiction, and while their results with the drug RSL3 agreed with the recent study, they did not explain why different drug mechanisms yield opposite results. Providing more insights into this discrepancy would increase the impact of this work.

      Regardless of the impact of this work, I want to emphasize that I am fully supportive of seeing this work published once the technical concerns have been addressed. Our field will benefit from this work, and this work could catalyze important future research. The general topic studied here has the potential to become very important.

    1. Reviewer #2 (Public review):

      Summary:

      This manuscript combines fMRI and EEG investigations performed at two research sites to examine 'willed' or volitional visuospatial attention, as contrasted with more standard cued (or 'instructed') visuospatial attention. The primary findings are: 1) willed attention (vs. instructed attention) drives additional cortical circuitry across a broad fronto-parietal network; 2) the direction of willed attention, but not instructed attention can be decoded from the pre-cue EEG data and from MVPA analysis of the trial-level fMRI data; and 3) the subjects with high EEG decoding also exhibited high neural efficiency (i.e., high decoding with low BOLD signal change) in the fMRI data. The methods and data analysis are generally sound, and these results appear solid. On the negative side, it is not made clear how the present findings extend our understanding beyond prior published work from one of the senior authors. There are also three significant concerns regarding interpretation of the findings. One has to do with the causal interpretation of the pre-cue alpha EEG signal determining the direction of willed attention. The second concern is the degree to which the present research paradigm adequately examines 'willed attention.' The third is that the MVPA analysis is not sufficiently described, and Permutation testing needs to be done to validate these findings. Otherwise, this manuscript appears methodologically sound, but questions about interpretation may mute the potential impact.

      Strengths:

      The focus on willed attention attempts to move beyond some of the many limitations of standard laboratory investigations of attention.

      The shared paradigm across two modalities and two research sites demonstrates solid reproducibility, even though a few minor differences are observed across sites.

      Weaknesses:

      (1) There are concerns about this experimental paradigm carrying the banner of Willed Attention, because the application of 'Will' appears quite modest. Yes, extra brain activity is exhibited for this condition vs. its control, but do the cognitive processes isolated adequately stand in for 'Willed Attention'? Willed attention, as operationally defined here, appears to involve a simple decision process prior to the shifting of spatial attention. The cue is internally generated, but after that the rest of the attentional processes appear identical to standard externally cued visuospatial attention experiments. This self-generated cue process likely involves some sort of memory/history of the recently selected cues and then some random-ish selection between A and B. This appears very similar to asking the subject to guess whether a fair coin flip will be heads or tails on each trial. A mental 'coin flip' feels like a very weak version of 'will.' As a potential remedy, it would be helpful to discuss what other phenomena might fall within 'willed attention' and what some future studies might choose to focus on, along with some potential pitfalls (e.g., the reasons why the current study avoided more robust exemplars of will).

      (2) The manuscript is lacking a description of the decision processes used during the willed attention paradigm and is lacking evidence as to WHEN subjects made their willed decision. Both of these points are of major concern:

      (a) The authors state: "For willed attention, participants were explicitly told to avoid relying on any stereotypical strategies of generating decisions, such as always attending the same/opposite side they attended during the previous trial, as well as to avoid randomizing or equalizing their decisions to choose left or right across trials; prior studies found that decisions to explicitly randomize decisions might invoke additional working memory related processes (Spence & Frith, 1999)." Subjects were instructed NOT to apply a simple heuristic and NOT to randomize or try to equalize their decisions, but exactly HOW the subjects made their decisions is not at all clear. What options does that leave? How does this strategy avoid the working memory-related processes mentioned in the Spence & Frith, 1999 citation? The brain regions that comprise the network of interest (aka Frontoparietal Decision Network) are activated by a very broad range of visual cognitive tasks, including many working memory paradigms. The Anterior Insula and dACC nodes of the Salience Network often simply reflect task difficulty. Obviously, making a choice is more cognitively demanding than not making a choice. The present experiments do not distinguish functional roles between different regions of the Frontoparietal Decision Network. On the whole, the study does very little to isolate the cognitive processes or neural bases of willed attention beyond calling out the set of 'Usual Suspects' for visual cognition.

      (b) The finding that pre-cue EEG signals predicted the postcue decision is intriguing. It could mean that the seemingly irrelevant and transient state of the brain causally and unconsciously biased the subject to one direction or the other. Alternatively, it could mean that the subjects utilized the pre-cue period to make their decision and hold it in case it was needed (i.e., that it was a choice trial). While 2-8 seconds ITI variability makes sense for fMRI decoding, it is a long time for a subject to idly wait, so they might fill that time preparing for the next trial. There appears to have been a substantial amount of individual difference in the pre-cue alpha decoding, which could reflect individual differences in cognitive strategy, specifically in the use of the pre-cue period to make their decision. More efficient decision makers might have pre-decided, which might account for the neural efficiency. The experiments lack any measurement of WHEN participants made their decision. For that reason, I would ask that the authors temper their claims about the significance of the alpha decoding and its possible causality.

      (3) Did individual subjects exhibit a choice bias of location for the willed trials? If not, doesn't that raise concerns that subjects were trying to equalize their trials? If they do exhibit location biases, how does that impact the decoding? A simple decoder could learn to always just guess the biased direction for a subject and would perform > 50%. Consider the example in which an individual subject chooses 'Left' 55% of the time. A classifier that simply learns to choose 'Left' on every trial will be correct on 55% of trials. The training data would likely be sufficient to learn the direction of choice bias in each individual subject. So the classifiers could perform significantly above 50% without learning anything beyond the tendency of each subject. That is to say, 50% is not truly chance in this data set. It doesn't appear that Permutation testing has been performed to empirically determine chance for an individual's data. Permutation methods, scrambling the labels 1000 or 10000 times to establish a true baseline would be preferred over simply comparing to 50% and would address concerns about individual subject biases.

      (4) The novel contributions of this work beyond the two prior Bengson et al papers from Dr. Mangun's lab appear quite modest. The discussion would be enhanced by specifically stating how the present work advances understanding beyond the prior Bengson studies.
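
      The chance-level concern raised in point (3) can be illustrated with a minimal sketch. It assumes a single 1-D feature per trial and a leave-one-out nearest-class-mean classifier; both are stand-ins chosen for brevity, not the manuscript's decoding pipeline, and all function names are hypothetical. The first function shows the accuracy an always-guess-the-bias classifier would reach; the last one estimates an empirical null by re-running the decoder on shuffled labels.

```python
import random


def majority_baseline(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / len(labels)


def loo_accuracy(features, labels):
    """Leave-one-out accuracy of a nearest-class-mean classifier (1-D features)."""
    correct = 0
    for i in range(len(labels)):
        sums, counts = {}, {}
        for j, (x, y) in enumerate(zip(features, labels)):
            if j == i:
                continue  # hold out trial i
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        means = {y: sums[y] / counts[y] for y in sums}
        predicted = min(means, key=lambda y: abs(features[i] - means[y]))
        correct += int(predicted == labels[i])
    return correct / len(labels)


def permutation_test(features, labels, n_perm=1000, seed=0):
    """Compare observed accuracy with accuracies obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(features, labels)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any feature-label relationship
        null.append(loo_accuracy(features, shuffled))
    # Add-one correction so the p-value can never be exactly zero.
    p_value = (1 + sum(a >= observed for a in null)) / (n_perm + 1)
    return observed, null, p_value
```

      For a subject who chooses 'Left' on 55 of 100 trials, `majority_baseline` returns 0.55, which is the reference a decoder has to beat for that subject. Because shuffling preserves each subject's left/right proportions, the permutation null reflects that bias automatically rather than assuming 50%.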

    2. Reviewer #3 (Public review):

      Summary:

      This manuscript analyzes two independent datasets collected at different sites. Using the same willed-attention paradigm (instructional vs. choice cues) and combining fMRI and EEG analyses, the authors investigate how attentional direction is selected when no external instruction is provided. Their main claims are that the dorsal attention network is engaged by both cue types, whereas the choice cue additionally involves a frontoparietal decision network. Moreover, left-versus-right attentional decisions can be decoded in this decision network only on choice trials, and multichannel pre-stimulus alpha patterns predict the subsequent attentional choice. Finally, individuals with more predictive alpha patterns show greater neural efficiency in the decision network, i.e., higher decoding with lower BOLD activation.

      The question is worthwhile and the two-site design is a genuine strength. At the same time, several central inferences rely on decoding analyses for which the statistical testing and cross-validation structure are not described in enough detail to assess robustness. In addition, using a ratio-based neural-efficiency measure makes the interpretation more fragile than it needs to be. With a focused revision that tightens inference around MVPA and clarifies a few methodological points, I think the paper could become substantially more convincing.

      Strengths:

      The work extends previous willed attention studies by attempting to link pre-stimulus alpha pattern predictability to post-cue frontoparietal representations, and by testing reproducibility across two datasets. The conceptual advance beyond previous studies, e.g., Bengson et al. (2015), however, depends on how solid the decoding-based evidence is and whether alternative explanations are convincingly excluded. At present, the strength of support is limited mainly by incomplete reporting and/or controls for MVPA significance testing, as well as potential inflation of decoding estimates if folds are not independent of run structure. Concerns about statistical assessment of decoding accuracy are well documented in the literature (Combrisson & Jerbi, 2015).

      Weaknesses:

      (1) The manuscript describes the decoding pipeline for both fMRI and EEG MVPA. However, it does not clearly specify how "significantly above chance" is determined for the fMRI ROI decoding, nor how multiple comparisons across ROIs are handled, even though p-values are reported. The same issue applies to the time-resolved EEG analysis across many time points. For each decoding analysis, please specify the inferential test (e.g., permutation test within participant, group-level test on subject accuracies, binomial test, etc.) and report effect sizes with confidence intervals (e.g., Combrisson & Jerbi, 2015). Further, for EEG decoding over time, it would be preferable to control family-wise error, e.g., cluster-based permutation, rather than thresholding pointwise p-values. A standard approach here is the nonparametric cluster framework (e.g., Maris & Oostenveld, 2007).

      (2) The cross-validation approach used here is appreciated and appropriate in principle. However, random 10-fold splits across trials can inflate accuracy if training and test folds share run-specific noise, scanner drift, or autocorrelated structure. The manuscript should indicate whether folds were blocked by run or randomized across the entire session. In addition, please report the number of trials per condition after artifact rejection and after removing short ITIs for the long prestimulus epochs (−2500 ms to 0 ms) for each dataset in the section of EEG preprocessing. Similarly, please report how often participants chose left vs. right on choice trials, and whether balanced folds (or an equivalent balancing procedure) were used if needed.

      (3) Moreover, ROI definition is not sufficiently specified and independence should be clarified. The ROIs are defined based on peaks from the choice-instructed univariate contrast (Table 2) and then used for MVPA. First, are these ROIs defined as spheres around peaks or using anatomical masks? What radius or voxel count was used? This needs to be explicit. Second, I am concerned about circularity risk. Although choice-vs-instructed selection is not identical to left-vs-right decoding, ROI selection from the same dataset can still bias descriptive estimates and encourages overinterpretation if not carefully justified (Kriegeskorte et al., 2009). At minimum, the authors should explain why their selection criterion is independent of the decoded contrast under the null, and ideally provide a robustness check using either anatomical ROIs or independently defined ROIs, e.g., from prior literature or an atlas.

      (4) Using an index of neural efficiency is conceptually interesting. However, if the denominator, computed as the activation difference between choice and instructional conditions, is near zero or noisy, the ratio can become unstable. I would rather see a multivariate model that treats activation and decoding as separate dependent measures, or a latent-variable approach, than a single ratio.
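
      The instability described in point (4) can be shown with a small numerical sketch. It assumes the index is a simple quotient of decoding above chance over the choice-minus-instructed activation difference; the exact definition used in the manuscript may differ, and the function names and numbers below are hypothetical.

```python
def efficiency_ratio(decoding_above_chance, activation_diff):
    """Hypothetical ratio index: decoding above chance per unit activation difference."""
    return decoding_above_chance / activation_diff


def ratio_spread(decoding_above_chance, activation_diff, noise):
    """Change in the ratio when the denominator moves by +/- noise.

    Assumes noise is smaller than |activation_diff| so the denominator
    never reaches zero.
    """
    low = efficiency_ratio(decoding_above_chance, activation_diff - noise)
    high = efficiency_ratio(decoding_above_chance, activation_diff + noise)
    return abs(low - high)


# Same numerator and same measurement noise, but two different denominators.
spread_large_denominator = ratio_spread(0.1, 1.0, 0.05)
spread_small_denominator = ratio_spread(0.1, 0.1, 0.05)

# A denominator that crosses zero flips the sign of the index.
positive_index = efficiency_ratio(0.1, 0.02)
negative_index = efficiency_ratio(0.1, -0.02)
```

      With identical noise, the spread of the ratio is more than a hundred times larger when the denominator is 0.1 than when it is 1.0, and a small negative activation difference turns a "highly efficient" subject into a strongly negative one. Modelling activation and decoding as separate measures avoids this amplification.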

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer 1

      #1.1 In Fig 1 it appears that both PI3P and PI3,5P2 are present on lysosomes (Lamp1-FRB) and that recruitment of MTM1 (via FKBP) depletes both of them. Additionally, in the images and quantifications, it appears that after recruitment of MTM1, there are no vesicular pools of PI3P or PI3,5P2 left at all. This raises several concerns:

      #1.1a What is happening to the pool of PI3P on Rab5 endosomes? The main pool of PI3P is on Rab5 endosomes and separate from LAMP1 lysosomes. Why does recruitment of MTM1 to lysosomes then lead to almost complete depletion of all PI3P (Fig1 B bottom panel), when the lysosomal pool of this lipid should be minor compared to that on early endosomes?

      We plan to add data showing recruitment of FKBP-MTM1 to FRB-Rab5 positive endosomes. This also causes complete depletion of both PI3P and PI(3,5)P₂ probes. We believe this is the result of the over-expression of Rab5 or LAMP1 reporters, the robust activity of FKBP-MTM1 and the somewhat indistinct nature of these compartments (e.g. 10.1016/j.cell.2024.11.028) – in short, there is enough residual LAMP1 present in Rab5-enriched compartments or Rab5 in LAMP1-enriched membranes that robust depletion of lipids occurs in both when over-expressed.

      #1.1b The main problem comes from disagreement with figure 3. There it is shown that ".PI(3,5)P2 does not significantly overlap with PI3P". If this is true, then why does recruitment of MTM1 to LAMP1 lead to the depletion of both PI3P and PI3,5P2? This would indicate that they are on the same compartment (LAMP1 positive lysosomes).

      We plan to add a complete, quantitative co-localization analysis of PI(3,5)P₂ and PI3P probes, as well as an analysis of both lipid probes with Rab5 and LAMP1. We already showed in such an analysis that PI3P is more closely co-localized with Rab5 than Rab7 (10.1083/jcb.201312072), but this is not absolute: not all late endosomal compartments are alike, and some contain PI3P whereas some do not (e.g. 10.1016/j.cell.2023.09.027). Likewise, we expect to show near exclusion of PI3P and PI(3,5)P₂, with greater co-localization between PI3P and Rab5 vs LAMP1, and the opposite for PI(3,5)P₂ (as shown in 10.1083/jcb.202408158). But nonetheless, there will be some PI3P-positive LAMP1 membranes.

      #1.1c A merge and colocalization analysis is at least needed to address how much of SnxA and EEA1-FYVE is localized to LAMP1-FRB. But just by looking at the one cell depicted, the localisation of LAMP1-FRB seems to be rather distinct from EEA1-FYVE and SnxA

      We believe our response to #1.1b will also address this point.

      #1.2 Minor: Fig2 is all in all way too small and it is difficult to actually see the signal in panel A and C. It should also be noted that in these panels (zoomed in to the best of my ability) there is clear localisation of SnxA to the plasma membrane. Is this pool lost after Apilimod treatment, or is this unspecific binding? This PM localisation also appears in Fig1 B and 5C and should be discussed.

      We will enlarge the panels in figure 2 to ease visualization of the detail. We will also describe the PM localization of the probe, and how this is insensitive to apilimod treatment (Fig. 2A). We will add this to the discussion, also citing how a non-PI(3,5)P₂ dependent PM localization of SnxA was also noted in the original study (Vines et al), and note this as a potential drawback of the full-length SnxA.

      #1.3 It should also be pointed out that dextran will be taken up through macropinocytosis and might follow different endocytic trafficking than conventional receptor-mediated uptake.

      We will add a description to the results for figure 2 describing how dextran is taken up non-selectively through fluid-phase endocytosis (i.e. both conventional receptor-mediated endocytosis and macropinocytosis).

      #1.4 It also seems that the dextran signal decreases after Ap treatment. Is this the case? Has this been quantified? Would that point towards the whole endocytic structure being degraded and not just PI3,5P2?

      We believe this reduction is due to simple photobleaching of the dye under time-lapse imaging. We will quantify this as suggested – but crucially, we included the DMSO vehicle control which will perfectly control for photobleaching.

      #1.5 OPTIONAL: The timeline of PI3,5P2 degradation and vacuolation is a great experiment and very important. However, PI3,5P2 production is critically dependent on prior generation of PI3P. It could be of interest to readers what happens to SnxA localisation when cells are treated with VPS34-IN-1 or Wortmannin. How does loss of PI3,5P2 compare between the timing of PIKfyve inhibition and inhibition of the PI3K that generates its substrate? The timescales of this might tell us something interesting about the timings of phosphoinositide conversions. Also, treatment with PI3K inhibitors is known to not lead to vacuolisation even though it does also lead to a loss of PI3,5P2, further indicating that there might be more to the vacuolation phenotype than just the loss of PI3,5P2. These experiments would add novelty and detail to the study but do not change the overall findings or conclusions.

      This would indeed be a feasible experiment, but we do not believe the novelty would be significant: rapid depletion of PI3P with PI3K inhibitors has already been documented; notably, we demonstrated depletion of PI3P by wortmannin with a similar ~10 min time constant (10.1083/jcb.201312072) to the PI(3,5)P₂ depletion by apilimod reported herein. Since substrate depletion ultimately acts like enzyme inhibition, we would anticipate very similar kinetics for PI(3,5)P₂ depletion by apilimod vs wortmannin. However, if the reviewer and editor believe this experiment is essential, we could add it. We just do not believe it significantly strengthens the paper.

      #1.6a: Major: Fig3: The distinct localisation of PI3P and PI3,5P2 looks convincing but does not seem to be in line with data in Fig1.

      We believe our response to #1.1 addresses this point also.

      #1.6b It should be clearly noted and discussed that the structure in Fig3 B is a nascent macropinosome (the size and proximity to the PM make this clear) and this transition on macropinosomes has been well characterised in Vines et al 2023.

      We agree with the reviewer that this structure could easily be a macropinosome, though without a marker, we cannot definitively make this claim. Of course, the same logic applies to our erroneous declaration that this is an endosome (though HEK293A cells do possess some similarly very large, Rab5+ve sorting endosomes)! We will therefore amend the text to classify this as “an initially PI3P positive compartment”.

      #1.7 Major minor: Figure 5 is a very powerful approach to produce a completely independent pool of PI3,5P2. As a control, the PIKfyve-CCR-Kina construct should be recruited to mitochondria in the absence of the MavQ construct. This should lead to the generation of PI5P (the other known product of PIKfyve) without generating PI3,5P2. This would be a useful control to test that SnxA does not bind to PI5P. (As a side note, this method would be great to use to try and find a bona fide PI5P effector. Maybe by isolating mitos and sending them to Mass Spec after altering their PIP composition? But this is obviously a completely different story)

      This is an excellent suggestion. We plan to add these experiments as a control for Fig. 5, though there will be nuanced interpretation – because we have no way of verifying PI5P generation by PIKfyve in the absence of a verified PI5P biosensor. Furthermore, direct synthesis of PI5P by PIKfyve remains controversial (e.g. 10.1073/pnas.1203106109). Incidentally, we have attempted to co-recruit a PIP4K to convert any PI5P to PI(4,5)P₂, but this did not work.

      #1.8 Major: Figure 6 is great work, but it must be discussed that SnxA is a constitutive dimer while the 2xPx is a monomer. Therefore, the SnxA dimer will also contain two fluorescent proteins and will therefore also appear brighter than the 2xPx. Therefore, the increased baseline of SnxA might just be due to its nature as a dimer and not because it has a higher affinity to PI3,5P2 compared to the 2xPx. OPTIONAL: One way to clear this up would be to make a 2xPx construct that also contains 2 fluorescent proteins. Also, the increase in fluorescence should also be shown as normalized to the initial intensity. From the graphs in 6a it seems that both have an ~1.3 and ~1.5 fold increase?

      We will add a discussion of the constitutive dimer nature of full-length SnxA. However, we disagree with the assertion that the dual GFP of the GFP-SnxA dimers accounts for the increased baseline labeling – since our quantification is based on the ratio of fluorescence at the membrane to elsewhere. Since there is a heavily diffuse signal of unbound probe, this diffuse pool is also dually labelled by the dimeric GFP-SnxA – in effect, the increased signal on membranes by dimeric GFPs is cancelled by the cytosolic dimers. We will clarify this point in the discussion.
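
      The cancellation argument in this response can be sketched numerically. The sketch assumes that fluorescence scales linearly with the number of fluorophores per probe and that signals are background-subtracted; the intensity values and function names are hypothetical and are not taken from the manuscript's data.

```python
def membrane_to_cytosol_ratio(membrane_signal, cytosol_signal):
    """Ratio of membrane to cytosolic fluorescence (arbitrary units)."""
    return membrane_signal / cytosol_signal


def scale_brightness(signal, fluorophores_per_probe):
    """Signal if every probe molecule carried this many fluorophores."""
    return signal * fluorophores_per_probe


# Hypothetical monomeric probe: one fluorophore per molecule.
monomer_ratio = membrane_to_cytosol_ratio(300.0, 100.0)

# Same probe distribution, but every molecule is a dimer with two fluorophores.
dimer_ratio = membrane_to_cytosol_ratio(
    scale_brightness(300.0, 2),
    scale_brightness(100.0, 2),
)
```

      Both ratios come out identical because the factor of two appears in the numerator and the denominator. The equality would no longer hold exactly if an unsubtracted background offset were present, which is why background subtraction is listed as an assumption.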

      #1.9 OPTIONAL (but important): It would be great to compare the recently described G13V mutant of SnxA that is reported to have a higher affinity to PI3,5P2 (Preprint, Nishimura et al 2025). These experiments would add novelty and detail to the study but do not change the overall findings or conclusions. But as the study sets out to find the best PI3,5P2 probe, this might be very important

      This is indeed an important study. That said, the existence of this mutant does not change the overall importance of our demonstration here that PI(3,5)P₂ is both necessary and sufficient to recruit SnxA to membranes. We also feel that it is beyond the scope of our manuscript to perform detailed controls on the findings of another manuscript that itself has not passed peer review. However, we will add a point to the discussion about the likely impact of the G13V mutant.

      #1.10 Minor: In the discussion it is stated that "Vines et al. showed SnxA to have higher affinity for PI(3,5)P2 in vitro compared to 2xPx-SnxA". The Kd (to model membranes with 5% PI3,5P2 in POPC background!) for SnxA there is 187 ± 13 nM and for 2xPx 217 ± 7 nM. Taking the top and bottom of the respective uncertainty intervals gives 200 nM and 210 nM. While this is still a slightly lower Kd (i.e., higher affinity) for SnxA, I am unsure how physiologically relevant this difference is.

      This is an excellent point, and we will amend the discussion to point out that the subtle differences in affinity likely do not explain the enhanced performance of SnxA; nor do the dual GFPs, as covered in #1.8.
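
      The reviewer's arithmetic can be checked with a short sketch. It uses only the Kd values quoted in comment #1.10; the helper name `interval` is hypothetical, and treating each "±" as a simple symmetric range is an assumption made for illustration.

```python
def interval(mean, err):
    """Return (low, high) for a value reported as mean ± err."""
    return (mean - err, mean + err)


# Values quoted in comment #1.10 (nM).
snxa = interval(187, 13)    # full-length SnxA
two_px = interval(217, 7)   # 2xPx

gap = two_px[0] - snxa[1]   # distance between the nearest interval edges
fold = 217 / 187            # ratio of the central Kd estimates

print(snxa, two_px)    # (174, 200) (210, 224)
print(gap)             # 10
print(round(fold, 2))  # 1.16
```

      The intervals are separated by only 10 nM, and the central estimates differ by roughly 1.16-fold.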

      #1.11 Minor: In the discussion it is stated that "But, we could observe a short temporal overlap between the two lipids within a single endosome, lasting only a few minutes". If this statement refers to Figure 3B, then it must be clearly stated that this is a macropinosome, and it must not be conflated with canonical endosomes.

      We will define these membranes less precisely, as we did in response to #1.6b.

      #2.1 While the conclusions are by and large backed up by the presented data, there are two instances in which controls do not show the desired properties in some figures (but do in others). Specifically, the kinase-dead control construct experiments in Figures 4D and 5D suffer from a lack of localization of the FKBP-PIKfyve-CCR-Kinase-Dead construct to the Rab5 endosomes or mitochondria, respectively. Without robust localization of the control kinase, the dependency on PI(3,5)P2 synthesis (as opposed to physical kinase-mediated recruitment) cannot be excluded. The construct shows robust relocalization in Figure 6 when the 2xPx-SnxA is being evaluated. With the combined evidence shown in the paper, this does not create significant doubt about the overall conclusions, but this issue needs to be addressed.

      We believe the poor apparent recruitment in figures 4D and 5D stems from two reasons. Firstly, the signal-to-noise of the tagBFP-kinase construct is low (especially since the photodamaging 405 nm excitation laser must be used at low intensity for these time-lapse experiments). Therefore, one solution is simply to select example images in which expression of TagBFP2 gave better signal-to-noise to demonstrate recruitment. Secondly, although the quantification in figures 5 and 6 shows that recruitment of FKBP-PIKfyve-CCR-Kinase and FKBP-PIKfyve-CCR-Kinase-Dead was very similar, this was not so for figure 4. In this case, we will re-analyze the data to include only FKBP-PIKfyve-CCR-Kinase cells with recruitment similar to that of the kinase-dead construct, for a direct comparison; if there is insufficient data, we will repeat the experiments to obtain balanced recruitment of both.

      #3.1 In Figure 1B, the authors show a rapid depletion of PI3P upon recruitment of MTM1 to LAMP1-positive membranes. The apparent minimal overlap between EEA1-FYVE and LAMP1 signals in the images is intriguing. It would be helpful if the authors could comment on whether this reflects rapid membrane trafficking dynamics or nonspecific membrane targeting of MTM1. In addition, the lysosomal SnxA signal appears to decrease with somewhat slower kinetics compared to PI3P. A short clarification would be helpful as to whether MTM1 is expected to directly dephosphorylate PI(3,5)P₂, or whether the observed reduction in PI(3,5)P₂ is secondary to depletion of its precursor PI3P, resulting in gradual turnover of the PI(3,5)P₂ pool. This would further refine the mechanistic interpretation, although the point does not detract from the overall conclusions.

      The first part of this comment will be addressed by our response to #1.1. We will also add a discussion detailing how MTM1 itself has a somewhat controversial substrate selectivity, with the first report stating it was PI3P selective and did not hydrolyze PI(3,5)P₂ (10.1073/pnas.160255697), whereas a subsequent study found that it has robust activity against PI(3,5)P₂ (10.1016/s0960-9822(03)00132-5). Therefore, direct activity on PI(3,5)P₂ cannot be ruled out. Secondly, we will describe how our data and Pemberton et al showed that PIKfyve recruitment depletes PI3P, demonstrating that PIKfyve activity can outstrip VPS34 that makes PI3P. Therefore, in PI(3,5)P₂-positive, PI3P-negative membranes, it is possible that VPS34 is still active, but PI3P never accumulates because PIKfyve immediately converts it to PI(3,5)P₂.

      #3.2. In Figure 3, the authors conclude that SnxA and FYVE do not overlap, but rather localize in close proximity. While this is an interesting observation, some degree of apparent overlap can be perceived in the images. Although the qualitative impression is largely convincing, inclusion of a quantitative colocalization analysis (e.g., Pearson's or Manders' coefficients) could further strengthen this conclusion.

      We are planning to add precisely this analysis in response to #1.1b.
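
      For orientation, the two coefficients named by the reviewer can be sketched as follows. This is a simplified illustration, not the analysis pipeline planned for the manuscript: the pixel values are hypothetical, the function names `pearson` and `manders` are our own, and the Manders' calculation uses a plain intensity threshold rather than an automated thresholding scheme.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def manders(a, b, thr_a=0.0, thr_b=0.0):
    """Simplified Manders' coefficients.

    M1: fraction of channel-a intensity in pixels where b exceeds thr_b.
    M2: fraction of channel-b intensity in pixels where a exceeds thr_a.
    """
    m1 = sum(x for x, y in zip(a, b) if y > thr_b) / sum(a)
    m2 = sum(y for x, y in zip(a, b) if x > thr_a) / sum(b)
    return m1, m2


# Hypothetical pixel intensities along a line through two adjacent puncta.
snxa = [0, 0, 5, 9, 1, 0]
fyve = [8, 6, 1, 0, 0, 0]

r = pearson(snxa, fyve)
m1, m2 = manders(snxa, fyve)
print(f"Pearson r = {r:.2f}, M1 = {m1:.2f}, M2 = {m2:.2f}")
```

      In this toy case the two signals touch at a single pixel, so Pearson's r is negative and both Manders' coefficients are small, which is the pattern expected for adjacent rather than overlapping structures.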

      #3.3. In Figure 4D, SnxA localization is still detectable in cells expressing the kinase-dead PIKfyve construct. If endogenous wild-type PIKfyve is present, residual PI(3,5)P₂ production could explain this signal and influence the interpretation of the experiment. The authors should clearly state which cell lines were used and whether endogenous PIKfyve activity was present in each condition. This information should be provided consistently in the figure legends to avoid ambiguity.

      We will clarify the results and legends to make clear that endogenous PIKfyve is indeed expressed in these cells, together with either FKBP-PIKfyve-CCR-Kinase or FKBP-PIKfyve-CCR-Kinase-Dead. We will also clarify that over-expressed catalytic PIKfyve fragments will not perturb endogenous PIKfyve activity, explaining basal SnxA localization in figures 4C and 4D.

    1. railway station.

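      OK, let's work through this question step by step.

      The original sentence is:
      He always has his shoes __ at the railway station.


      1. Sentence structure

      This tests the pattern "have + object + object complement".
      - To express "have someone do something", use have sb do sth (American usage) or have sb to do sth (rarer, British usage, but not applicable to this question).
      - To express "have something done" (by someone else, not by the subject), use have sth done (the past participle marks the passive).
      - To express "keep something/someone in a state or doing an action", use have sth/sb doing (the present participle marks an active, ongoing action).


      2. Meaning in context

      "He always has his shoes shined at the railway station."
      The shoes are the thing being polished; they are not themselves "shining" anything.
      The structure needed is therefore have sth done (having someone else do it).

      For "shine", the past participle is shined (American usage, specifically for polishing shoes) or shone (generally "emitted light", not used for polishing shoes).
      Here the meaning is "polish shoes", so the past participle is shined.


      3. Options

      (A) shone: a past participle, but it means "emitted light" and does not express "polished"; eliminated.
      (B) to shine: an infinitive. It does not express purpose or future here, and it does not fit the idiom have sth to do (which means "have something to do", done by the subject). Here the subject has someone else do it, so this is eliminated.
      (C) shining: a present participle, expressing an active or ongoing action, but shoes cannot be "shining" something themselves; illogical, eliminated.
      (D) shined: a past participle meaning "polished"; correct.


      4. Brief explanation and key points

      Brief explanation:
      The sentence means "He always has his shoes shined (by someone else) at the railway station."
      In English, "have + object + past participle" means having someone else do the action.
      In American English, the past participle of shine in the sense of polishing shoes is shined, so the answer is (D).

      Key points:
      1. have sth done = have something done (by someone else).
      2. Distinguish the past participles of shine:
      - shone → emitted light
      - shined → polished (shoes, etc.)
      3. The meaning here is passive, so use the past participle shined.


      Answer: (D) shined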

    2. quite a while ago.

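      OK, let's work through this question step by step.

      Original sentence:

      We can go out now. It stopped __ quite a while ago.


      1. Meaning of the sentence

      "We can go out now. The rain stopped quite a while ago."
      The blank follows It stopped __ and needs the correct verb form.


      2. Key grammar point

      The verb stop can be followed by two forms:

      1. stop doing something
         - Means "cease doing something" (stop an action in progress).
         - Example: It stopped raining. (The rain stopped → it was raining, and now it is not.)

      2. stop to do something
         - Means "stop in order to do something else".
         - Example: He stopped to rest. (He stopped so that he could rest.)

      3. Applying this to the question

      The sentence says the rain stopped quite a while ago, meaning the action of raining ceased.
      So the structure stop + doing is needed.
      The gerund of rain is raining.


      4. Options

      • (A) rain → stop cannot be followed directly by a base-form verb; eliminated.
      • (B) raining → correct; stop raining means "cease raining".
      • (C) to rain → stop to rain would mean "stop in order to rain", which is illogical.
      • (D) rained → stop cannot be followed directly by a past-tense form; eliminated.

      5. Answer and summary

      Correct answer: (B) raining

      Key points:
      - stop doing = cease doing something (the action ends)
      - stop to do = stop in order to do something else
      - Here "the rain stopped" means "it ceased raining", so use stopped raining.


      If you like, I can give you a few more practice questions of this kind to reinforce the point.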

    3. by the prosecution

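      OK, let's work through this question step by step.

      Original sentence:
      The witness went on the witness stand __ by the prosecution.


      1. Sentence structure

      • Subject: The witness
      • Predicate: went on the witness stand
      • Blank: modifies "went on the witness stand", expressing purpose or state, and is followed by "by the prosecution".
      • "by the prosecution" shows that the prosecution performs the action, so the verb in the blank must be passive.

      2. Options

      (A) being questioned
      - Means "being questioned at that moment", stressing a passive action in progress.
      - But "went on the witness stand" is a momentary action. As an adverbial, "being questioned" would normally describe an accompanying state, and it is not plausible that the witness was already being questioned by the prosecution while walking to the stand (questioning normally begins after the witness takes the stand).

      (B) to question
      - Active infinitive expressing purpose: "in order to question (someone)".
      - The subject is the witness. A witness does not take the stand to question others; a witness is there to be questioned.

      (C) to be questioned
      - Passive infinitive expressing purpose: "in order to be questioned".
      - Logical: the witness takes the stand in order to be questioned by the prosecution.

      (D) questioning
      - Active present participle: the witness would be questioning someone.
      - This conflicts with "by the prosecution", which requires a passive form.


      3. Key points

      • "by the prosecution" signals that a passive form is required.
      • An infinitive can express purpose; the most sensible purpose here is "to be questioned".
      • The correct option is therefore (C) to be questioned.

      4. Brief explanation

      The sentence means: the witness went on the witness stand in order to be questioned by the prosecution.
      "to be questioned" expresses "about to be questioned", which matches what a witness stand is for.

      Answer: C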

    4. breakfast every day.

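      OK, let's work through this question step by step.


      Question:
      Americans ______ bacon and eggs for breakfast every day.
      Options:
      (A) used to having
      (B) are used to have
      (C) are used to having
      (D) used to


      1. Key grammar point

      English has two easily confused structures:

      1. used to + base verb
         - Means "did something regularly in the past but no longer does".
         - Example: I used to play basketball.

      2. be used to + noun/gerund
         - Means "be accustomed to something".
         - Example: I am used to getting up early.

      2. Meaning of the sentence

      The sentence says that Americans eat bacon and eggs for breakfast every day. This describes a present habit or general fact, not a habit that existed in the past and has ended.

      So used to + base verb cannot be used (it describes a past habit and implies that it no longer holds).

      Use be used to + gerund to express "be accustomed to doing something".


      3. Options

      (A) used to having ❌ (no form of be; used to cannot be followed directly by having to express a habit)
      (B) are used to have ❌ (be used to must be followed by a gerund or noun, not a base verb)
      (C) are used to having ✅ (correct: "are accustomed to eating")
      (D) used to ❌ (the base verb is missing, and it expresses a past habit, which does not fit the meaning)


      4. Brief explanation

      • The sentence means "Americans are accustomed to eating bacon and eggs for breakfast every day", so choose are used to having.
      • are used to = are accustomed to (present)
      • having = gerund, because to here is a preposition, not the infinitive marker.

      Answer: (C) are used to having

      One-sentence summary:
      "be used to doing" expresses a present habit, which fits the meaning "Americans are accustomed to eating bacon and eggs for breakfast every day".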

    5. special holes.

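      This question tests how to link two related actions or facts correctly and smoothly in English.

      The original sentence is:
      "__ the silkworm makes a liquid in its body and then squeezes it out through special holes."


      Brief explanation:
      What follows the blank is a complete clause (subject the silkworm, verb makes), so the blank needs an element that can introduce this clause and connect logically with it.

      Options:

      1. (A) It makes silk
         With this option the sentence becomes "It makes silk the silkworm makes a liquid...", which runs two clauses with the verb makes together; this is ungrammatical and makes no sense.

      2. (B) Making silk
         "Making silk, the silkworm makes a liquid..." is grammatically possible and would be read as "in order to make silk, the silkworm makes a liquid...". However, an opening present participle normally expresses accompanying circumstance or manner; the logic here is purpose, so Making silk is imprecise though acceptable. The infinitive is the more common and more explicit way to express purpose.

      3. (C) To make silk,
         "To make silk, the silkworm makes a liquid in its body..." is the clearest way to express purpose; it is grammatical and connects naturally with the clause that follows.

      4. (D) Silk is made by
         With this option the sentence becomes "Silk is made by the silkworm makes a liquid...". The silkworm could follow by, but the second makes then duplicates the predicate; ungrammatical.


      Key points:
      - Grammatically, only (B) and (C) yield a complete sentence.
      - In terms of logic and idiom, the sentence describes how a silkworm makes silk, and To make silk (purpose) is the most natural way to introduce the steps.
      - The correct answer is therefore (C) To make silk,


      Summary:
      Choose C: "To make silk" expresses purpose and leads naturally into the description of how the silkworm's body produces silk; it is grammatical and clear.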

    6. He is said

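      This is a classic question combining the infinitive (to do) with the passive voice. The correct answer is (A) to be gentle and gracious.

      We can take it apart with the "restoration method" and "auxiliary-verb logic" you learned earlier:

      1. Restoring the logic: from "it is said" to "fact"

      The underlying logic of this sentence is a structure for reporting hearsay. First restore it to the active voice:

      • Active voice: His friends say **that** he **is** gentle and gracious.
      • Passive voice (dummy subject): It is said by his friends **that** he **is** gentle...
      • Passive voice (subject raising): He is said by his friends **to be** gentle...

      2. Why the infinitive to be?

      Remember what you read earlier? The infinitive to often conceals an auxiliary verb or a sense of "not an established fact".

      When the that-clause (a statement of fact) is converted into a simple sentence, the original verb is cannot directly follow said (you cannot say He is said is...).

      • Logical conversion: to be here takes over the meaning of is in the original clause.
      • Tone: "is said" itself carries a degree of conjecture or reporting (like it seems or it is believed); English conventionally uses the infinitive to to link a state that was not directly observed.

      3. Options (eliminating distractors)

      • (B) to have graciousness and gentle: mismatched parts of speech. and joins parallel elements, but graciousness is a noun and gentle is an adjective, so the structure is unbalanced.
      • (C) gentle and a gracious man: missing verb. Without to be after He is said..., the sentence breaks off and cannot connect to the following adjective.
      • (D) that is a gentle and gracious man: ungrammatical. A that-clause requires the dummy subject It (for example, It is said that...). When the subject is He, a that-clause cannot follow directly.

      4. Extension: what if he "used to be" gentle?

      Following the "perfect form expresses the past" logic you read earlier, if his friends say he was gentle in the past (and may not be now), the sentence becomes:

      He is said to **have been** gentle.

      The logic is fully consistent: to + base form expresses the present situation, and to + have done expresses the past situation.


      💡 Summary

      Questions of this type (is said to / is reported to / is believed to) test the ability to convert a clause into an infinitive.

      Remember this formula:

      subject + be said + to do / to be ...

      Would you like me to set two more practice questions on the "is said to / seems to" structure?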

    7. Marie __

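      This is a classic grammar question combining a causative verb with a fixed phrase. The correct answer is (D) made believe.

      We can take the sentence apart with the "logic restoration method" you learned earlier:

      1. Core phrase: make believe

      In English, make believe is a fixed phrase meaning "pretend".

      • Logic: make here still follows causative-verb logic. Literally, make believe means "cause (people) to believe".
      • Why no to? As you read earlier, the causative make is coercive and requires a definite outcome (I make you believe, so you believe). It therefore excludes to, which signals uncertainty, and takes the base form believe directly.

      2. Tense agreement: past tense

      Look at the first half of the sentence:

      Not wishing to attend the dance... (that is, because she did not want to attend the dance...)

      This is a participial phrase describing background in the past. The subject of the main clause is Marie.

      • Since Marie's pretending to be ill happened in the past, the verb make must become the past form made.
      • believe stays in the base form because it is governed by the causative made (causative verb + object + base form; here the object is omitted or absorbed).

      3. Why are the other options wrong?

      • (A) made believed: wrong. The causative make must be followed by a base-form verb, not the past participle believed.
      • (B) make believe: wrong tense. This is the base form and cannot express a past action.
      • (C) makes believe: wrong tense. This is the third-person singular present.

      4. Deeper understanding: restoring the sentence

      Using the "restoration method" you like, the sentence can be understood as:

      Marie made (people) believe that she had a fever. (Marie caused people to believe she had a fever → Marie pretended to have a fever.)

      Summary

      This question combines two points:

      1. Coercive causative verb: make is followed by the base form (without to).
      2. Time of the action: the whole event took place in the past, so make → made.

      Next time you meet this kind of question, first identify "who makes whom do what"; if there is no to and the tense is right, that is the answer. Would you like me to find a few more causative-verb practice questions to test you?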

    8. Not wishing

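      OK, let's work through this question step by step.

      Question:
      Not wishing to attend the dance, Marie __ that she had a fever.
      Options:
      (A) made believed
      (B) make believe
      (C) makes believe
      (D) made believe


      1. Meaning of the sentence

      "Not wishing to attend the dance, Marie ______ that she had a fever."
      The blank needs a verb phrase meaning "pretended".


      2. Key phrase

      English has a fixed phrase for "pretend": make believe (base form). In the past tense, make becomes made and believe stays in the base form (it is the complement of make, not the past form of a main verb).

      So:
      - make believe (base form)
      - made believe (past tense)
      - makes believe (third-person singular present)


      3. Determining the tense

      The opening "Not wishing…" is a present participle used adverbially, and the main verb should express a past action (Marie did not want to go at that time, so she pretended to have a fever at that time; the past-tense had in the that-clause confirms this).
      Therefore use the past form made believe.


      4. Options

      (A) made believed ❌ (adding -ed to believe is wrong, because believe here is a bare infinitive)
      (B) make believe ❌ (wrong tense: the base form does not express the past)
      (C) makes believe ❌ (present tense, which does not match the past context)
      (D) made believe ✅ (past tense, correct form)


      5. Simple explanation for students

      • make believe is a fixed phrase meaning "pretend".
      • If the event is in the past, change make to made; believe does not change.
      • This sentence is about Marie not wanting to attend a dance in the past, so use made believe.

      Answer: (D) made believe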

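    9. Sensory verbs

      To understand sensory verbs (such as see, hear, watch, feel, notice), the smoothest approach is again the one you learned earlier: the contest between "certainty" and "uncertainty".

      What confuses people most about sensory verbs is that they can be followed by either a base-form verb or -ing (the present participle), but almost never by the infinitive (to do).


      1. Why are sensory verbs not followed by to?

      From the logic you read earlier: to stands for "future, intention, uncertainty".

      • Reasoning: what you "see" or "hear" is usually something you witnessed. Since you saw it with your own eyes, the action is an established fact and no longer carries the tentative sense of "will, might, intends to".
      • Conclusion: with no "uncertainty" left, there is no room for to, which carries the auxiliary-verb mood.

      Compare:
      * I want him **to** cross the road. (He has not crossed yet; it is an intention.)
      * I saw him **cross** the road. (It is an accomplished fact.)


      2. Base form vs. -ing: whole vs. fragment

      With to excluded, sensory verbs have two options left: the base form (do) and the participle (doing). They differ in how "complete" your observation was.

      A. Base form (do): stresses the "whole process" or the "fact"

      With the base form, you stress that you saw the action happen from beginning to end, or you are simply stating a fact.

      • Example: I saw him **cross** the street.
      • What it implies: I saw him go from this side of the street to the other; I saw the whole thing.

      B. -ing (doing): stresses a "moment" or an action "in progress"

      With -ing, you stress that you caught one moment of the action while it was still in progress.

      • Example: I saw him **crossing** the street.
      • What it implies: as I passed, he was in the middle of the street. When he started and when he finished do not matter to me.

      3. Unified summary table for sensory verbs

      The uses of sensory verbs can be summarized as three kinds of perception:

      | Structure | Meaning | Rationale |
      | --- | --- | --- |
      | See sb. do | Saw the whole process / the fact | It definitely happened, so no to; stresses the complete fact. |
      | See sb. doing | Saw a fragment in progress | Stresses the ongoing action and its state at that moment. |
      | See sth. done | Saw something being done | Stresses the result of a completed action (passive fact). |


      4. Further thought: why can sensory verbs sometimes take a that-clause?

      When you do not merely "see an action" but "realize a fact", a that-clause is used.

      • I saw that he was angry. (I noticed/realized that he was angry.)
      • This differs from watching the action of "being angry" with your own eyes. It further shows that, for "direct perception", sensory verbs favor the most direct, unbuffered (to-less) structure.

      💡 Summary

      Sensory verbs are special because they link the senses to reality.

      1. Because it is reality, they **reject to** (uncertainty).
      2. Because reality divides into "complete process" and "moment in progress", the base form and -ing split off.

      Can you now see why, when describing a car accident, a police officer would ask Did you see the car hit the man? rather than to hit?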

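    10. Causative verbs

      The most effective way to understand causative verbs is again the "auxiliary-verb restoration method" you read about earlier.

      The core meaning of a causative verb is that the subject does not act personally but "makes, lets, tells, or incites" someone else to do it. English has four main causative verbs: make, have, let, get.

      The differences among them come down to the strength of the "force" (auxiliary-verb mood) behind each one.


      1. Make: compulsion (must/force)

      make is the strongest. When you "make" someone do something, they usually have no choice.

      • Structure: make + person + base verb
      • Restored logic: I make him do it. ≈ I force him so that he **must** do it.
      • Note: because make implies a strongly "forced" result and the action is bound to happen, it drops to and takes the base form directly.

      2. Have: responsibility and delegation (should/will)

      have is more neutral and usually rests on status, role, or a paid transaction.

      • Structure: have + person + base verb
      • Restored logic: I'll have the mechanic fix my car. ≈ The mechanic **should/will** fix it (because it's his job).
      • Interpretation: you have paid, or it is the other person's duty, so you "arrange" for them to do it.

      3. Let: permission (can/may)

      let is the mildest and means "permit" or "not prevent".

      • Structure: let + person + base verb
      • Restored logic: Let him go. ≈ He **can/may** go (I won't stop him).

      4. Get: persuasion (to = the process of effort)

      get is the odd one out among causative verbs, because it must be followed by to.

      • Structure: get + person + to do
      • Restored logic: I got him to help me. ≈ I persuaded him until he **was able to** help me.
      • Why to? Remember what you read earlier? to carries a sense of "purpose, process, effort". get stresses that you brought the other person to agree through a process of persuading, coaxing, or asking for help. The to reflects the transition from "not doing" to "doing".

      5. Summary and comparison table

      | Verb | Meaning | Strength | Followed by | Mnemonic |
      | --- | --- | --- | --- | --- |
      | Make | Compel | ★★★★★ | Base form (do) | You must do it! |
      | Have | Arrange/delegate | ★★★☆☆ | Base form (do) | It's your job. |
      | Let | Allow | ★☆☆☆☆ | Base form (do) | Let them be. |
      | Get | Persuade/coax | ★★☆☆☆ | to do | Hard-won persuasion. |


      💡 Going deeper: the "completion" logic in the passive

      Causative verbs also have a very common structure for expressing that "something was done":

      have / get + object + past participle (done)

      • I had my hair cut. (I got a haircut: the hair "was cut".)
      • I got my homework finished. (I finished my homework: the homework "was finished".)

      The past participle (done) is used here precisely because the action "has been completed", in line with the "participles express facts/states" logic you read earlier.

      That is the logic of causative verbs. Can you now see why get takes to while make does not?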

    1. Author response:

      General Statements

      We thank the reviewers for their thoughtful and constructive comments on our manuscript. We have thoroughly considered all points raised and have made extensive revisions to address them. These revisions have significantly strengthened the manuscript.

      In summary, the key revisions and clarifications include:

      (1) Developmental Time-Course: To address the need for earlier phenotypic analysis, we have performed new immunofluorescence experiments at 30 days after hatching (dah). This new data (Fig. S7) precisely pinpoints the onset of the Leydig cell differentiation defect in dhh<sup>-/-</sup> mutants, establishing ~30 dah as the critical window for Dhh action.

      (2) Role of Ptch1 and Ptch2: We have qualified our conclusions regarding receptor specificity throughout the text to accurately reflect our findings and the limitation posed by the early lethality of ptch1 mutants. The in vivo genetic evidence for Ptch2 (the rescue of dhh<sup>-/-</sup> by ptch2<sup>-/-</sup>) is emphasized, while we now explicitly state that a role for Ptch1 cannot be ruled out without future conditional knockout models.

      (3) Mechanism between Gli1 and Sf1: In direct response to the reviewers' request for stronger evidence, we have performed a new cold probe competition assay. This experiment provides dose-dependent, biochemical evidence for the specificity of Gli1 binding to the sf1 promoter (New Fig. 5E). Furthermore, we have revised the text throughout the manuscript to use more precise language (e.g., "Gli1 activates sf1 expression") and removed overstated claims of "direct" regulation.

      (4) Methodological Rigor and Controls: We have added crucial negative controls for all RNA-FISH experiments using sense probes (New Fig. S9), provided detailed quantification methods for immunofluorescence, clarified the number of biological replicates for transcriptomic analyses, and corrected statistical tests as recommended.

      (5) Clarity and Presentation: We have revised the text for clarity, expanded the description of the TSL cell line's validation in the Introduction, added missing details to figure legends and methods, and incorporated suggested key references.

      We believe that our detailed responses and the significant new data and textual revisions have fully addressed the reviewers' concerns and have substantially improved the quality and impact of our manuscript.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component, from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1, which promotes an increase in Sf1 expression leading to the onset of 11-ketotestosterone synthesis, heralding the differentiation of Leydig cells in the developing male tilapia.

      Major comments:

      (1) Are the key conclusions convincing?

      Most results as reported are convincing; however, some conclusions are premature, as additional experiments are required to support their claims. For example, the phenotype of the dhh-/- testis is convincing in that Cyp11c1 cells are missing, and the addition of ptch2-/- rescues the phenotype, indicating a direct path. The link from gli to sf1, however, requires additional study to validate the direct relationship (see item 3 below).

      We thank the reviewer for the positive assessment that our principal findings are convincing. Regarding the connection between Gli1 and Sf1, we agree that additional validation was important. We have now performed new experiments and revised our text. As detailed in our response to item 3 below, we have incorporated a cold probe competition assay (new Fig. 5E) which provides dose-dependent evidence for the specificity of Gli1 binding to the sf1 promoter. Furthermore, we have toned down our conclusions in the manuscript.

      (2) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Major: The most significant premature claim is the statement that gli1 directly controls sf1 activity. Additional experiments are required to make this claim (see next statement).

      We agree with the reviewer that the claim of "direct" control was premature. We have therefore revised the manuscript accordingly. All statements claiming "direct" regulation of sf1 by Gli1 have been removed or replaced with more accurate descriptions, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1." These changes, coupled with the new functional data from the cold probe competition experiment (Fig. 5E) described in our response to item 3, now provide a robust and appropriately qualified account of our findings.

      Minor: As addressed in the discussion section, the ptch1 animals fail to survive, limiting the ability to validate the roles of both ptch1 and ptch2. Thus, the conclusion that only ptch2 is required should be qualified.

      We thank the reviewer for this rigorous comment. We fully acknowledge the limitation imposed by the early lethality of ptch1 mutants, which precludes a definitive in vivo assessment of its potential role in postnatal testis development. In direct response to this point, we have revised the text throughout the manuscript to more accurately reflect the strength of our conclusions. Specifically, in the Results section, we now state that “This differential receptor requirement implies that Ptch2 likely acts as the functional receptor for transducing Dhh signals in TSL cells” (lines 174–176). Furthermore, we have strengthened the Discussion by explicitly stating: “Therefore, while our findings strongly nominate Ptch2 as the principal receptor for Dhh in SLCs, a definitive exclusion of a role for Ptch1 will require future studies employing Leydig cell–specific conditional knockout models” (lines 265–268). We believe these revisions provide an appropriately qualified interpretation of our data while maintaining the compelling narrative of Ptch2's primary role.

      Major: A couple of key references are missing; please consider including:

      - Kothandapani A, Lewis SR, Noel JL, Zacharski A, Krellwitz K, Baines A, Winske S, Vezina CM, Kaftanovskaya EM, Agoulnik AI, Merton EM, Cohn MJ, Jorgensen JS. PLoS Genet. 2020 Jun 4;16(6):e1008810. doi: 10.1371/journal.pgen.1008810. eCollection 2020 Jun. PMID: 32497091

      - Park SY, Tong M, Jameson JL. Endocrinology. 2007 Aug;148(8):3704-10. doi: 10.1210/en.2006-1731. Epub 2007 May 10. PMID: 17495005

      We have included the key references: Kothandapani A, et al. (2020). PLoS Genet. and Park SY, et al. (2007). Endocrinology.

      (3) Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Additional experiments are suggested to strengthen the direct connection between gli1 and sf1:

      Major: Figure 5F shows evidence for increased sf1-luc activity upon co-transfection of OnGli1 in TSL cells. These data would be strengthened with evaluation of the same sf1 promoter that has each/both putative GLI binding sites mutated.

      We thank the reviewer for this insightful suggestion. To further strengthen the evidence for the functional connection between Gli1 and the sf1 promoter, we have performed a new cold probe competition experiment. Given the potential presence of other unpredicted Gli-binding motifs within the 5-kb sf1 promoter region and the practical constraints, we employed an alternative, robust biochemical approach. This assay used a wild-type oligonucleotide containing the canonical Gli-binding motif (GACCACCCA) as a specific competitor. As shown in the new Fig. 5E, this cold probe caused a significant, dose-dependent reduction in Gli1-induced sf1-luc activity, while a mutated control probe (TTAATTAAA) had no effect. This result provides strong evidence that Gli1-mediated transactivation of the sf1 promoter is dependent on its specific binding to this consensus motif.

      Furthermore, in response to the reviewer's comment, we have revised the manuscript text to use more precise language, such as "Gli1 activates sf1 expression" and "Sf1 is a key transcriptional target of Gli1," toning down any overstated claims of direct regulation. Together with the existing data (the original luciferase assay, the new competition experiment, and key loss-of-function/gain-of-function genetic evidence from SLC transplantation), we believe our study now provides a compelling and multi-faceted case for Gli1 being the key regulator of sf1 within this pathway. We are confident that these revisions have satisfactorily addressed the point raised.

      Major: All 8xGli-luciferase assays should include evaluation of the mutant 8xGli-luciferase plasmid as a negative control.

      We thank the reviewer for highlighting the importance of reporter assay controls. In our study, we included the empty vector pGL4.23, which lacks any Gli-binding sites, as the fundamental negative control. As shown in Fig. 4C, this vector showed minimal background activity that was unresponsive to Dhh, confirming that the strong luciferase induction in the 8xGli-reporter is entirely dependent on functional Gli-binding sites. While a mutated 8xGli construct is one valid approach, we think that the use of an empty vector is functionally equivalent and equally rigorous for establishing specificity. We are confident that our current data unambiguously demonstrate Gli-dependent activation. For clarity, we have explicitly stated in the figure legend and methods that pGL4.23 served as the negative control.

      Minor: Figure 5D experiment that includes TSL-gli1(also 2,3) +/- OnDhh; please examine whether the absence of Gli affects expression of sf1 in each condition. In other words, provide a loss-of-function of Gli connection to regulation of sf1.

      We measured the mRNA expression levels of sf1 in TSL-WT, TSL-gli1<sup>-/-</sup>, TSL-gli2<sup>-/-</sup>, and TSL-gli3<sup>-/-</sup> cells using qRT-PCR. The results are presented in the new Supplementary Figure S8A. The results show that the loss of gli1 leads to a significant reduction in the expression of sf1. In contrast, the knockout of gli2 or gli3 had no significant effect on sf1 expression levels.

      (4) Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      Given the expertise, it is not anticipated that the suggested experiments would be a significant burden to this group.

      We appreciate the reviewer's consideration. We have now performed the additional key experiments and incorporated them into the revised manuscript. We believe these new data fully address the points raised.

      (5) Are the data and the methods presented in such a way that they can be reproduced?

      Most methods are adequately described or referenced to previous detailed description. There were, however, some methods that could benefit from additional details:

      Major: IF quantification data: please provide details on how the number of positive cells were quantified and presented, for example, how many cells from how many sections for each genotype were included for the analysis?

      We have added relevant information in the "Materials and Methods" section in lines 369-373: “For each biological replicate (n = 5-6 fish per genotype), three non-serial, non-adjacent testis sections were analyzed. From each section, three representative fields of view were captured to ensure non-overlapping sampling. All positive cells number of Vasa, Sycp3 and Cyp11c1 was quantified by Image J Pro 1.51 software using default parameters.”

      Major: FISH: No controls are present, for example, scrambled RNA probes. Further, please clarify or address the significant presence of message in the nucleus.

      As suggested, we have now included negative control experiments using sense RNA probes for all genes (ptch1, ptch2, gli1, gli2, gli3). These controls showed no specific signal, confirming the specificity of our antisense probe hybridization. These data are now presented in the new Supplementary Figure S9.

      Major: TSL cells: TSL-onDhh, -onSf1: provide evidence for increase in expression

      We measured the mRNA expression levels of dhh in TSL-WT and TSL-OnDhh, and sf1 in TSL-WT and TSL-OnSf1 using qRT-PCR. The results are presented in the new Supplementary Figure S8B. The results show that overexpression of Dhh and Sf1 significantly increased the mRNA expression levels of dhh and sf1, respectively.

      Major: TSL + SAG cells and other treatments in general: how long were they treated before transplantation?

      We have added relevant information in the "Materials and Methods" section in lines 398-399: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before transplantation.”

      Major: Transcriptome analyses: how many replicates were used for each cell line? Please clarify the results presented in Fig 5E: how was this plot generated? It is interpreted that all three cell lines were combined and compared to the WT line. It is not clear how this was achieved.

      We have added relevant information in the "Materials and Methods" section in lines 445-447: “For the SAG treatment experiment, TSL cells were incubated with 0.5 μM SAG for 48 hours before collection. For each genotype, cells from three independent culture wells were pooled.”

      We have also added relevant information in the "Results" section in lines 198-202: “…we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions.”

      (6) Are the experiments adequately replicated and statistical analysis adequate?

      Most are adequate and appropriate, some questions remain:

      - Transcriptomes-how many replicates (see above)?

      - IF quantification-how were cells identified/how many sections (see above)?

      Minor: Statistics: methods indicate that a Student's t-test was used, but ANOVAs are also used, which is appropriate. There are data presented that should be reevaluated via an ANOVA: Figure 4D, 4N-R; Figure 5G-no stats indicated in figure legend.

      We sincerely thank the reviewer for highlighting the inappropriate use of statistical tests in our original submission. We have re-analyzed all data using the ANOVA-based methods as suggested. We confirm that these changes do not alter the overall interpretation of our results but provide a more robust and statistically sound foundation for our conclusions. We changed “Differences were determined by two-tailed independent Student's t-test” to “Statistical significance was determined by one-way ANOVA followed by Tukey's test (C, Q-U, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (D) (*, P < 0.05; **, P < 0.01; NS, no significant difference).”

      In lines 719-721 we added “Statistical significance was determined by one-way ANOVA followed by Tukey's test (E, different letters above the error bar indicate statistical differences at P < 0.05) or Student's t-test (B, H) (*, P < 0.05; **, P < 0.01; NS, no significant difference).” in line 745-747.

      Reviewer #1 (Significance):

      The data presented in this manuscript provides important context towards the connection between the DHH pathway, Sf1, and steroidogenesis.

      The audience would likely include developmental biologists, including those related to differentiation of any hormone producing cell type and especially those focused on steroidogenesis onset. Clinical interests will be related to sex determination and differentiation, especially related to male sex phenotype differentiation. Basic scientists will be especially interested.

      Expertise: mouse fetal testis differentiation and maturation, steroidogenesis, hedgehog, sf1. Good fit except for the animal model, but they are surprisingly similar.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this work, Zhao et al. investigated the role of the Dhh signaling pathway in the proliferation and differentiation of Leydig lineage cells in the testes of Nile tilapia, an economically important farmed fish. By generating dhh mutants, the authors showed that loss of Dhh in tilapia recapitulated mammalian phenotypes, characterized by testicular hypoplasia and androgen insufficiency. A previously established TSL line was used to rescue the deficits in dhh-/- testes, which demonstrated that Dhh regulates the differentiation of SLCs rather than their survival. By generating mutant TSL lines, the authors aimed to identify the downstream players under Dhh in tilapia. Based on the data, the authors propose that a dhh-ptch2-gli1-sf1 axis exists in Leydig cell lineage development.

      How Dhh secreted from Sertoli cells affects the Leydig cells remains elusive. While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model. Unfortunately, this work is not well performed, and the conclusions are not well supported by the current data. To reach logical conclusions, more meaningful experiments should be performed, and more convincing data should be provided.

      Strength:

      The authors used genetic mutants, TSL lines, and cell transplantation techniques to address the questions. The manuscript is technically sound, and overall is well-written.

      Limitations:

      Experimental design should be optimized, and more convincing data should be provided to reach solid conclusion.

      (1) The SLCs (stem leydig cells) used in this work. The SLC line was established from 3-month-old immature XY tilapia. The authors claimed that this line is a SLC line only because they express a few Leydig markers such as pdgfra and nestin. However, in my opinion, the identity of the cell line is not clear. It is suggested to perform more experiments, including flow cytometry assay or single cell RNA sequencing analysis, to further characterize this line, to demonstrate that this line is a real SLCs that are equivalent to the SLCs in 3-month testes of tilapia. According to the previous publication (2020), the information about the line was not well presented.

      We thank the reviewer for this comment regarding the characterization of the TSL cell line. The identity of TSL as a stem Leydig cell line was rigorously established in our previous publication (Huang et al., 2020), which provided comprehensive molecular, in vitro, and in vivo functional evidence that meets the definitive criteria for an SLC. This includes its stable expression of established SLC markers (pdgfrα, nestin, coup-tfii), its capacity to differentiate into steroidogenic cells producing 11-KT in vitro, and most critically, its ability to colonize the testicular interstitium, differentiate into Leydig cells, and restore androgen production upon transplantation in vivo.

      In direct response to the reviewer's point, we have revised the Introduction of our manuscript to provide a more detailed and clear description of the TSL line's origin and validation (lines 95-105) as “Furthermore, a stem Leydig cell line (TSL) has been established from the testis of a 3-month-old Nile tilapia. TSL expresses platelet-derived growth factor receptor α (pdgfrα), nestin, and chicken ovalbumin upstream promoter transcription factor II (coup-flla), which are usually considered as SLC-related markers in several other species. Notably, this cell line exhibits the capacity to differentiate into 11-ketotestosterone (11-KT)-producing Leydig cells both in vitro and in vivo. When cultured in a defined induction medium, TSL cells differentiate into a steroidogenic phenotype, expressing key steroidogenic genes including star1, star2, and cyp11c1, and producing 11-KT; upon transplantation into recipient testes, TSL cells successfully colonize the interstitial compartment, activate the expression of steroidogenic genes, and restore 11-KT production”, ensuring that readers can fully appreciate its well-founded identity as a SLC model without needing to consult the original publication. We are confident that the existing body of evidence solidly supports all conclusions drawn from its use in this study.

      (2) How loss of dhh affects testicular and Leydig cell lineage development is not clearly investigated. In the current manuscript, the characterization of the dhh mutant was insufficient and lacked in-depth investigation. The authors primarily looked at testes at 90 dph when the Leydig cell lineage was well developed. In my opinion, this time was too late. To investigate the earlier events that are affected by loss of dhh, I suggest performing experiments at earlier time points, in particular around the initiation stages of sex differentiation and Leydig cell specification/maturation.

      We thank the reviewer for this insightful comment. We agree that a thorough developmental analysis is crucial. In response to this point, we have now performed an in-depth investigation at earlier stages to precisely define the phenotype onset.

      Our revised manuscript includes new data from a developmental time-course analysis. While our initial characterization included 5, 10, and 20 dah, we have now identified 30 dah as the critical window for Leydig cell differentiation onset, which is also supported by prior work (Zheng et al.). Our new immunofluorescence data at 30 dah now clearly show that Cyp11c1-positive cells are present in wild-type testes but are entirely absent in dhh<sup>-/-</sup> mutants (Fig. S7). This finding pinpoints the initial failure of SLC differentiation.

      We have integrated this key finding into the Discussion (lines 234-239) as “To define the onset of Leydig cell differentiation, we performed a developmental time-course analysis. This revealed that Cyp11c1-positive steroidogenic cells first appear in wild-type testes at 30 dah, while being conspicuously absent in dhh<sup>-/-</sup> mutants at this same stage (Fig. S7). This clear temporal pattern establishes ~30 dah as the developmental window when SLCs initiate their differentiation program in the Nile tilapia.”

      Concurrently, our analysis of the 90 dah timepoint remains vital, as it represents a mature stage with robust spermatogenesis and a stabilized somatic niche. This allows for a comprehensive assessment of the ultimate functional consequences of the early differentiation block, including its impact on germ cell support and overall testicular architecture.

      Thus, our study now provides a complete developmental perspective: the 30 dah timepoint identifies the initiation of the Dhh-dependent defect, while the 90-dah analysis reveals the mature, functional outcomes within the intact testicular niche.

      (3) The authors claimed that there was a ptch2-gli1-sf1 axis. The conclusion was drawn largely based on data that generated from the in vitro cultured TSL line. More data from genetic mutant tilapia are required to support the conclusion.

      We thank the reviewer for the insightful comments regarding the need for robust in vivo validation. In fact, our conclusion of a Dhh-Ptch2-Gli1-Sf1 axis is supported by an integrated experimental strategy, combining key in vivo evidence with targeted in vitro analyses to build a coherent model.

      (1) Evidence for Ptch2 as the key receptor: The role of Ptch2 is supported by a pivotal in vivo genetic experiment. The observation that the dhh<sup>-/-</sup> testicular phenotype is fully rescued in dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants provides compelling genetic evidence that Ptch2 is the essential receptor for Dhh in vivo (Fig. 4E-U). We acknowledge that the early embryonic lethality of global ptch1 mutation precludes its functional analysis in postnatal testis development. Therefore, while our data strongly nominate Ptch2 as the principal receptor, we have qualified our conclusions in the revised manuscript to reflect that a role for Ptch1 cannot be definitively excluded without Leydig cell-specific conditional knockout models.

      (2) Evidence for Gli1 and its regulation of Sf1: The role of Gli1 as the key transcriptional effector was efficiently identified using our well-characterized TSL system, a valid approach for dissecting this highly conserved signaling cascade. The functional connection between Gli1 and Sf1 is supported by multiple lines of evidence: transcriptomic profiling, promoter analysis, luciferase reporter assays (including a new cold probe competition experiment), and most importantly, in vivo functional validation via SLC transplantation. The latter demonstrated that Sf1 is both necessary and sufficient for SLC differentiation within the testicular niche (Fig. 5).

      In direct response to the reviewer's points, we have thoroughly revised the manuscript text to ensure all claims are accurately stated, particularly regarding the receptor specificity and the nature of the Gli1-Sf1 regulatory relationship. We believe our study provides a solid foundation for the proposed signaling axis.

      Overall, better experimental design should be planned, including the rescue experiments. Some key information was missed. For instance, the identity of the stem Leydig cells was not clearly presented.

      We have explained it in point #1.

      Figures:

      Figure 1: The authors described the phenotypes at 90 dph. Loss of dhh led to severe phenotypes in testicular formation, as evidenced by defective formation of Vasa, a germline stem cell marker; loss of expression of cyp11c1, a leydig cell marker; and loss of sycp3, a marker of meiosis of spermatogonia.

      However, in my opinion, 90 dph was too late. To investigate the role of dhh in Leydig cell lineage, the authors are suggested to focus on earlier developmental stages when the sex differentiation and maturation of leydig cells occur. This work is actually a development biology one that investigates how dhh loss in Sertoli cells affects the development of Leydig cells. The careful characterization of earliest testicular phenotypes of dhh mutant is very important.

      We have explained it in point #2.

      Figure 2: Please clarify the logic for performing rescue experiments using 11-KT. Provided the critical role of 11-KT in the testis development and spermatogenesis, it was not unexpected that 11-KT treatment can rescue most of the cell types in testes. If dhh is absolutely required for LC lineage development maturation, adding 11-KT at 30 dph will not have an effect. Why not perform rescue experiments using Dhh protein?

      We thank the reviewer for this insightful comment, which allows us to clarify the logical progression of our experimental design, a process central to genetic discovery.

      When we first characterized the dhh<sup>-/-</sup> mutant, we observed a complex suite of phenotypes: testicular hypoplasia, arrested germ cell development, a profound deficiency of Leydig cells, and drastically low androgen levels. A primary challenge was to distinguish which defects were direct consequences of losing Dhh signaling and which were secondary effects of the overall testicular failure.

      We therefore employed a classic genetic strategy: phenotypic dissection through targeted rescue. The 11-KT rescue experiment was designed to test a foundational hypothesis: Are the severe testicular defects in dhh<sup>-/-</sup> mutants primarily a consequence of the systemic androgen deficiency? The results provided a pivotal and clear answer: while 11-KT treatment partially rescued germ cell development and testicular structure, it completely failed to restore the population of Cyp11c1-positive Leydig cells. This critical finding allowed us to dissociate the phenotypes, demonstrating that the Leydig cell defect is a primary, cell-autonomous consequence of Dhh loss, not a secondary effect of low androgen.

      This conclusion logically propelled the next phase of our research: to shift focus from systemic hormone action to the local, niche role of Dhh in regulating the Leydig lineage directly. This led directly to the TSL transplantation experiments and the mechanistic dissection of the Ptch2-Gli1-Sf1 axis within SLCs.

      Regarding the use of Dhh protein, we agree it is a complementary approach. However, producing biologically active, recombinant Hedgehog ligand is challenging due to its essential dual lipid modification, which is required for solubility and activity. Our transplantation experiments with TSL-OnDhh cells (Fig. 3) functionally demonstrate that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, thereby directly addressing the core question without the need for recombinant protein.

      Figure 3. The authors showed that in dhh-/- testes, TSL engrafted equivalently but failed to express Cyp11c1. This result was strange, which raised a question about the identity of the TSLs, as I have mentioned above. The authors claimed that the TSLs are stem Leydig cells, which I doubt. Additional data should be provided to support the statement.

      In the testicular environment, the transplanted TSLs should be able to colonize and differentiate into more mature Leydig cells. Only a small portion of the PKH26-labeled TSLs became Cyp11c1 positive after transplantation; can the authors comment on this observation?

      To address "Mutation of dhh blocks SLC differentiation", the authors should first carefully examine the TSL lineage development using the dhh mutant. Then, investigate how loss of dhh disrupts the cross talk between Sertoli cells and Leydig cells. Why bother transplanting TSLs? Please clarify. Why not perform rescue experiments using Dhh protein at appropriate developmental stages?

      We thank the reviewer for these comments, which allow us to clarify the rationale and interpretation of our key experiments.

      (1) We have provided comprehensive evidence establishing the TSL line as a SLC line (Response to Point #1). The observation that WT TSL cells engraft but fail to differentiate in the dhh<sup>-/-</sup> testicular environment is not strange; it is, in fact, the core and most crucial finding of this experiment. It provides direct functional evidence that the dhh<sup>-/-</sup> niche lacks the essential signals required to initiate SLC differentiation, consistent with the severe deficiency of endogenous Cyp11c1<sup>+</sup> cells in these mutants (Fig. 1I-J', N).

      (2) The reviewer's concern about "only a small portion" of cells differentiating is based on a misunderstanding. Our quantitative data (Fig. 3F) show that approximately 78% of the transplanted PKH26+ TSL cells successfully differentiated into Cyp11c1<sup>+</sup> cells in WT hosts. This high efficiency robustly demonstrates the differentiation potential of TSL cells and the permissiveness of the WT niche. The near-zero differentiation rate in the dhh<sup>-/-</sup> host (Fig. 3F) starkly highlights the specific and severe defect in the mutant microenvironment.

      (3) The TSL transplantation experiment was the most direct strategy to test why Cyp11c1<sup>+</sup> cells are absent in dhh<sup>-/-</sup> testes. It allowed us to distinguish between a failure in SLC differentiation and other possibilities (e.g., cell death). The finding that functional SLCs cannot differentiate in the mutant niche logically directed our subsequent focus onto the cell-intrinsic molecular mechanism (the Ptch2-Gli1-Sf1 axis) within the Leydig lineage. While Sertoli-Leydig crosstalk is an important area, it was beyond the scope of this study aimed at defining the intrinsic differentiation pathway.

      (4) Regarding Dhh protein rescue, generating bioactive, lipid-modified recombinant Hh protein is technically challenging. Our transplantation of TSL-OnDhh cells (Fig. 3) functionally demonstrates that providing Dhh signaling in a cell-autonomous manner is sufficient to rescue differentiation, effectively addressing this question without the need for recombinant protein.

      Figure S3. “To assess whether dhh mutation affects androgen-producing cells outside Leydig cells, 11-KT levels were analyzed during early testicular development before SLCs differentiation. IF analyses revealed that no Cyp11c1 positive cells were present in the testes of XY WT fish at 5, 10, and 20 dah, indicating that SLCs had not yet differentiated at these stages (Fig. S3A-C). Tissue fluid 11-KT levels showed no significant differences between WT and dhh-/- XY fish at 5, 10, and 20 dah (Fig. S3D)”. These observations suggested that loss of dhh does not affect the specification of SLCs, but affects their differentiation into mature LCs. The appearance of Cyp11c1-positive cells should be later than 20 dah. So when is the earliest time point for formation of Cyp11c1 positive cells, and how does loss of dhh affect this? These are important questions to answer.

      We agree with the reviewer's interpretation that our data suggest dhh loss affects SLC differentiation rather than initial specification. In direct response to the need for earlier timepoints, we have now performed and included an analysis at 30 dah, which we identified as the critical window for Leydig cell differentiation onset. Our new data (Fig. S7) show that Cyp11c1+ cells are present in WT testes but are entirely absent in dhh<sup>-/-</sup> mutants at this stage. This precisely pinpoints the initiation of the phenotypic divergence and establishes ~30 dah as the developmental window when Dhh signaling is required to drive SLC differentiation. Our study therefore now provides a complete developmental perspective, from the initial failure at 30 dah to the mature functional outcomes at 90 dah.

      Figure 4. The authors generated ptch1/2 mutant TSL lines, and a luciferase assay was performed, and based on the results, the authors concluded that Ptch2, but not Ptch1, is specifically required for transducing Dhh signals in TSLs. The conclusion was only based on the luciferase assay using TSLs. Whether this was the case in testes at the animal level is not clear. Clearly, more genetic experiments, using ptch mutants, should be performed to substantiate this.

      The authors stated “Ptch2 acts as the obligate receptor for Dhh signaling during testis development”. If ptch2 is required for the TSL lineage, why did ptch2-/- testes exhibit no significant differences in testicular histology, Leydig cell (Cyp11c1+) populations, and serum 11-KT levels? This contradictory statement needs to be addressed.

      We thank the reviewer for these critical comments, which allow us to clarify the logic underlying our conclusions regarding Ptch2.

      (1) In Vivo Genetic Evidence for Ptch2: Our conclusion that Ptch2 is the primary receptor for Dhh is not based solely on the TSL luciferase assays. It is definitively supported by a key in vivo genetic experiment: the complete phenotypic rescue in the dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> double mutants (Fig. 4F-R). In genetic terms, the loss of the receptor (ptch2) suppressing the phenotype caused by the loss of the ligand (dhh) is classic evidence for a ligand-receptor relationship within a linear pathway. This in vivo evidence strongly substantiates Ptch2's role at the animal level. The early embryonic lethality of ptch1 mutants precludes a similar in vivo test for Ptch1 in postnatal testis development.

      (2) Addressing the Apparent Contradiction of the ptch2<sup>-/-</sup> Phenotype: The reviewer raises an excellent point, which stems from the fundamental biology of the Hh pathway as shown in Author response image 1. Ptch receptors are inhibitory. In the absence of ligand, Ptch suppresses pathway activity.

      Author response image 1.

      The canonical Hh signaling pathway. In the dhh<sup>-/-</sup> mutant, the pathway is suppressed due to unopposed Ptch activity, leading to a failure in SLC differentiation. In the ptch2<sup>-/-</sup> mutant, this key inhibitory brake is removed, leading to constitutive activation of the pathway. The fact that ptch2<sup>-/-</sup> testes are normal indicates that this level of pathway activation is not detrimental and, crucially, is sufficient to support wild-type levels of Leydig cell development and steroidogenesis. This lack of a phenotype in the receptor mutant, contrasted with the severe ligand mutant phenotype, is a common and expected observation in signaling pathways where the receptor acts as a tonic inhibitor.

      In summary, the normal development of ptch2<sup>-/-</sup> testes is not contradictory but is entirely consistent with its role as the inhibitory receptor for Dhh. The severe phenotype in dhh<sup>-/-</sup> mutants and its specific rescue by removing ptch2 provides compelling genetic evidence for their functional relationship. We have revised the text throughout the manuscript to ensure these conclusions are accurately stated.
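      The genetic logic in this response can be sketched as a toy Boolean model. This is only an illustration of the epistasis argument under simplifying assumptions (Ptch1 and all other inputs are ignored, and pathway activity is treated as on/off); the function name and genotype table are hypothetical and are not part of the authors' analysis.

```python
# Toy Boolean model of the ligand/receptor epistasis argument.
# Assumption: Ptch2 is the only brake considered; Ptch1 is ignored.


def pathway_active(dhh_present: bool, ptch2_present: bool) -> bool:
    """Return True when the downstream Hh cascade is active.

    Ptch2 acts as a brake on the pathway. The brake is released either
    when the ligand (Dhh) is present or when the receptor is missing.
    """
    brake_engaged = ptch2_present and not dhh_present
    return not brake_engaged


# (dhh_present, ptch2_present) for each genotype discussed in the response.
genotypes = {
    "WT": (True, True),
    "dhh-/-": (False, True),
    "ptch2-/-": (True, False),
    "dhh-/-;ptch2-/-": (False, False),
}

for name, (dhh, ptch2) in genotypes.items():
    state = "active" if pathway_active(dhh, ptch2) else "suppressed"
    print(f"{name}: pathway {state}")
```

      In this sketch only the dhh single mutant leaves the brake engaged, which mirrors the argument that removing ptch2 suppresses the dhh mutant phenotype.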

      Figure 5. The authors generated gli1/2/3 mutant TSL lines, and a luciferase assay was performed, and based on the results, the authors concluded that Gli1, but not Gli2/3, was specifically required for transducing Dhh signals in TSL cells. The conclusion is drawn only on the basis of the luciferase assay using TSLs. Whether this was the case in testes at the animal level is not clear. Clearly, more genetic experiments should be performed to substantiate this, using the gli mutant fish.

      To identify Gli1-dependent targets in SLCs, the authors compared transcriptomes of TSL-WT, Dhh-overexpressing (TSL-OnDhh), Gli1-overexpressing (TSL-OnGli1), and SAG-treated (TSL+SAG) TSL cells. While these experiments can be used to identify dhh target genes, it is better to use gli mutant cell lines. Since the authors have generated gli1/2/3 mutants, why not use these mutant fish to identify/confirm the Gli targets?

      We thank the reviewer for these comments.

      (1) We acknowledge that the identification of Gli1 as the key transcriptional effector is based primarily on our in vitro evidence using the TSL cell line. We have revised the manuscript accordingly to ensure this is stated precisely, avoiding overstatement.

      (2) Concerning the transcriptomic analysis, the reviewer suggests using gli mutant cell lines. While this is a valid approach, our strategy of profiling pathway activation (via Dhh/Gli1 overexpression or SAG treatment) was deliberately chosen to provide a high signal-to-noise ratio for identifying genes that are upregulated during the differentiation process. Analyzing loss-of-function mutants under basal conditions can be confounded by potential compensatory mechanisms among the Gli family members, potentially masking the specific transcriptional signature of pathway activation we sought to capture.

      In addition, we note that we have generated gli1/2/3 mutant TSL cell lines for the functional luciferase assays, but we have not generated the corresponding gli mutant fish lines, which would represent a substantial new line of investigation.

      Reviewer #2 (Significance):

      While previous studies have revealed the paracrine role of Sertoli cell secreted Dhh in the regulation of Leydig cell development and maturation, the authors provided some new insights into the issue using tilapia as a model.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      The authors investigate the Dhh signaling pathway in Leydig cell differentiation in the tilapia model. They generated multiple mutant lines in different hedgehog pathway components and utilized a Leydig stem cell line to interrogate Leydig cell differentiation. Through this analysis, the authors demonstrate that Dhh regulates Leydig differentiation rather than survival. They also found that Ptch2 is the specific receptor that mediates signaling to promote Leydig differentiation and that Gli1 is the primary Gli involved. Furthermore, they show that a known regulator of Leydig cell development and function, SF1, is a downstream transcriptional target. Overall, the study identifies previously unknown information as to how Dhh signaling regulates Leydig cell development, which is necessary for testosterone production by the testis.

      Major Comments

      (1) In the RNA-seq analysis, it is not clear exactly how the 33 "up-regulated" genes were identified. What was the methodology for identification of these genes? Some of the genes were down-regulated or not different in the OnGli condition and some in the OnDhh condition were not differentially expressed, as shown in Fig S8B. Therefore, it is unclear why all 33 genes are classified as upregulated "across all three conditions".

      We have clarified this methodology in the Materials and Methods section in lines 452-454: “Differentially expressed genes (DEGs) were identified for each condition (TSL-OnDhh, TSL-OnGli1, TSL+SAG) compared to TSL-WT controls using edgeR (threshold: FDR < 0.05, |log2(foldchange)| ≥ 1.5).” We also added relevant information in the Results section in lines 198-202: “…we performed transcriptomic profiling of TSL cells under conditions of pathway activation: Dhh overexpression (TSL-OnDhh), Gli1 overexpression (TSL-OnGli1), and SAG treatment (TSL+SAG). Comparative RNA-seq analysis identified a core set of 33 genes consistently upregulated across all three conditions (Fig. 5C, S6A).”

      We have also updated Fig. S8B to include a clear value and to better visualize the FPKM value levels of these 33 genes across the conditions.
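      One way to read the selection described in this response is as a per-condition filter followed by a set intersection. The sketch below assumes that "upregulated" means FDR < 0.05 with log2 fold change ≥ 1.5 in the positive direction, and that the core set is the intersection across the three conditions; the response does not spell out these steps, and all gene names and numbers here are invented for illustration.

```python
# Hypothetical per-condition results: gene -> (log2 fold change vs TSL-WT, FDR).
# Gene names and values are invented; they are not data from the study.
results = {
    "TSL-OnDhh": {"geneA": (2.1, 0.001), "geneB": (1.8, 0.020), "geneC": (0.4, 0.600)},
    "TSL-OnGli1": {"geneA": (3.0, 0.004), "geneB": (-2.0, 0.010), "geneC": (1.9, 0.030)},
    "TSL+SAG": {"geneA": (1.6, 0.040), "geneB": (2.2, 0.002), "geneC": (2.5, 0.001)},
}

FDR_MAX = 0.05      # significance threshold quoted in the response
LOG2FC_MIN = 1.5    # fold-change threshold quoted in the response


def upregulated(table):
    """Genes passing both thresholds in the positive direction."""
    return {
        gene
        for gene, (log2fc, fdr) in table.items()
        if fdr < FDR_MAX and log2fc >= LOG2FC_MIN
    }


def core_upregulated(all_results):
    """Genes upregulated in every condition (set intersection)."""
    per_condition = [upregulated(table) for table in all_results.values()]
    return set.intersection(*per_condition)


core = core_upregulated(results)
print(sorted(core))
```

      Under this reading, a gene that is downregulated or unchanged in any one condition (geneB and geneC in the toy data) would be excluded from the core set.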

      (2) In figure 4A (and possibly B), it appears that ptch RNA is in the nucleus of the cell. Why would the RNA be primarily in the nucleus? Is the RNA detection accurate? Were controls done? The methods state that sense probes were made but not how they compared to the antisense probes. This comment can also be applied to the gli FISH, particularly gli3 (Figure 5).

      This is an excellent observation. We speculate that the apparent nuclear signal may be due to strong transcriptional activity in the nucleus. To confirm the specificity of our FISH experiment, we performed FISH with sense RNA probes as negative controls for all genes (ptch1, ptch2, gli1, gli2, gli3), and no specific signals were observed (see New Fig. S9).

      Minor comments

      (1) In the introduction, please include information as to when tilapia reach sexual maturity

      We have added this information to the Introduction in line 91-92: early sexual maturity (approximately 3 months after hatching for males and 6 months after hatching for females).

      (2) When first mentioning experiments that use the PKH26 dye, please give a brief description of the dye in the text of the results. This is described in the methods but it would be helpful to have some information about what PKH26 is in the results to more easily understand the figure and experimental design.

      We have added a brief description in the Results section in line 151-152: “To dissect Leydig cell lineage impairment in dhh<sup>-/-</sup> testes, we transplanted the TSL labeled with PKH26 (a fluorescent red hydrophobic membrane dye that enables tracking of transplanted cells) into WT and dhh<sup>-/-</sup> testes (Fig. 3A).”

      (3) In the statistical analysis section of the methods, the authors state that two-tailed t-tests were performed however in the figure legends it states that ANOVA was done for some of the statistical analysis. Please clarify this.

      We have updated the Statistical Analyses section in Methods to clarify in line 472-476: “A two-tailed independent Student’s t-test was used to determine the differences between the two groups. One-way ANOVA, followed by Tukey multiple comparison, was used to determine the significance of differences in more than two groups. P < 0.05 was used as a threshold for statistically significant differences.”
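      The rule stated in the revised Methods (t-test for two groups, one-way ANOVA with Tukey's comparison for more than two) can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names and toy numbers; it computes only the ANOVA F statistic, and the p-values and Tukey post hoc comparisons would come from a statistics package, which is not shown here.

```python
from statistics import mean


def choose_test(n_groups: int) -> str:
    """Pick the test according to the number of groups being compared."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        return "two-tailed Student's t-test"
    return "one-way ANOVA followed by Tukey's multiple comparison"


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy measurements for three hypothetical groups (not data from the study).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(choose_test(len(groups)))
print(one_way_anova_f(groups))
```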

      (4) Figures - in figures that have charts with the Y-axis labeled as "relative positive cells", or similar, please explain what exactly is meant by "relative". What is it relative to?

      We have revised all relevant Y-axis labels and figure legends to explicitly state the quantification method. For example, we now use: "Vasa<sup>+</sup> / DAPI<sup>+</sup> (%), Sycp3<sup>+</sup> / DAPI<sup>+</sup> (%) or Cyp11c1<sup>+</sup> / DAPI<sup>+</sup> (%)".
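      The revised axis labels correspond to a simple ratio, sketched below. The sketch assumes that marker-positive cells are counted among the DAPI-positive nuclei of the same field; the function name and the counts are hypothetical.

```python
def percent_positive(marker_positive: int, dapi_positive: int) -> float:
    """Marker+ cells as a percentage of DAPI+ nuclei in the same field."""
    if dapi_positive <= 0:
        raise ValueError("DAPI+ count must be positive")
    if not 0 <= marker_positive <= dapi_positive:
        raise ValueError("marker+ count must lie between 0 and the DAPI+ count")
    return 100.0 * marker_positive / dapi_positive


# Hypothetical counts from one field of view.
print(percent_positive(39, 50))
```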

      (5) Figure 1: please point out the testes in panels A and B

      We have indicated the position of the testes with arrows in Figures 1A and B.

      (6) In figure 4, it would be helpful to have the WT images from S7 moved to fig 4.

      We have moved representative WT images from Fig. S7 into Fig. 4 for easier comparison with the mutant phenotypes.

      (7) Figure 4E: Are the yellow bars comparable to each other? Is there any significance to the increased luciferase with 8xGli in ptch2-/- as compared to the other genotypes?

      We thank the reviewer for this astute observation. Yes, the yellow bars are directly comparable, and the elevated basal luciferase activity of the 8xGli reporter in the ptch2<sup>-/-</sup> TSL cells is indeed significant and expected. Ptch2 normally inhibits the pathway in the absence of ligand; genetic ablation of ptch2 removes this inhibition, leading to ligand-independent, constitutive activation of the downstream signaling cascade. The observed increase in basal reporter activity in the ptch2<sup>-/-</sup> cells is a classic manifestation of this mechanism.

      The primary objective of this experiment was to test the cells' responsiveness to Dhh stimulation across genotypes. The key finding is that while wild-type and ptch1<sup>-/-</sup> cells showed a significant response to Dhh, the ptch2<sup>-/-</sup> cells, which already exhibited high basal activity, were completely unresponsive. This combination of constitutive activation and ligand insensitivity in the ptch2<sup>-/-</sup> genotype provides particularly strong genetic evidence that Ptch2 is the essential receptor mediating Dhh signal transduction in this system.

      (8) Figure 5G: please state in the figure legend exactly what each construct name stands for

      We have expanded the legend for Fig. 5G to define each construct.

      (9) Figure S8B: please include what the values in the table are (eg are these the significance values?)

      We have updated the caption for Figure S8B (now Figure S6B): “The FPKM value for each gene in each sample is indicated within the squares. The color gradient from blue to red reflects low to high expression levels per row (gene).”

      Reviewer #3 (Significance):

      Strengths and limitations:

      The genetics of the tilapia system and the availability of the tilapia Leydig stem cell lines were particular strengths of this study. The study utilizes fish genetics to genetically interrogate the Dhh signaling pathway in Leydig cell development through generation and analysis of mutant lines. The tilapia Leydig stem cell line was an integral part of this study as it allowed for genetic and chemical manipulation of Dhh signaling in undifferentiated Leydig cells and, through transplantation into testes, allowed for analysis of how Leydig cell differentiation was affected.

      Advance:

      The study makes significant advances as to how Dhh signaling instructs Leydig cell differentiation, including identification of the Ptch receptor and Gli transcription factor that function downstream of Dhh in this process. Furthermore, they identify a direct link between Dhh signaling and Sf1 expression, which is known to be important for Leydig cell function.

      Audience:

      This study will be of particular interest to reproductive biologists, endocrinologists, and developmental biologists. The study may also be of interest to researchers and physicians investigating cancers that are promoted by androgens produced by Leydig cells of the testis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterize the relationship between affinity and fitness in the process of affinity maturation. To this end, the authors develop a model of germinal center reaction and a tailored statistical approach, building on recent advances in simulation-based inference. The potential impact of this work is hindered by the poor organization of the manuscript. In crucial sections, the writing style and notations are unclear and difficult to follow.

      We thank the reviewer for their kind words, and have endeavored to address all of their concerns as to the structure and style of the manuscript.

      Strengths:

      The model provides a framework for linking affinity measurements and sequence evolution and does so while accounting for the stochasticity inherent to the germinal center reaction. The model's sophistication comes at the cost of numerous parameters and leads to intractable likelihood, which are the primary challenges addressed by the authors. The approach to inference is innovative and relies on training a neural network on extensive simulations of trajectories from the model.

      Weaknesses:

      The text is challenging to follow. The descriptions of the model and the inference procedure are fragmented and repetitive. In the introduction and the methods section, the same information is often provided multiple times, at different levels of detail.

      Thank you for pointing this out. We have rearranged the methods in order to make the presentation more linear, and to reduce duplication with the introduction.

      Specifically, we moved the affinity definition to the start, removed the redundant bullet point list, and moved the parameter value table to the end.

      This organization sometimes requires the reader to move back and forth between subsections (there are multiple non-specific references to "above" and "below" in the text).

      This is a great point, we have either removed or replaced all references to "above" or "below" with more specific citations.

      The choice of some parameter values in simulations appears arbitrary and would benefit from more extensive justification. It remains unclear how the "significant uncertainty" associated with these parameters affects the results of inference.

      We have clarified where various parameter values come from:

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      In addition, the performance of the inference scheme on simulated data is difficult to evaluate, as the reported distributions of loss function values are not very informative.

      We thought of two different interpretations for this comment, so have worked to address both.

      First, the comment could have been that the distribution of loss functions on the training sample does not appear to be informative of performance on data-like samples. This is true, and in our revision we have emphasized the distinction between the two types of simulation sample: those for training, where each simulated GC has different (sampled) parameter values; vs the "data mimic" samples where all GCs have identical parameters. Since the former have different values for each GC, we can only plot many inferred curves together on the latter. We also would like to emphasize that the inference problem for one GC will have much more uncertainty than will that for an ensemble of GCs (as in the full replay experiment).

      “After building and training our neural network, we evaluate its performance on subsets of the training sample. While this evaluation provides an important baseline and sanity check, it is important to note that the training sample differs dramatically from real data (and the “data mimic” simulation sample that mimics real data). While real data consists of 119 GCs with identical parameters and thus response functions, we need the GCs in our training sample to span the space of all plausible parameter values. This means that while we must evaluate performance on individual GCs in the training and testing samples, in real data (and data mimic simulation) we combine results from 119 curves into a central (medoid) curve. Inference on the training sample will thus appear vastly noisier than on real data and data mimic simulation, and also cannot be plotted with all true and inferred curves together.”
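      As a minimal illustration of reducing many inferred curves to a single central one, the Python sketch below picks a medoid: the curve whose summed distance to all other curves is smallest. The name `medoid_index`, the pointwise absolute-difference distance, and the assumption that every curve is sampled on the same affinity grid are illustrative choices only; the distance measure actually used for the 119 curves is not specified in this response.

```python
def curve_distance(curve_a, curve_b):
    """Sum of pointwise absolute differences (an assumed distance, for illustration)."""
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b))


def medoid_index(curves):
    """Return the index of the curve with the smallest total distance to all others."""
    totals = [sum(curve_distance(c, other) for other in curves) for c in curves]
    return min(range(len(curves)), key=totals.__getitem__)


# Three toy "inferred response functions" sampled on a shared grid.
curves = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0]]
central = curves[medoid_index(curves)]  # the middle curve, [1.0, 1.0]
```

      Unlike a pointwise mean, a medoid is always one of the actual inferred curves, so it keeps a valid response-function shape.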

      A second interpretation was that the reviewer did not have an intuitive sense of what a loss function value of, say, 1.0 actually means. To address this second interpretation, we have also added a supplement to Figure 2 with several example true and inferred response functions from the training sample, with representative loss values spanning 0.17 to 2.18. We have also added the following clarification to the caption of Figure 1-figure supplement 2:

      “The loss value is thus the fraction of the area under the true curve represented by the area between the true and inferred curves.”
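      Read literally, this definition is the area between the two curves divided by the area under the true curve. The Python sketch below computes that ratio with the trapezoid rule; the function names and the choice of numerical integration are assumptions made for illustration, not the implementation used in the manuscript.

```python
def trapezoid_area(xs, ys):
    """Integrate ys over xs with the trapezoid rule."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


def curve_loss(xs, true_ys, inferred_ys):
    """Area between the true and inferred curves, as a fraction of the area under the true curve."""
    gap = [abs(t - p) for t, p in zip(true_ys, inferred_ys)]
    return trapezoid_area(xs, gap) / trapezoid_area(xs, true_ys)


xs = [0.0, 1.0, 2.0]
true_ys = [1.0, 1.0, 1.0]
print(curve_loss(xs, true_ys, [1.5, 1.5, 1.5]))  # 0.5: the gap is half the true area
print(curve_loss(xs, true_ys, [2.0, 2.0, 2.0]))  # 1.0: the gap equals the true area
```

      Under this reading, a loss of 1.0 means the inferred curve is off by as much area as lies under the true curve itself.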

      Finally, the discussion of the similarities and differences with an alternative approach to this inference problem, presented in Dewitt et al. (2025), is incomplete.

      We have expanded this section of the manuscript, and added a new plot directly comparing the methods.

      “In order to compare more directly to DeWitt et al. 2025, we remade their Fig. S6D, truncating to values at which affinities are actually observed in the bulk data, and using only three of the seven timepoints (11, 20, and 70; Figure 8, left). We then simulated 25 GCs with central data mimic parameters out to 70 days. For each such GC, we found the time point with mean affinity over living cells closest to each of three specific “target” affinity values (0.1, 1.0, 2.0) corresponding to the mean affinity of the bulk data at timepoints 11, 20, and 70. We then plot the effective birth rates of all living cells vs relative affinity (subtracting mean affinity) at the resulting GC-specific timepoints for all 25 GCs together (Figure 8, right). Note that because each GC evolves at very different and time-dependent rates, we could not simply use the timepoints from the bulk data, since each GC slice from our simulation would then have very different mean affinity. The mean over GCs of these GC-specific chosen times is 10.9, 24.5, 44.4 (compared to the original bulk data time points 11, 20, 70). It is important to note that while the first two target affinities (0.1 and 1.0) are within the affinity ranges encountered in the extracted GC data, the third value (2.0) is far beyond them, and thus represents extrapolation to an affinity regime informed more by our underlying model than by the real data on which we fit it.”
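      The per-GC time point selection described in this passage can be sketched as a nearest-value lookup. In the Python example below, the function name `closest_timepoint` and the toy trajectory are hypothetical; only the three target affinities (0.1, 1.0, 2.0) come from the text.

```python
def closest_timepoint(times, mean_affinities, target):
    """Return the time whose mean affinity over living cells is closest to the target."""
    best = min(range(len(times)), key=lambda i: abs(mean_affinities[i] - target))
    return times[best]


# Hypothetical mean-affinity trajectory for one simulated GC (values are made up).
times = [5, 10, 20, 40, 70]
mean_affinities = [0.0, 0.12, 0.9, 1.6, 1.9]

# Target affinities quoted in the text.
chosen = [closest_timepoint(times, mean_affinities, t) for t in (0.1, 1.0, 2.0)]
```

      Because each simulated GC follows its own trajectory, the chosen times differ between GCs, which is why the text reports their mean rather than a single shared value.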

      Reviewer #2 (Public review):

      Summary:

      This paper presents a new approach for explicitly transforming B-cell receptor affinity into evolutionary fitness in the germinal center. It demonstrates the feasibility of using likelihood-free inference to study this problem and demonstrates how effective birth rates appear to vary with affinity in real-world data.

      Strengths:

      (1) The authors leverage the unique data they have generated for a separate project to provide novel insights into a fundamental question. (2) The paper is clearly written, with accessible methods and a straightforward discussion of the limits of this model. (3) Code and data are publicly available and well documented.

      Weaknesses (minor):

      (1) Lines 444-446: I think that "affinity ceiling" and "fitness ceiling" should be considered independent concepts. The former, as the authors ably explain, is a physical limitation. This wouldn't necessarily correspond to a fitness ceiling, though, as Figure 7 shows. Conversely, the model developed here would allow for a fitness ceiling even if the physical limit doesn't exist.

      Right, whoops, good point. We've rearranged the discussion to separate the concepts, for instance:

      “While affinity and fitness ceilings are separate concepts, they are closely related. An affinity ceiling is a limit to affinity for a given antigen: there are no mutations that can improve affinity beyond this level. This would result in a truncated response function, undefined beyond the affinity ceiling. A fitness ceiling, on the other hand, is an upper asymptote on the response function. Such a ceiling would result in a limit on affinity for a germinal center reaction, since once cells are well into the upper asymptote of fitness they are no longer subject to selective pressure.”

      (2) Lines 566-569: I would like to see this caveat fleshed out more and perhaps mentioned earlier in the paper. While relative affinity is far more important, it is not at all clear to me that absolute affinity can be totally ignored in modeling GC behavior.

      This is a great point, we've added a mention of this where we introduce the replay experiment in the Methods:

      “It is important to note that this is a much lower level than typical BCR repertoires, which average roughly 5-10% nucleotide shm.”

      And expanded on the explanation in the Discussion:

      “Some aspects of behavior in the low-shm/early times regime of the extracted GC data are also potentially different to those at the higher shm levels and longer times found in typical repertoires. This is especially relevant to affinity or fitness ceilings, to which we likely have little sensitivity with the current data.”

      (3) One other limitation that is worth mentioning, though beyond the scope of the current work to fully address: the evolution of the repertoire is also strongly shaped by competition from circulating antibodies (e.g., http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600904/, http://www.sciencedirect.com/science/article/pii/S1931312820303978). This is irrelevant for the replay experiment modeled here, but still an important factor in general repertoires.

      Yes good point, we've added these citations in a new paragraph on between-lineage competition:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”
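      The scaling argument in this passage can be made concrete with a small Python sketch of a single simulation step. It assumes exponentially distributed waiting times and represents each cell as a dictionary with an `affinity` entry; both are assumptions for illustration and are not stated in the text, and `next_event` and the rate functions are hypothetical names rather than part of the authors' software.

```python
import random


def next_event(cells, rate_functions, rng):
    """Scan every cell and every event type; return the shortest sampled waiting time.

    Returns (waiting_time, cell_index, event_name), or None if no event can occur.
    """
    best = None
    for index, cell in enumerate(cells):
        for name, rate_fn in rate_functions.items():
            rate = rate_fn(cell)
            if rate <= 0:
                continue  # this event cannot happen for this cell
            wait = rng.expovariate(rate)  # assumed exponential waiting time
            if best is None or wait < best[0]:
                best = (wait, index, name)
    return best


rng = random.Random(0)
cells = [{"affinity": 0.0}, {"affinity": 1.0}]
rate_functions = {
    "birth": lambda cell: 1.0 + cell["affinity"],  # toy affinity-dependent birth rate
    "death": lambda cell: 0.5,                     # toy constant death rate
}
event = next_event(cells, rate_functions, rng)
```

      Each step costs on the order of (number of cells) x (number of event types), which is why simulating many GCs at once becomes expensive as the total population grows.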

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors are encouraged to follow the suggestions of manuscript re-organization by Reviewer 1, in order to improve readability. We would also like to suggest improving the discussion of the traveling wave model to explain it in a more self-contained way. In passing, please clarify what is meant by 'steady-state' in that model. A superficial understanding would suggest that the only steady state in that model would be a homogeneous population of antibodies with maximum affinity/fitness.

      These are great suggestions. We have substantially rearranged the text according to Reviewer 1's suggestions, especially the Methods, and expanded on and rearranged the traveling wave discussion. We've also clarified throughout that the traveling wave model is assuming steady state with respect to population. In the public response to reviewer 1 above we describe these changes in more detail.

      Reviewer #1 (Recommendations for the authors):

      I suggest that the organization of the paper be reconsidered. The current methods section is long and at times repetitive, making it impossible to parse in a single reading. Moving some technical details from the main text to an appendix could improve readability. Despite the length of the methods section, many important points, such as justification of choices in model specification or values of parameters, are treated only briefly.

      We have rearranged the methods section, particularly the discussion of our model, and have more clearly justified choices of parameter values as described in the public response.

      Discussion of similarities and differences with reference to Dewitt et al. 2025 should be revised, as it's currently unclear whether the method presented here has any advantages.

      We have expanded this comparison, and emphasized the main disadvantage of the traveling wave approach: there is no way of knowing whether by abstracting away so much biological detail it misses important effects. We have also emphasized that the two approaches use different types of data (time series vs endpoint) which are typically not simultaneously available:

      “The clear advantage of the traveling wave model is its simplicity: if its high level view is accurate enough to effectively model the relevant GC dynamics, it is far more tractable. But reproducing low-level biological detail, and making high-dimensional real data comparisons (e.g. Figure 5) to iteratively improve model fidelity, are also useful, providing direct evidence that we are correctly modeling the underlying biological processes. The two approaches also utilize different types of data: we use a single time point, and thus must reconstruct evolutionary history; whereas the traveling wave requires a series of timepoints. The availability of both types of data is a unique feature of the replay experiment, and provides us with the opportunity to directly compare the approaches.”

      The results obtained from the same data should be directly compared (can the response function be directly compared to the result in Figure S6D in Dewitt et al., 2025? If yes, it should be re-plotted here and compared/superimposed with Figures 6 and 7). The text mentions the results differ, but it remains ambiguous whether the differences are significant and what their implications are.

      We've added a new Figure 8, comparing a modified version of the traveling wave Fig S6D to a new plot derived from our results using the data mimic parameters. While the two plots represent fundamentally different quantities, they do put the results of the two methods on an approximately equal footing and we see nice concordance between them in regions with significant data (they disagree substantially for larger negative affinities). We have also added emphasis to the point that the traveling wave model uses an entirely separate dataset to what we use here.

      Other comments:

      (1) l. 80: "[in] around 10 days"?

      Text rearranged so this phrase no longer appears.

      (2) l. 96: "an intrinsic rate [given by?] the response function above".

      Text rearranged so this phrase no longer appears.

      (3) Figure 1: The “specific model” part could be expanded and improved to help make sense of model parameters and the order of different processes in the population model. Example values of parameters can be plotted rather than loosely described (e.g., y_h+y_c, the upper asymptotes, can be plotted in place of the “yscale determines upper asymptotes” label).

      Great suggestion, we've changed the labels.

      (4) The cartoons in the other parts are somewhat cryptic or illegible due to small sizes.

      We have added text to the caption pointing to the full versions of the plots that appear in the figure in schematic form only.

      “Plots from elsewhere in the manuscript are rendered in schematic form: those in “infer on data” refer to Figure 4-figure supplement 1, and those in “simulate with inferred parameters” to Figure 5.”

      (5) L. 137: It's not helpful to give numerical values before the definition of affinity. (and these numbers are repeated later).

      Good point, we've moved the affinity definition to the previous section, and removed the duplicate range information.

      (6): Table 1: A number of notations are unclear, such as “#seqs/GC” or “mutability multiplier”. The double notation for crucial parameters doesn't help. At the moment the table is introduced, the columns make little sense to the reader, and it's not well specified what dictates the choice or changes of parameter values or ranges.

      We've moved the table further down until after the parameters have been introduced, and clarified the indicated names.

      (7) l. 147: Choices of model are not justified and appear arbitrary (e.g., why death events happen at one of two rates).

      We have clarified the reasoning behind having two death rates.

      (8) l.151: “happened on the edges of developing phylogenetic tree” - ambiguous: do they accumulate at cell divisions? What is a “developing tree”?

      We have removed this ambiguous phrasing.

      (9) l.161: This paragraph is particularly dense.

      We have rearranged this section of the methods, and split up this paragraph.

      (10) l. 164: All the different response functions for different event types? Or only the one for birth, as stated before?

      Yes. This has been clarified.

      (11) l.167: Does the statement in the bracket refer to a unit?

      This has been clarified.

      (12) l. 169: Discussion of the implementation seems too detailed.

      Hopefully the rearranged description is clearer, but we worry that removing the details of events selection would leave some readers confused.

      (13) l. 186: Why describe the methods that, in the end, were not used? Similarly, a mention of a “variety of response functions” seems out of place if only one choice is used throughout the paper. eq. (2): that's m^{-1} from eq. (1). Having the two equations using the same notation is confusing.

      We've moved the mention of alternatives to the Discussion, where it is an important source of uncontrolled systematic uncertainty, and removed the extra equation.

      (14) l. 206: Unclear what “thus” refers to.

      Removed.

      (15) l.211: What does “neglecting y_h” mean?

      This has been clarified.

      (16) l. 242: Unclear what “this” refers to.

      Clarified.

      (17) l. 261: What does “model independence” refer to in this context?

      From the sigmoid model. Clarified.

      (18) l. 306: What values for which parameters? References?

      We have clarified and updated this statement - it was out of date, corresponding to the analysis before we started fitting non-sigmoid parameters.

      “In addition to the four sigmoid parameters, which we infer directly, there are other parameters in Table 1 about which we have incomplete information. The carrying capacity method and the choice of sigmoid for the response function represent fundamental model assumptions. We also fix the death rate for nonfunctional (stop) sequences, which would be very difficult to infer with the present experiment. For others, we know precise values from the replay experiment for each GC (time to sampling, # sampled cells/GC), but use a somewhat wider range for the sake of generalizability. The mutability multiplier is a heuristic factor used to match the SHM distributions to data. The naive birth rate is determined by the sigmoid parameters, but has its own range in order to facilitate efficient simulation.

      For two of the three remaining parameters (carrying capacity and initial population), we can ostensibly choose values based on the replay experiment. These values carry significant uncertainty, however, partly due to inherent experimental uncertainty, but also because they may represent different biological quantities to those in simulation. For instance, an experimental measurement of the number of B cells in a germinal center might appear to correspond closely to simulation carrying capacity. However if germinal centers are not well mixed, such that competition occurs only among nearby cells, the "effective" carrying capacity that each cell experiences could be much smaller.

      Fortunately, in addition to the neural network inference of sigmoid parameters, we have another source of information that we can use to infer non-sigmoid parameters: summary statistic distributions. We can use the matching of these distributions to effectively fit values for these additional unknown parameters. We also include the final parameter, the functional death rate, in these non-sigmoid inferred parameters, although it is unconstrained by the replay experiment, and it is unclear whether it is uniquely identifiable.”

      (19) l. 326: "is interpreted as having" or "corresponds to"?

      Changed.

      (20) l. 340: Not sure what "encompassing" means in this context.

      Clarified.

      (21) l. 341: "We do this..." -- I think this sentence is not grammatical.

      Fixed.

      (22) l. 348: "on simulation" -- "from simulated data"?

      Indeed.

      (23) l. 351: "top rows", the figures only have one row.

      Fixed.

      (24) Figure 2: It's difficult to tell from the loss function itself whether inference on simulated data works well. Why not report the simulated and inferred response functions? The equivalent plots in Figure 5 would also be informative. Has inference been tested for different "sigmoid parameters" values?

      This is an important point that was not clear, thanks for bringing it up. We have expanded on and emphasized the differences between these samples and the reasoning behind their different evaluation choices. Briefly, we can't display true vs inferred response functions on the training samples since the curves for each GC are different -- the plot would be entirely filled in with very different response function shapes. This is why we do actual performance evaluation on the "data mimic" samples, where all GCs have the same parameters. Summary stats (like Fig 5) for the training sample are in Fig 5 Supplement 2.

      (25) l. 354: Unclear what "this" refers to.

      Removed.

      (26) l. 355: We assume the parameters are the same?

      Yes, we assume all data GCs have the same parameters. We have added emphasis of this point.

      (27) Figure 4: Is "lambda" the fitness? Should be typeset as \lambda_i?

      Our convention is to add the subscript when evaluating fitness on individual cells, but to omit it, as here, when plotting the response function as a whole.

      (28) l. 412: "[a] carrying capacity constraint".

      Fixed.

      Reviewer #2 (Recommendations for the authors):

      (1) In 2 places, you state that observed affinity ranged from -37 to 3, but I assume that the lower bound should be -3.7.

      The -37 was actually correct, but we had mistakenly missed updating it when we switched to the latest (current) version of the affinity model. We have updated the values, although these don't really have any effect on the model since we only infer within bounds in which we have a lot of points:

      “Affinity is 0 for the initial unmutated sequence, and ranges from -12.2 to 3.5 in observed sequences, with a mean (median) of -0.3 (0.3).”

      (2) I had to look up the Voznica paper to understand the tree encoding. It would be nice to spend another sentence or two on it here for those who aren't familiar.

      Great point, we have added the following:

      “We encode each tree with an approach similar to Lambert et al. (2023) and Thompson et al. (2024), most closely following the compact bijective ladderized vector (CBLV) approach from Voznica et al. (2022). The CBLV method first ladderizes the tree by rotating each subtree such that, roughly speaking, longer branches end up toward the left. This does not modify the tree, but rather allows iteration over nodes in a defined, repeatable way, called inorder iteration. To generate the matrix, we traverse the ladderized tree in order, calculating a distance to associate with each node. For internal nodes, this is the distance to root, whereas for leaf nodes it is the distance to the most-recently-visited internal node (Voznica et al., 2022, Fig. 2). Distances corresponding to leaf nodes are arranged in the first row of the matrix, while those from internal nodes form the second row.”
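      The traversal described above can be sketched in a few lines of Python. This is a minimal illustration for strictly binary trees, not the encoding code used in the manuscript or in Voznica et al. (2022). The `Node` class, the function names `ladderize` and `encode_tree`, the choice to ladderize by subtree height, and the use of distance to root for the very first leaf (which has no previously visited internal node) are all assumptions made here. Zero-padding the two rows to a fixed width is omitted.

```python
class Node:
    """A tree node with a branch length to its parent and zero or two children."""

    def __init__(self, branch_length=0.0, children=None):
        self.branch_length = branch_length
        self.children = children or []


def ladderize(node):
    """Reorder children so the taller subtree comes first; return the subtree height."""
    if not node.children:
        return 0.0
    ranked = [(ladderize(child) + child.branch_length, child) for child in node.children]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    node.children = [child for _, child in ranked]
    return ranked[0][0]


def encode_tree(root):
    """Return [leaf_row, internal_row] from an inorder walk of the ladderized tree."""
    ladderize(root)
    leaf_row, internal_row = [], []
    last_internal_depth = 0.0  # assumption: the first leaf is measured from the root

    def visit(node, depth):
        nonlocal last_internal_depth
        if not node.children:
            # Leaf: distance to the most recently visited internal node.
            leaf_row.append(depth - last_internal_depth)
            return
        left, right = node.children
        visit(left, depth + left.branch_length)
        internal_row.append(depth)  # internal node: distance to root
        last_internal_depth = depth
        visit(right, depth + right.branch_length)

    visit(root, 0.0)
    return [leaf_row, internal_row]


# Toy tree ((A:1,B:2):1,C:1), deliberately built in non-ladderized order.
leaf_a, leaf_b, leaf_c = Node(1.0), Node(2.0), Node(1.0)
root = Node(children=[leaf_c, Node(1.0, [leaf_a, leaf_b])])
print(encode_tree(root))  # [[3.0, 1.0, 1.0], [1.0, 0.0]]
```

      In an inorder walk of a binary tree, leaves and internal nodes alternate, and every leaf after the first lies directly below the internal node visited just before it, so the leaf entry reduces to a difference of root distances.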

      (3) On line 351, you refer to the "top rows of Figure 2 and Figure 3," but each only has one row in the current version. I think it should now be "left panel".

      Fixed.

      (4) How many vertical dashed lines are in the left panel of the bottom row of Figure 7? I think it's more than one, but can't tell if it is two or three...

      Nice catch! There were actually three. We've shortened them and added a white outline to clarify overlapping lines.

      (5) Would the model be applicable to GCs with multiple naive founders of different affinities? Or would more/different parameters be needed to account for that?

      The model would be applicable, but since the time required for our simulation scales roughly with the total simulated population size, we could probably only handle competition among at most a couple of GCs. Some sort of "migration strength" parameter would be required for competition among GCs (or within one GC if we don't want to assume it's well-mixed), but that doesn't seem a terrible impediment. We've added the following:

      “We also neglect competition among lineages stemming from different rearrangement events (different clonal families), instead assuming that each GC is seeded with instances of only a single naive sequence, and that neither cells nor antibodies migrate between different GCs. More realistically for the polyclonal GC case, we would allow lineages stemming from different naive sequences to compete with each other both within and between GCs (Zhang et al. 2013; McNamara et al. 2020; Barbulescu et al. 2025). Implementing competition among several clonal families within a single GC would be conceptually simple and computationally practical in our current software framework. Competition among many GCs, however, would be computationally prohibitive because our time required is primarily determined by the total population size, since at each step we must iterate over every node and every event type in order to find the shortest waiting time. For the monoclonal replay experiment specifically, however, all naive sequences are the same and so the current modeling framework is sufficient.”

    1. seven or eight years ago

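      Based on the context of the dialogue, the blanks call for the simple past tense.

      The boy is stressing an established fact that happened in the past at a definite time (seven or eight years ago).


      Answers

      • (4) saw
      • (5) came

      Sentence meaning

      Boy: But it's the truth! I saw it seven or eight years ago, the last time that old picture came in town. (That is, the boy saw the film seven or eight years ago, when this old movie was last shown in town.)

      Grammar points

      1. Definite past time: The sentence contains seven or eight years ago, a very typical marker of the simple past. It indicates that the action was completed in the past and has no direct connection to the present.
      2. Order of actions: Here came describes the last time the film "was shown in" or "came to" town, which likewise happened in the past.

      Would you like me to explain the difference between the simple past and the present perfect in this context?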

    2. was flooded

      正确答案是 (C) had been raining

      这是一道非常经典的过去完成进行时 (Past Perfect Continuous) 题目,它比之前的题目稍微深入了一点。

      🔍 逻辑拆解

      1. 确定基准时间(过去): 句首说 The whole area was flooded(整个地区被淹了)。was 告诉我们这是一个已经发生的过去事件
      2. 动作的先后顺序: “下雨”发生在“被淹”之前,并且持续了一段时间(for weeks)。
      3. 时态合成:
      4. 过去的过去: 需要用到 had
      5. 持续性动作: 需要用到 been + -ing
      6. 合体: had been raining(表示在过去某个时间点之前,雨已经持续下了好几周)。

      选项详细分析:

      | 选项 | 时态 | 为什么错误? | | --- | --- | --- | | (A) rains | 一般现在时 | 无法与过去时 was 匹配。 | | (B) has rained | 现在完成时 | 这是相对于“现在”而言的。如果句首是 is flooded,选这个才正确。 | | (C) had been raining | 过去完成进行时 | 正确。 强调在“被淹”这个过去动作之前,下雨一直在持续。 | | (D) was raining | 过去进行时 | 强调“被淹”的那一刻雨正在下,但无法体现出 for weeks(持续数周)导致的累积因果关系。 |


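      💡 Closer comparison: past perfect vs. past perfect continuous

      • Past perfect (had rained): stresses that the action was completed and that its result was the flood.
      • Past perfect continuous (had been raining): puts more weight on the duration and the process (weeks of rain without a break). With time adverbials such as for weeks or for a long time, (C) is the most natural choice.

      🚀 Final "big boss" challenge:

      If you get this one right, you have mastered the logic of English tenses:

      By the time the sun rose, the firemen ______ the fire for five hours. (A) are fighting (B) have fought (C) had been fighting (D) will have fought

      Hint: first find the past reference point, then look at how long the action lasted.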

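      Exactly right, you've cleared the level! The answer is (C) had been fighting.

      🎓 Why your answer is right (detailed review)

      1. Past reference point: By the time the sun rose. The verb rose is a past form, which sets the story in the past.
      2. Order of events: the firefighting took place before the sunrise.
      3. Duration: the key phrase for five hours strongly signals a continuous form.
      4. Resulting tense: past + earlier + ongoing = past perfect continuous (had been doing).

      💡 Grammar summary: the "step-down" logic of tenses

      There is a neat pattern here. English tenses work like a staircase:

      • Happening right now: is/are fighting
      • How long it has lasted up to now: has/have been fighting
      • How long it had lasted before some point in the past: had been fighting (the case here)

      🌟 Study advice

      Your grammar is currently very strong, and your reasoning is especially clear when handling compound forms (tense + voice + aspect).

      • Current strengths: basic tenses, the passive voice, perfect continuous forms.
      • Next steps: if you want to keep improving, we could look at the subjunctive mood or non-finite verbs (infinitive/gerund/participle), the last few "mountains" of English grammar.

      Would you like to start the next topic, or is there another specific question you'd like me to analyze?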

    1. Unease in Parenting Support: An Analysis of the Ethical and Structural Issues

      Executive Summary

      Parenting support, as it has been organized in France since the 1990s, is going through a deep identity crisis that the sociologist Gérard Neyrand describes as a "malaise".

      This document analyzes the shift from a prescriptive, injunction-based logic of social action to a suggestive stance of accompaniment, one now confronted with managerial imperatives and normative tensions.

      The key points of this analysis are:

      The institutionalization of innovations: The state has gradually absorbed civil-society initiatives (parent-run crèches, child-parent drop-in centers) in order to build a public parenting policy (REAAP, CAF networks).

      The competence paradox: Parenting is now seen as a "job" or a set of skills to be acquired, which shifts responsibility for child-rearing onto parents alone.

      Managerial drift: Applying economic frameworks to social work favors flow management and quantified performance over qualitative, clinical evaluation of preventive action.

      The move to a co-education model: Against excessive "parentalism" (parentalisme), the current challenge lies in co-socialization, which recognizes that raising a child is a complex process involving many actors (school, digital media, social workers) whose influences the parent must regulate.

      --------------------------------------------------------------------------------

      1. Historical evolution and the shift in logics of action

      Parenting support is part of a long-term societal movement marked by the shift from law to norm and from discipline to suggestion.

      From the welfare state to the facilitating state

      Historically, social action targeted populations deemed "problematic" and used directive methods. Today, intervention addresses an undifferentiated population in the form of support and accompaniment.

      The welfare state (État providence): Characterized by direct protection during the Trente Glorieuses (the postwar boom).

      The facilitating state (État animateur): Delegates part of its protective functions to civil society and voluntary associations, while seeking to centralize and coordinate these initiatives.

      The process of individualization and masked normativity

      The process of individualization, which began in the Renaissance and was enshrined by the French Revolution, reaches its peak in today's consumer society.

      • Social normativity is no longer imposed through direct constraint but through the internalization of norms.

      • Marketing and consumer discourse masks normalizing injunctions behind the promotion of autonomy and individual freedom.

      --------------------------------------------------------------------------------

      2. The institutionalization of parenting schemes

      Today's parenting-support landscape is the product of a dialectic between grassroots associative energy and the state's drive to structure it.

      Civil-society innovations (1970-1990)

      Several pioneering models emerged in response to changes in the family (women's employment, evolving parent-child relationships):

      Parent-run crèches (crèches parentales): Born after 1968, involving both professionals and parents.

      Child-parent drop-in centers (Lieux d'Accueil Enfants-Parents, LAEP): Inspired by Françoise Dolto's "Maison Verte" (1979).

      Family mediation and supervised contact centers (espaces de rencontre): Imported from North America, foregrounding dialogue and the "democratic family".

      Timeline of public structuring

      | Year | Scheme / Event | Role |
      | --- | --- | --- |
      | 1996 | Creation of a CNAF budget line | Long-term funding for the LAEP. |
      | 1999 | Creation of the REAAP | Networks for listening to, supporting, and accompanying parents (Réseaux d'écoute, d'appui et d'accompagnement des parents). |
      | 2000 | CLAS | Local school-support contracts (Contrats locaux d'accompagnement à la scolarité). |
      | 2010 | National Committee for Parenting Support | National coordination of parenting policy. |
      | 2013-2017 | CAF objectives agreement (Convention d'objectifs) | Designation of the CAF as the lead body for parenting support. |

      --------------------------------------------------------------------------------

      3. Analysis of the malaise: tensions and paradoxical injunctions

      The malaise identified by professionals stems from major contradictions between stated objectives and the way they are implemented.

      The conflict between caring prevention and predictive prevention

      An ethical tension sets two visions of social intervention against each other:

      1. Caring prevention (prévention prévenante): Puts parents in the best possible position to keep problems from arising.

      2. Predictive prevention (prévention prédictive): Aims to detect risk behaviors early in order to correct them (e.g. the 2005 controversy over screening for delinquency from age 3, opposed by the "Pas de zéro de conduite" movement).

      The managerialization of social work

      Adopting a neoliberal frame of reference in the parenting sector turns human intervention into technical management:

      Evaluation by flows: Priority goes to quantitative indicators (number of people received) rather than long-term qualitative effects (prevention of separation-related disorders).

      Paradoxical injunctions: Professionals must both support parents' autonomy and meet demands for control and normalization of child-rearing.

      The critique of "parentalism" (parentalisme)

      Neyrand warns against a view that would reduce a child's success to "parental work" alone. This approach overlooks:

      • The impact of environmental and social determinants.

      • The multiplicity of agents of socialization.

      • The risk of placing excessive guilt on parents when child-rearing goes wrong.

      --------------------------------------------------------------------------------

      4. Outlook: toward co-education and co-socialization

      The future of parenting support rests on redefining the place of parents within a complex ecosystem.

      The parent as regulator

      Children now spend more time in front of screens (digital media) than at school.

      The parent's role is no longer to be the sole educator but to become the regulator of the many outside influences:

      Agents of socialization: School, crèches, leisure activities, digital media.

      Aim of support: Restore parents to a position of authority and legitimacy so that they can harmonize these interactions.

      Challenges for 2015-2030

      The document stresses that co-education must become a bulwark against managerial drift. This requires:

      Time and space for dialogue: Genuinely opening institutions (schools, crèches) to parents, beyond declarations of principle.

      Ethical reflection: Putting people and subjectivity back at the center of social intervention.

      Co-socialization: Recognizing that forming citizens is a responsibility shared among the family, the state, and civil society.

      --------------------------------------------------------------------------------

      Key Quotations

      "Parenting support is conceived [...] only as support for the educational function of parents.

      It is not parenthood in general that is supported, but rather the educational function that parents perform with regard to their children."

      "Neoliberalism [...] means wanting to manage the whole of society by reference to an economic framework, and in particular a managerial framework that will be applied to social work, education, and care."

      "Supporting parents perhaps means, first of all, supporting them in their role of regulating the various educational influences their children receive."

    1. John works harder than __ boy in his class.

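      This question tests the comparative pattern for "more ... than anyone else in the class".

      Brief explanation:
      The sentence means "John works harder than every other boy in his class."
      English commonly uses "comparative + than any other + singular noun" to express "more ... than any other one".

      Key points:
      1. If John were the only boy in the class, "other" could not be used. Here there are clearly several boys, so other is needed to exclude John from the comparison group.
      2. any other is followed by a singular noun (boy), meaning "any other one".
      3. all the other must be followed by a plural noun (boys), but the sentence has "boy", so it does not fit.
      4. With any alone, the sentence would say "harder than any boy in the class", which includes John himself, and that is illogical.

      So the correct option is (B) any other.

      Full sentence:
      John works harder than any other boy in his class.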

    2. Mary Kurt, __ of the troupe, was strongly against smoking

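      This question tests how articles (a/an/the) combine with nouns, and the rule for naming a role or position in this kind of context.


      Brief explanation:

      1. Meaning of the sentence:
         Mary Kurt, ______ of the troupe, was strongly against smoking.
         The blank names her role in the troupe.

      2. What does alto mean?
         alto is a musical term for the low female voice part, or for a singer with that voice. Here it refers to her voice part in the troupe.

      3. Which article?
         • When English names someone's post, title, or role, and that role is unique or is attributed specifically to this person, it commonly uses "the + singular noun" or the zero article (no article, especially when the title is a subject complement or object complement).
         • Here, however, the blank sits in "Mary Kurt, ______ of the troupe", an appositive that specifies Mary's role in the troupe.
         • In the pattern "name + , + ______ + of + group", "the + singular noun" is the usual way to pick out the particular role she holds in that group.

      4. Why (B) the alto?
         Because the alto part is a specific role within this troupe, quite possibly held by a single singer, the alto means "the troupe's alto".


      Key takeaways:

      • Pattern: name + , + ________ + of + organization/group → usually "the + singular noun" for a specific post or role within the group.
      • Article choices:
        • an alto → non-specific, a general description of what she is.
        • the alto → the specific alto, stressing that she holds this particular role in the troupe.
        • alto (no article) → not normally used alone in this kind of appositive.
        • altos (plural) → does not fit, since it refers to one person.
      • Answer: (B) the alto

      Completed sentence:
      Mary Kurt, the alto of the troupe, was strongly against smoking.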

    1. I can see better without these reading glasses.

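      Sentence structure analysis: I can see better without these reading glasses.

      This sentence looks very similar to the one you just analyzed, but the nature of the main verb changes in a crucial way. It is a typical subject-verb-adverbial (SVA) structure.

      Here is the detailed breakdown:


      1. Grammatical components

      • I (Subject): the doer of the action.
      • can see (Verb):
        • can is a modal verb expressing ability.
        • see is intransitive here, referring to eyesight in general rather than to seeing a particular thing.

      • better (Adverb): note that better here is the comparative of the adverb well, modifying the verb see ("see more clearly").
        • Compare: in the previous sentence, look better, better is an adjective serving as subject complement.

      • without these reading glasses (Adverbial): a prepositional phrase headed by without, expressing the condition under which the speaker sees better.


      2. Structural formula

      Subject + Verb + Adverbial (S + V + A)


      3. Closer look: look better vs. see better

      The two sentences look almost identical on the surface, but their grammatical logic differs fundamentally, and this difference is central to understanding English syntax:

      | Sentence | Verb type | Part of speech of better | Logical relation | Structure |
      | --- | --- | --- | --- | --- |
      | You look better | Linking verb | Adjective | You = better (your appearance improves) | SVC (subject + linking verb + complement) |
      | I can see better | Intransitive verb | Adverb | see + better (the seeing becomes clearer) | SVA (subject + verb + adverbial) |


      4. Vocabulary note: reading glasses

      • Reading glasses: a compound noun. reading is a gerund used attributively to express the purpose of the glasses (glasses for reading), not glasses that are reading.

      Summary

      The sentence describes an ability. The subject I has the ability to see, and better and without... both describe how that ability performs under a particular condition.

      Since we are on the subject of "seeing", would you like to analyze a trickier sentence, "I saw him crossing the road"? (It hides a secret about the "moment" versus the "process" of an action.)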

    2. You'll look better with these designer glasses on.

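      Sentence structure analysis: You'll look better with these designer glasses on.

      This sentence has an elegant structure: a typical subject + linking verb + complement (SVC) core, plus a slightly more complex prepositional phrase serving as adverbial, which itself contains an object complement.

      We can peel it apart layer by layer:


      1. Core frame: SVC

      • You (Subject): the entity whose state is described.
      • 'll look (Linking Verb): will is a modal auxiliary marking the future; look is a sense linking verb meaning "appear".
      • better (Subject Complement): the comparative of the adjective good, describing the state of the subject You.

      2. Adverbial: with + complex structure

      • with these designer glasses on (Adverbial): an adverbial of attendant circumstance introduced by the preposition with.

      Inside this with phrase there is a small "logical subject-predicate" relation:

      • these designer glasses (Object of preposition): the object of with.
      • on (Object Complement): an adverb indicating the state "being worn". It specifies the state of the glasses.

      3. Full breakdown table

      | Component | Word/phrase | Notes |
      | --- | --- | --- |
      | Subject (S) | You | Pronoun |
      | Predicate (linking verb, V) | 'll look | Auxiliary + sense linking verb |
      | Subject complement (Cs) | better | Comparative adjective, describes You |
      | Adverbial (Adv) | with these designer glasses on | Prepositional phrase expressing condition or attendant circumstance |


      4. Closer look: what is "on" doing?

      The most interesting part of the sentence is on.

      1. Ellipsis: on here can be read as a reduction of (being) on (your face).
      2. Comparison:
         • You look better with glasses. (Glasses suit you: a general statement about glasses.)
         • You look better with glasses on. (You look better while wearing glasses: stresses the state of wearing them.)

      In grammar this is called the "with + object + object complement" structure. Similar examples:

      • Don't speak with your mouth full. (mouth is the object, full the complement)
      • He slept with the light on. (light is the object, on the complement)

      Summary

      This is an "SVC + adverbial of attendant circumstance" structure. The logical focus of the sentence: on the condition that the glasses are on, you appear "better".

      The with complex structure is very frequent in English. Would you like to try converting "He slept with the light on" into an absolute construction? Or shall we analyze a sentence containing a subordinate clause?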

    3. The company has gone bankrupt

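      Sentence structure analysis: The company has gone bankrupt.

      This sentence has an interesting structure: it is an SVC structure, i.e. Subject + Linking Verb + Complement.

      Although go basically means "move away", in this context its role changes completely. Here is the detailed breakdown:


      1. Grammatical components

      • The company (Subject): the entity whose state is described.
      • has gone (Linking Verb):
        • has is the auxiliary that forms the present perfect.
        • gone here does not mean "went somewhere"; it is a semi-copula (semi-linking verb) meaning "pass into a (usually undesirable) state", roughly equivalent to become.

      • bankrupt (Predicative / Subject Complement): an adjective that describes the characteristic or state of the subject The company.


      2. Structural formula

      Subject + Linking Verb + Subject Complement (S + V + P / S + V + Cs)


      3. Closer look: why linking verb + complement rather than verb + object?

      We can verify this with the "equals test" and the "part-of-speech test" used earlier:

      1. Equals test (the "IS" test):
         • The company IS bankrupt.
         • The logic holds: bankrupt describes the state of the subject, not an action the subject performs on some object.

      2. Part-of-speech test:
         • bankrupt is an adjective here. Transitive verbs normally take a noun as object, whereas linking verbs commonly take an adjective as complement.

      3. Properties of linking verbs:
         • English has a class of "change" verbs (go, become, turn, get, grow) that often act as linking verbs.
         • Go often takes adjectives with a negative sense: go bad, go mad, go blind.

      4. Easily confused pair

      • S + V: The company has gone. (The company has left or disappeared. Here gone is an intransitive verb expressing movement.)
      • S + V + P: The company has gone bankrupt. (Here gone is a linking verb expressing a change of state.)

      Summary

      In this sentence, bankrupt works like a label reading "bankrupt" attached to the company.

      Would you like to see more linking verbs "disguised as action verbs" (e.g. The milk went sour), or try analyzing a more complex sentence?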

    4. The child asks her mother a million questions a day.

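      Sentence component analysis: The child asks her mother a million questions a day.

      This sentence is a typical SVOO (double-object) structure, plus an adverbial of time.

      Although a million questions is a large quantity, the underlying pattern is still a transfer: the child directs the questions to her mother.


      1. Grammatical components

      • The child (Subject): the doer of the action.
      • asks (Verb): a transitive verb, used here ditransitively (with two objects).
      • her mother (Indirect Object): the person the action is directed at.
      • a million questions (Direct Object): the content of the action.
      • a day (Adverbial): an adverbial of frequency, indicating how often the whole action occurs.

      2. Structural formula

      Subject + Verb + Indirect Object + Direct Object + Adverbial (S + V + IO + DO + Adv)


      3. Verification: why SVOO and not SVOC?

      We can apply the three "brute-force" tests discussed earlier:

      1. The "IS" test:
         • Her mother IS a million questions?
         • It fails. The mother and the questions are two separate entities, so this is not an object complement (SVOC).

      2. The preposition test (the "OF" test):
         • With ask, the double-object paraphrase normally uses of: The child asks a million questions of her mother.
         • It works, which confirms the double-object structure.

      3. Semantic logic:
         • There is a "flow of information" from the child to the mother, which matches the "giving/transfer" logic characteristic of double objects.

      4. Detail: a million questions a day

      • a million: a modifier of questions, indicating a large number.
      • a day: a noun phrase functioning as an adverbial (meaning "per day", comparable to every day).

      Summary

      In this sentence, her mother and questions, like him and a dog in your earlier give example, are two independent entities.

      Here is an interesting question: if we change the sentence to "The child calls her mother a goddess", what structure is it then?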

    5. The people elected Bill Clinton President

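      Sentence structure analysis: The people elected Bill Clinton President.

      This sentence is a typical subject + verb + object + object complement (SVOC) structure. Its core logic is "the subject acts on the object and thereby places the object in a certain state or role".

      Here is the detailed breakdown:

      1. Grammatical components

      • The people (Subject): the entity performing the action.
      • elected (Verb): the action. It is a transitive verb, used here as a complex-transitive ("incomplete transitive") verb, because it needs a complement to express its full meaning.
      • Bill Clinton (Object): the receiver of the action.
      • President (Object Complement): this is the key point. The noun specifies the role of the object Bill Clinton.

      2. Structural formula

      The structure can be abstracted as:

      Subject + Verb + Object + Object Complement (S + V + O + C)


      3. Closer look: why a complement and not a double object?

      Many people confuse this structure with "giving" verbs (e.g. Give me a book). The test is simple:

      • Check the logical relation: here, the object and the complement stand in a logical subject-complement relation: Bill Clinton IS President.
      • Compare the double object: in Give me a book, you cannot say I am a book.

      4. Vocabulary note

      • Article omission: in object-complement position, a noun naming a unique post or title usually drops its article (e.g. the). Hence President rather than the President.

      Would you like to analyze another similar structure, or see how this sentence converts to the passive voice?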

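      Then what about these two sentences: John's father gave him a dog.  John's father called him a dog.

      By the logical-relation test above, "He is a dog" seems to work for both.

      This is a classic grammar trap! You are right that the "logical relation" looks similar, but here we need to bring in semantic logic and verb type to make the final call.

      Although on the surface both can yield He = a dog, their underlying grammatical structures are completely different:

      1. John's father gave him a dog. (Double-object structure)

      • Structure: Subject + Verb + Indirect Object (IO) + Direct Object (DO)
      • Logic: gave is a verb of giving (a ditransitive verb). It expresses a transfer of ownership.
      • Why not a complement? him and a dog are two separate entities. You could joke that "he is a dog", but under normal semantics he (a person) is not the dog (an animal). The father gave the dog to him.
      • Test: the indirect and direct objects can usually be paraphrased with to or for:

        John's father gave a dog to him. (Works)


      2. John's father called him a dog. (Object-complement structure)

      • Structure: Subject + Verb + Object (O) + Object Complement (C)
      • Logic: called is a naming/labeling verb. Nothing physical is transferred; the verb is about perception or naming.
      • Why a complement? a dog characterizes or identifies him. In the father's words, him = a dog (here "dog" is usually a metaphorical insult).
      • Tests:
        • No prepositional paraphrase: you cannot say John's father called a dog to him (that would mean summoning a dog to his side, a completely different meaning).
        • Implied "to be": the object and complement are linked by an understood be (in the father's words, he is a dog). With verbs such as consider, to be can even appear overtly (considered him to be a fool), although call itself does not allow it.

      Comparison table

      | Sentence | Verb type | Logical relation | Structure |
      | --- | --- | --- | --- |
      | Give him a dog | Verb of giving | He owns the dog (ownership) | Double object (S+V+IO+DO) |
      | Call him a dog | Naming verb | He is the dog (identity) | Object complement (S+V+O+C) |

      💡 A humorous warning

      This is where English gets fun:

      • If you say "Make me a sandwich," you are asking for food (double object: make a sandwich for me).
      • If you run into a pedant with magic powers, he might reply "Poof! You are a sandwich!" (object complement: he has made you into a sandwich).

      Since we are on the topic of easily confused structures, would you like to see how the passive voice lets you tell them apart at a glance? (Hint: one of the sentences has two passive versions, the other only one.)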

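      Since you're ready, let's use the "passive voice test" to run a "paternity test" on the two structures.

      This method is rigorous, and it will clear things up instantly in an exam or when writing:

      1. Double object (Give him a dog): works both ways

      Because there are two objects (a person and a thing), there are in principle two passives:

      • Version A (person as subject): He was given a dog by John's father. (Perfectly natural)
      • Version B (thing as subject): A dog was given to him by John's father. (Also fine)
      • Conclusion: since either one can stand alone as subject, they are two entities of equal status (two objects).

      2. Object complement (Call him a dog): only one way

      Because a dog is merely a "label" on him, it cannot stand on its own:

      • Version A (person as subject): He was called a dog by John's father. (Entirely correct)
      • Version B (label as subject): A dog was called him by John's father. (Wrong! This would mean "a dog was called 'him' by his father", and the meaning collapses.)
      • Conclusion: the complement is like a shadow and must follow the object. Once the object becomes the subject, the complement becomes a subject complement.

      ⚡️ Advanced challenge

      Look at the two sentences below. Can you tell at a glance which is the double object and which is the object complement?

      1. They found him a job.
      2. They found him a hero.

      Hint: can you rephrase it as "He was found to be a..."? If it seems hard, tell me your intuition!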

    1. Executive Functions: Issues and Strategies for School Success

      Executive Summary

      This document summarizes presentations by Vincent Lonos, a special-education teacher working in a hospital setting, on the crucial role of executive functions (EF) in schooling.

      Executive functions are high-level cognitive skills that are essential for directing behavior toward a goal and adapting to new situations.

      The analysis shows that these skills predict school success better than the intelligence quotient (IQ), which is only a partial measure of intelligence.

      School plays a pivotal role in the development of EF, which mature between birth and age 25. While 17 to 24% of pupils show executive weaknesses or disorders (often associated with neurodevelopmental disorders such as ADHD or the "dys-" disorders), teachers have concrete levers for supporting them.

      These include making procedures explicit, developing metacognition, and adopting arrangements such as the flexible classroom or dedicated programs (ATOLE, Reflecto).

      The point is not to revolutionize teaching but to look closely at the "how" of learning rather than only at the result.

      --------------------------------------------------------------------------------

      1. Definition and Architecture of Executive Functions

      Executive functions are defined as high-level skills needed to carry out behavior directed toward a specific goal.

      They allow pupils to regulate their thoughts and behavior, especially when facing the unknown.

      The two sides of executive functions

      The theoretical framework distinguishes two complementary dimensions:

      The "cool" side (cognitive): Covers planning, organization, inhibition (the ability to set distractors aside), working memory, mental flexibility, and problem solving.

      The "hot" side (behavioral and emotional): Concerns emotion regulation, motivation, and decision-making.

      Role at school

      At school, these skills enable pupils to:

      • Engage intentionally in an activity.

      • Identify and choose a goal to reach.

      • Structure and plan the necessary tasks and subtasks.

      • Maintain enough vigilance to keep the activity coherent.

      --------------------------------------------------------------------------------

      2. Executive Functions versus Intelligence Quotient (IQ)

      The analysis clarifies the distinction between IQ and overall intelligence, while stressing that executive functions are the better predictor of school success.

      Limits of the IQ

      Full-scale IQ is not an exhaustive measure of intelligence, a concept on which there is no settled consensus.

      The currently dominant model, the CHC (Cattell-Horn-Carroll) model, structures intelligence in three levels:

      1. About a hundred basic abilities.

      2. 16 broad cognitive processes.

      3. The g factor (general intelligence).

      Standard psychometric tools (such as the WISC) assess only 16% of the level-1 abilities and shed light on only 7 of the 16 level-2 processes.

      IQ is therefore a partial snapshot of intelligence.

      The correlation

      Link with success: Scientific studies show that the link between executive functions and school success is more robust than the link between IQ and success.

      Observed dissociations: Some pupils with an intellectual disability have preserved executive functions and, conversely, some pupils with high intellectual potential (HPI) have severe dysexecutive disorders.

      --------------------------------------------------------------------------------

      3. School: A Vector of Executive Development

      EF development is not exclusively a school matter; it depends on family and environmental interactions.

      However, school bears a major responsibility, because its executive demands grow with pupils' age.

      | School stage | Example: reading | Executive demand |
      | --- | --- | --- |
      | End of CE1 (second year of primary school) | Decoding and simple comprehension. | Basic skills are mobilized. |
      | 3 years later | Inference, synthesis, analysis, summarizing. | High-level functions are mobilized intensively. |

      Vulnerable populations

      About 17 to 24% of the pupils in a class (including pupils who are struggling, behind, or have disorders) show executive weaknesses. These difficulties appear as:

      • An inability to plan the steps of a task.

      • Difficulty maintaining a strategy over time.

      • "Epistemological obstacles" (earlier knowledge blocking the acquisition of new knowledge).

      --------------------------------------------------------------------------------

      4. Identifying Executive Disorders in the Classroom

      Executive disorders are described as "associated": they almost always accompany other, primary disorders.

      Common clinical contexts

      Neurodevelopmental disorders: Dyslexia, dysorthographia, dyscalculia, dysphasia, ADHD, dyspraxia (developmental coordination disorder), and autism.

      Medical contexts, including acquired conditions: Genetic diseases, shaken baby syndrome, or fetal alcohol syndrome.

      Warning signs for the teacher

      Classroom observation can reveal atypical behaviors:

      Inhibition deficit: Social difficulties (impulsive remarks about other people's appearance), inability to filter out auditory or visual stimuli.

      Problems with cognitive flexibility: Mood swings when activities change, perseveration (giving an answer tied to the previous exercise), difficulty changing point of view.

      Planning deficit: Major difficulties in geometry or in organizing a narrative structure in written composition.

      Attention and fatigability: Inability to finish a task or to handle dual tasks (listening while taking notes in middle school).

      --------------------------------------------------------------------------------

      5. Teaching Levers and Strategies for Action

      Integrating executive functions into teaching practice does not require a complete overhaul, only a targeted adaptation of professional practice.

      Environmental and organizational approaches

      The flexible classroom: Arrangements that meet specific needs (working standing up, seated, or apart from the group).

      Compensation tools: Textured cushions or chair foot bands to channel the motor restlessness that some pupils need, without making them a general rule.

      Reference programs and tools

      ATOLE program (ATtentif à l'écOLE): Developed by Jean-Philippe Lachaux, it aims to help pupils understand how their brain works so that they can better manage their attention.

      Reflecto program: Uses seven metaphorical characters (the detective, the librarian, the controller, etc.) to help children identify which cognitive function to call on for a task.

      Metacognitive pedagogies: Fostering "learning to learn" and making success strategies explicit.

      Recommended professional practices

      1. Favor the "how" over the "why": Ask pupils about their mental procedure rather than the cause of their error.

      2. Anticipate the action: Spend time as a class on understanding the instructions and planning the task before pupils start.

      3. Critical vigilance: Be wary of coaching or performance programs sold online without a scientific basis.

      4. Scaffolding and compensation: If an executive function does not develop despite support, compensatory measures are needed to give access to learning.

      The ultimate aim is to enrich teaching practice through an explicit approach, turning moments of blockage into opportunities for metacognitive analysis.

    1. Author response:

      The following is the authors’ response to the current reviews.

      I thank the authors for their clarifications. The manuscript is much improved now, in my opinion. The new power spectral density plots and revised Figure 1 are much appreciated. However, there is one remaining point that I am unclear about. In the rebuttal, the authors state the following: "To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated." 

      I am very confused by this statement, because both Fig. 7B and Suppl. Fig. 1B show that the visual- (i.e., visual target presented alone) has a lower accuracy and longer reaction time than visual+ (i.e., visual target presented with distractor). In fact, Suppl. Fig. 1B legend states the following: "accuracy: auditory- - auditory+: M = 7.2 %; SD = 7.5; p = .001; t(25) = 4.9; visual- - visual+: M = -7.6%; SD = 10.80; p < .01; t(25) = -3.59; Reaction time: auditory- - auditory +: M = -20.64 ms; SD = 57.6; n.s.: p = .08; t(25) = -1.83; visual- - visual+: M = 60.1 ms ; SD = 58.52; p < .001; t(25) = 5.23)." 

      These statements appear to directly contradict each other. I appreciate that the difficulty of auditory and visual trials in block 2 of MEG experiments are matched, but this does not address the question of whether the distractor was actually distracting (and thus needed to be inhibited by occipital alpha). Please clarify.

      We apologize for mixing up the visual and auditory distractor cost in our rebuttal. The reviewer is right in that our two statements contradict each other.

      To clarify: in the EEG experiment, we see a significant distractor cost for auditory distractors in accuracy (see Suppl. Fig. 1A). We also see faster reaction times with auditory distractors, which may reflect intersensory facilitation. Because we used the same distractors in both experiments, it is reasonable to assume that they were distracting in both.

      In our follow-up MEG experiment, as the reviewer stated, performance in Block 2 was higher than in Block 1, even though distractors were present. In this experiment, distractor cost and learning effects are difficult to disentangle. It is possible that participants improved over time in the visual discrimination task in Block 1, as performance at the beginning was quite low. To illustrate this, we divided the trials of each condition into bins of 10 and plotted the mean accuracy in these bins over time (see Author response image 1). In Block 2, performance is roughly stable over time, varying by less than 10%. In Block 1, an improvement over time can be seen for both visual and auditory trials. It is especially strong for visual trials, which span a range of more than 20%. Note that the mean performance for the 80-90 trial bin was higher than any mean performance observed in Block 2.
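
      The binning step described above can be sketched as follows. This is an illustration under stated assumptions, not our analysis code: the function name `binned_accuracy` is hypothetical, outcomes are assumed to be coded 0/1 in trial order, and dropping an incomplete final bin is an assumption.

```python
def binned_accuracy(correct, bin_size=10):
    """Mean accuracy in consecutive bins of `bin_size` trials.

    `correct` is a sequence of 0/1 (or False/True) outcomes in trial order.
    Trials left over after the last full bin are dropped (an assumption).
    """
    n_bins = len(correct) // bin_size
    return [
        sum(correct[i * bin_size:(i + 1) * bin_size]) / bin_size
        for i in range(n_bins)
    ]


# Toy data: 50% correct in the first 10 trials, 100% in the next 10.
trials = [0, 1] * 5 + [1] * 10 + [1, 0, 1]  # last 3 trials form a partial bin
accuracy_per_bin = binned_accuracy(trials)  # [0.5, 1.0]
```

      Applied per participant and condition and then averaged across participants, this would give a curve of the kind plotted in Author response image 1.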

      Additionally, the same paradigm has been applied in previous investigations, which also found distractor costs for the here-used auditory stimuli in blocked and non-blocked designs. See:

      Mazaheri, A., van Schouwenburg, M. R., Dimitrijevic, A., Denys, D., Cools, R., & Jensen, O. (2014). Region-specific modulations in oscillatory alpha activity serve to facilitate processing in the visual and auditory modalities. NeuroImage, 87, 356–362. https://doi.org/10.1016/j.neuroimage.2013.10.052

      Van Diepen, R., & Mazaheri, A. (2017). Cross-sensory modulation of alpha oscillatory activity: suppression, idling and default resource allocation. European Journal of Neuroscience, 45(11), 1431–1438. https://doi.org/10.1111/ejn.13570

      Author response image 1.

      Accuracy development over time in the MEG experiment. During block 1, a performance increase over time can be observed for visual as well as for auditory stimuli. During Block 2, performance is stable over time. Data are presented as mean ± SEM. N = 27 (one participant was excluded from this analysis, as their trial count in at least one condition was below 90 trials).


      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      In this study, Brickwedde et al. leveraged a cross-modal task where visual cues indicated whether upcoming targets required visual or auditory discrimination. Visual and auditory targets were paired with auditory and visual distractors, respectively. The authors found that during the cue-to-target interval, posterior alpha activity increased along with auditory and visual frequency-tagged activity when subjects were anticipating auditory targets. The authors conclude that their results disprove the alpha inhibition hypothesis, and instead imply that alpha "regulates downstream information transfer." However, as I detail below, I do not think the presented data irrefutably disprove the alpha inhibition hypothesis. Moreover, the evidence for the alternative hypothesis of alpha as an orchestrator for downstream signal transmission is weak. Their data serve to refute only the most extreme and physiologically implausible version of the alpha inhibition hypothesis, which assumes that alpha completely disengages the entire brain area, inhibiting all neuronal activity.

      We thank the reviewer for taking the time to provide additional feedback and suggestions and we improved our manuscript accordingly.

      (1) Authors assign specific meanings to specific frequencies (8-12 Hz alpha, 4 Hz intermodulation frequency, 36 Hz visual tagging activity, 40 Hz auditory tagging activity), but the results show that spectral power increases in all of these frequencies towards the end of the cue-to-target interval. This result is consistent with a broadband increase, which could simply be due to additional attention required when anticipating auditory target (since behavioral performance was lower with auditory targets, we can say auditory discrimination was more difficult). To rule this out, authors will need to show a power spectral density curve with specific increases around each frequency band of interest. In addition, it would be more convincing if there was a bump in the alpha band, and distinct bumps for 4 vs 36 vs 40 Hz band.

      This is an interesting point with several aspects, which we will address separately.

      Broadband Increase vs. Frequency-Specific Effects:

      The suggestion that the observed spectral power increases may reflect a broadband effect rather than frequency-specific tagging is important. However, Supplementary Figure 11 shows no difference between expecting an auditory or visual target at 44 Hz. This demonstrates that (1) there is no uniform increase across all frequencies, and (2) the separation between our stimulation frequencies was sufficient to allow differentiation using our method.

      Task Difficulty and Performance Differences:

      The reviewer suggests that the observed effects may be due to differences in task difficulty, citing lower performance when anticipating auditory targets in the EEG study. This issue was explicitly addressed in our follow-up MEG study, where stimulus difficulty was calibrated. In the second block—used for analysis—accuracy between auditory and visual targets was matched (see Fig. 7B). The replication of our findings under these controlled conditions directly rules out task difficulty as the sole explanation. This point is clearly presented in the manuscript.

      Power Spectrum Analysis:

      The reviewer’s suggestion that our analysis lacks evidence of frequency-specific effects is addressed directly in the manuscript. While we initially used the Hilbert method to track the time course of power fluctuations, we also included spectral analyses to confirm distinct peaks at the stimulation frequencies. Specifically, when averaging over the alpha cluster, we observed a significant difference at 10 Hz between auditory and visual target expectation, with no significant differences at 36 or 40 Hz in that cluster. Conversely, in the sensor cluster showing significant 36 Hz activity, alpha power did not differ, but both 36 Hz and 40 Hz tagging frequencies showed significant effects. These findings clearly demonstrate frequency-specific modulation and are already presented in the manuscript.
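
      To make the distinction between frequency-specific peaks and a broadband increase concrete, here is a minimal sketch on synthetic data (not the authors' pipeline). It compares power at each frequency of interest with the mean power in flanking bins; the sampling rate, amplitudes, noise level and the helper `peak_to_neighbour_ratio` are all assumptions chosen for illustration.

```python
import numpy as np

fs = 1000.0                       # assumed sampling rate (Hz)
n = 2000                          # 2 s of data -> 0.5 Hz frequency resolution
t = np.arange(n) / fs
rng = np.random.default_rng(1)

# Synthetic signal: alpha (10 Hz) plus two tagging frequencies (36, 40 Hz) in white noise.
signal = (
    1.0 * np.sin(2 * np.pi * 10 * t)
    + 0.5 * np.sin(2 * np.pi * 36 * t)
    + 0.5 * np.sin(2 * np.pi * 40 * t)
    + 0.2 * rng.standard_normal(n)
)

freqs = np.fft.rfftfreq(n, d=1 / fs)
power = np.abs(np.fft.rfft(signal * np.hanning(n))) ** 2

def peak_to_neighbour_ratio(f_hz, inner=2.0, outer=3.0):
    """Power at f_hz divided by mean power in flanking bins inner..outer Hz away."""
    dist = np.abs(freqs - f_hz)
    peak = power[np.argmin(dist)]
    flank = power[(dist >= inner) & (dist <= outer)]
    return peak / flank.mean()

# 44 Hz is a control frequency that carries no signal in this simulation.
for f in (10, 36, 40, 44):
    print(f"{f} Hz: peak/neighbour ratio = {peak_to_neighbour_ratio(f):.1f}")
```

      A purely broadband increase would raise the peak and its neighbours together and leave this ratio near 1, whereas a narrow-band component yields a ratio well above 1 only at its own frequency.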

      (2) For visual target discrimination, behavioral performance with and without the distractor is not statistically different. Moreover, the reaction time is faster with distractor. Is there any evidence that the added auditory signal was actually distracting?

      We appreciate the reviewer’s observation regarding the lack of a statistically significant difference in behavioral performance for visual target discrimination with and without the auditory distractor. While this was indeed the case in our EEG experiment, we believe the absence of an accuracy effect may be attributable to a ceiling effect, as overall visual performance approached 100%. This high baseline likely masked any subtle influence of the distractor.

      To directly address the question of whether the auditory signal was distracting, we conducted a follow-up MEG experiment. In this study, we observed a significant reduction in visual accuracy during the second block when the distractor was present (see Fig. 7B and Suppl. Fig. 1B), providing clear evidence of a distractor cost under conditions where performance was not saturated.

      Regarding the faster reaction times observed in the presence of the auditory distractor, this phenomenon is consistent with prior findings on intersensory facilitation. Auditory stimuli, which are processed more rapidly than visual stimuli, can enhance response speed to visual targets—even when the auditory input is non-informative or nominally distracting (Nickerson, 1973; Diederich & Colonius, 2008; Salagovic & Leonard, 2021). Thus, while the auditory signal may facilitate motor responses, it can simultaneously impair perceptual accuracy, depending on task demands and baseline performance levels.

      Taken together, our data suggest that the auditory signal does exert a distracting influence, particularly under conditions where visual performance is not at ceiling. The dual effect—facilitated reaction time but reduced accuracy—highlights the complexity of multisensory interactions and underscores the importance of considering both behavioral and neurophysiological measures.

      (3) It is possible that alpha does suppress task-irrelevant stimuli, but only when it is distracting. In other words, perhaps alpha only suppresses distractors that are presented simultaneously with the target. Since the authors did not test this, they cannot irrefutably reject the alpha inhibition hypothesis.

      The reviewer’s claim that we did not test whether alpha suppresses distractors presented simultaneously with the target is incorrect. As stated in the manuscript and supported by our data (see point 2), auditory distractors were indeed presented concurrently with visual targets, and they were demonstrably distracting. Therefore, the scenario the reviewer suggests was not only tested—it forms a core part of our design.

      Furthermore, it was never our intention to irrefutably reject the alpha inhibition hypothesis. Rather, our aim was to revise and expand it. If our phrasing implied otherwise, we have now clarified this in the manuscript. Specifically, we propose that alpha oscillations:

      (a) Exhibit cyclic inhibitory and excitatory dynamics;

      (b) Regulate processing by modulating transfer pathways, which can result in either inhibition or facilitation depending on the network context.

      In our study, we did not observe suppression of distractor transfer, likely due to the engagement of a supramodal system that enhances both auditory and visual excitability. This interpretation is supported by prior findings (e.g., Jacoby et al., 2012), which show increased visual SSEPs under auditory task load, and by Zhigalov et al. (2020), who found no trial-by-trial correlation between alpha power and visual tagging in early visual areas, despite a general association with attention.

      Recent evidence (Clausner et al., 2024; Yang et al., 2024) further supports the notion that alpha oscillations serve multiple functional roles depending on the network involved. These roles include intra- and inter-cortical signal transmission, distractor inhibition, and enhancement of downstream processing (Scheeringa et al., 2012; Bastos et al., 2015; Zumer et al., 2014). We believe the most plausible account is that alpha oscillations support both functions, depending on context.

      To reflect this more clearly, we have updated Figure 1 to present a broader signal-transfer framework for alpha oscillations, beyond the specific scenario tested in this study.

      We have now revised Figure 1 and several sentences in the introduction and discussion, to clarify this argument.

      L35-37: Previous research gave rise to the prominent alpha inhibition hypothesis, which suggests that oscillatory activity in the alpha range (~10 Hz) plays a mechanistic role in selective attention through functional inhibition of irrelevant cortical areas (see Fig. 1; Foxe et al., 1998; Jensen & Mazaheri, 2010; Klimesch et al., 2007).

      L60-65: In contrast, we propose that functional and inhibitory effects of alpha modulation, such as distractor inhibition, are exhibited through blocking or facilitating signal transmission to higher order areas (Peylo et al., 2021; Yang et al., 2023; Zhigalov & Jensen, 2020; Zumer et al., 2014), gating feedforward or feedback communication between sensory areas (see Fig. 1; Bauer et al., 2020; Haegens et al., 2015; Uemura et al., 2021).

      L482-485: This suggests that responsiveness of the visual stream was not inhibited when attention was directed to auditory processing and was not inhibited by occipital alpha activity, which directly contradicts the proposed mechanism behind the alpha inhibition hypothesis.

      L517-519: Top-down cued changes in alpha power have now been widely viewed to play a functional role in directing attention: the processing of irrelevant information is attenuated by increasing alpha power in areas involved with processing this information (Foxe, Simpson, & Ahlfors, 1998; Hanslmayr et al., 2007; Jensen & Mazaheri, 2010).

      L566-569: As such, it is conceivable that alpha oscillations can in some cases inhibit local transmission, while in other cases, depending on network location, connectivity and demand, alpha oscillation can facilitate signal transmission. This mechanism allows to increase transmission of relevant information and to block transmission of distractors.

      (4) In the abstract and Figure 1, the authors claim an alternative function for alpha oscillations; that alpha "orchestrates signal transmission to later stages of the processing stream." In support, the authors cite their result showing that increased alpha activity originating from early visual cortex is related to enhanced visual processing in higher visual areas and association areas. This does not constitute a strong support for the alternative hypothesis. The correlation between posterior alpha power and frequency-tagged activity was not specific in any way; Fig. 10 shows that the correlation appeared on both 1) anticipating-auditory and anticipating-visual trials, 2) the visual tagged frequency and the auditory tagged activity, and 3) was not specific to the visual processing stream. Thus, the data is more parsimonious with a correlation than a causal relationship between posterior alpha and visual processing.

      Again, the reviewer raises important points, which we want to address.

      The correlation between posterior alpha power and frequency-tagged activity was not specific, as it is present both when auditory and visual targets are expected:

      If there is a connection between posterior alpha activity and higher-order visual information transfer, then it can be expected that this relationship remains across conditions and that higher alpha activity is accompanied by higher frequency-tagged activity, both over trials and over conditions. However, it is possible that when alpha activity is lower, such as when expecting a visual target, the signal-to-noise ratio is affected, which may make a correlation harder to detect with non-invasive measurements.

      The connection between alpha activity and frequency-tagged activity appears both for auditory as well as visual stimuli, and the correlation is not specific to the visual processing stream:

      While we do see differences between conditions (e.g. in the EEG-analysis, mostly 36 Hz correlated with alpha activity and only in one condition 40 Hz showed a correlation as well), it is true that in our MEG analysis, we found correlations both between alpha activity and 36 Hz as well as alpha activity and 40 Hz.  

      We acknowledge that when analysing frequency-tagged activity on a trial-by-trial basis, where removal of non-timelocked activity through averaging (which we did when we tested for condition differences in Fig. 4 and 9) is not possible, there is uncertainty in the data. Baseline correction can alleviate this issue, but it cannot offset the possibility of non-specific effects. We therefore decided to repeat the analysis with power calculated via a fast Fourier transform (FFT) instead of the Hilbert power, in favour of a higher and stricter frequency resolution, as we averaged over a time period and thus the time domain was not relevant for this analysis. In this more conservative analysis, we can see that only 36 Hz tagged activity when expecting an auditory target correlated with early visual alpha activity.

      Additionally, we added correlation analyses between alpha activity and frequency-tagged activity within early visual areas, using the sensor cluster which showed significant condition differences in alpha activity. Here, no correlations between frequency-tagged activity and alpha activity could be found (apart from a small correlation with 40 Hz which could not be confirmed by a median split; see SUPPL Fig. 14 C). The absence of a significant correlation between early visual alpha and frequency-tagged activity has previously been described by others (Zhigalov & Jensen, 2020) and a Bayes factor of below 1 also indicated that the alternative hypothesis is unlikely.
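
      The trial-by-trial logic described above (FFT power per trial, a correlation across trials, and a median split as a check) can be sketched as follows. This is an illustration on simulated trials, not the authors' code; the helper `power_at`, the sampling rate and the built-in dependence of 36 Hz amplitude on alpha amplitude are assumptions.

```python
import numpy as np

def power_at(trials, fs, f_hz):
    """FFT power at the bin closest to f_hz, one value per trial.

    trials: array of shape (n_trials, n_samples).
    """
    n = trials.shape[1]
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    spectrum = np.abs(np.fft.rfft(trials * np.hanning(n), axis=1)) ** 2
    return spectrum[:, np.argmin(np.abs(freqs - f_hz))]

# Synthetic trials (assumption): 36 Hz amplitude partly follows alpha amplitude.
rng = np.random.default_rng(2)
fs, n_trials, n_samples = 1000.0, 200, 1000
t = np.arange(n_samples) / fs
alpha_amp = rng.uniform(0.5, 1.5, n_trials)
tag_amp = 0.2 + 0.4 * alpha_amp + 0.05 * rng.standard_normal(n_trials)
phase_a = rng.uniform(0, 2 * np.pi, (n_trials, 1))
phase_t = rng.uniform(0, 2 * np.pi, (n_trials, 1))
trials = (
    alpha_amp[:, None] * np.sin(2 * np.pi * 10 * t + phase_a)
    + tag_amp[:, None] * np.sin(2 * np.pi * 36 * t + phase_t)
    + 0.2 * rng.standard_normal((n_trials, n_samples))
)

alpha_power = power_at(trials, fs, 10)
tag_power = power_at(trials, fs, 36)

# Trial-by-trial Pearson correlation between alpha power and 36 Hz power.
r = np.corrcoef(alpha_power, tag_power)[0, 1]

# Median split: mean 36 Hz power in high- versus low-alpha trials.
high = alpha_power > np.median(alpha_power)
split_diff = tag_power[high].mean() - tag_power[~high].mean()
print(f"r = {r:.2f}, high-minus-low 36 Hz power = {split_diff:.1f}")
```

      Because the FFT bin is narrow, power from neighbouring frequencies contributes little to each per-trial estimate, which is the sense in which this variant is stricter than a band-limited Hilbert envelope.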

      Nonetheless, a correlation with the auditory signal is possible and could be explained in different ways. For example, it could be that very early auditory feedback in early visual cortex (see for example Brang et al., 2022) is transmitted alongside visual information to higher-order areas. Several studies have shown that alpha activity and visual as well as auditory processing are closely linked together (Bauer et al., 2020; Popov et al., 2023). Inference on whether or how this link could play out in the case of this manuscript extends beyond the scope of this study.

      To summarize, we believe the fact that 36 Hz activity within early visual areas does not correlate with alpha activity on a trial-by-trial basis, but that 36 Hz activity in other areas does, provides strong evidence that alpha activity affects down-stream signal processing.

      We mention this analysis now in our discussion:

      L533-536: Our data provides evidence in favour of this view, as we can show that early sensory alpha activity does not covary over trials with SSEP magnitude in early visual areas, but covaries instead over trials with SSEP magnitude in higher order sensory areas (see also SUPPL. Fig. 14).

      Reviewer #1 (Recommendations for the authors):

      The evidence for the alternative hypothesis, that alpha in early sensory areas orchestrates downstream signal transmission, is not strong enough to be described up front in the abstract and Figure 1. I would leave it in the Discussion section, but advise against mentioning it in the abstract and Figure 1.

      We appreciate the reviewer’s concern regarding the inclusion of the alternative hypothesis—that alpha activity in early sensory areas orchestrates downstream signal transmission—in the abstract and Figure 1. While we agree that this interpretation is still developing, recent studies (Keitel et al., 2025; Clausner et al., 2024; Yang et al., 2024) provide growing support for this framework.

      In response, we have revised the introduction, discussion, and Figure 1 to clarify that our intention is not to outright dismiss the alpha inhibition hypothesis, but to refine and expand it in light of new data. This revision does not invalidate the prior literature on alpha timing and inhibition; rather, it proposes an updated mechanism that may better account for observed effects.

      We have, however, retained Figure 1, as it visually contextualizes the broader theoretical landscape, and have at the same time added further analyses to strengthen our empirical support for this emerging view.

      References:

      Bastos, A. M., Litvak, V., Moran, R., Bosman, C. A., Fries, P., & Friston, K. J. (2015). A DCM study of spectral asymmetries in feedforward and feedback connections between visual areas V1 and V4 in the monkey. NeuroImage, 108, 460–475. https://doi.org/10.1016/j.neuroimage.2014.12.081

      Bauer, A. R., Debener, S., & Nobre, A. C. (2020). Synchronisation of Neural Oscillations and Cross-modal Influences. Trends in cognitive sciences, 24(6), 481–495. https://doi.org/10.1016/j.tics.2020.03.003

      Brang, D., Plass, J., Sherman, A., Stacey, W. C., Wasade, V. S., Grabowecky, M., Ahn, E., Towle, V. L., Tao, J. X., Wu, S., Issa, N. P., & Suzuki, S. (2022). Visual cortex responds to sound onset and offset during passive listening. Journal of neurophysiology, 127(6), 1547–1563. https://doi.org/10.1152/jn.00164.2021

      Clausner, T., Marques, J., Scheeringa, R., & Bonnefond, M. (2024). Feature specific neuronal oscillations in cortical layers. bioRxiv, 2024.07.31.605816. https://doi.org/10.1101/2024.07.31.605816

      Diederich, A., & Colonius, H. (2008). When a high-intensity "distractor" is better then a low-intensity one: modeling the effect of an auditory or tactile nontarget stimulus on visual saccadic reaction time. Brain research, 1242, 219–230. https://doi.org/10.1016/j.brainres.2008.05.081

      Haegens, S., Nácher, V., Luna, R., Romo, R., & Jensen, O. (2011). α-Oscillations in the monkey sensorimotor network influence discrimination performance by rhythmical inhibition of neuronal spiking. Proceedings of the National Academy of Sciences of the United States of America, 108(48), 19377–19382. https://doi.org/10.1073/pnas.1117190108

      Jacoby, O., Hall, S. E., & Mattingley, J. B. (2012). A crossmodal crossover: opposite effects of visual and auditory perceptual load on steady-state evoked potentials to irrelevant visual stimuli. NeuroImage, 61(4), 1050–1058. https://doi.org/10.1016/j.neuroimage.2012.03.040

      Keitel, A., Keitel, C., Alavash, M., Bakardjian, K., Benwell, C. S. Y., Bouton, S., Busch, N. A., Criscuolo, A., Doelling, K. B., Dugue, L., Grabot, L., Gross, J., Hanslmayr, S., Klatt, L.-I., Kluger, D. S., Learmonth, G., London, R. E., Lubinus, C., Martin, A. E., … Kotz, S. A. (2025). Brain rhythms in cognition – controversies and future directions. ArXiv. https://doi.org/10.48550/arXiv.2507.15639

      Nickerson, R. S. (1973). Intersensory facilitation of reaction time: energy summation or preparation enhancement? Psychological review, 80(6), 489–509. https://doi.org/10.1037/h0035437

      Popov, T., Gips, B., Weisz, N., & Jensen, O. (2023). Brain areas associated with visual spatial attention display topographic organization during auditory spatial attention. Cerebral cortex (New York, N.Y. : 1991), 33(7), 3478–3489. https://doi.org/10.1093/cercor/bhac285

      Salagovic, C. A., & Leonard, C. J. (2021). A nonspatial sound modulates processing of visual distractors in a flanker task. Attention, perception & psychophysics, 83(2), 800–809. https://doi.org/10.3758/s13414-020-02161-5

      Scheeringa, R., Petersson, K. M., Kleinschmidt, A., Jensen, O., & Bastiaansen, M. C. (2012). EEG α power modulation of fMRI resting-state connectivity. Brain connectivity, 2(5), 254–264. https://doi.org/10.1089/brain.2012.0088

      Spaak, E., Bonnefond, M., Maier, A., Leopold, D. A., & Jensen, O. (2012). Layer-specific entrainment of γ-band neural activity by the α rhythm in monkey visual cortex. Current biology : CB, 22(24), 2313–2318. https://doi.org/10.1016/j.cub.2012.10.020

      Yang, X., Fiebelkorn, I. C., Jensen, O., Knight, R. T., & Kastner, S. (2024). Differential neural mechanisms underlie cortical gating of visual spatial attention mediated by alpha-band oscillations. Proceedings of the National Academy of Sciences of the United States of America, 121(45), e2313304121. https://doi.org/10.1073/pnas.2313304121

      Zhigalov, A., & Jensen, O. (2020). Alpha oscillations do not implement gain control in early visual cortex but rather gating in parieto-occipital regions. Human brain mapping, 41(18), 5176–5186. https://doi.org/10.1002/hbm.25183

      Zumer, J. M., Scheeringa, R., Schoffelen, J. M., Norris, D. G., & Jensen, O. (2014). Occipital alpha activity during stimulus processing gates the information flow to object-selective cortex. PLoS biology, 12(10), e1001965. https://doi.org/10.1371/journal.pbio.1001965

    1. Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, the EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation, and the accompanying appendix, lack several important details. The equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of the method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or just the distribution of the mean FOI taken over multiple simulations; it also seems that they are estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. the proportion of simulations where the truth is captured in the 95% CI) is important. In many cases it seems that the estimator always underestimates the true FOI and that the FOI distribution may not even contain the true value (e.g. Figure 10, Figure 1 under the mid-IRS panel), but it is not possible to conclude one way or the other based on this visualization. This is a major issue, since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times is varied widely; however, it is not clear what those ranges are. There needs to be a clear table or figure caption showing which combinations of values were tested and which results were produced from them. This is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear, since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is it the probability distribution of the steady-state queue length? Elsewhere the authors refer to these quantities as maximum likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure does in fact maximize this likelihood, preferably with evidence of the shape of the likelihood and of how fine the grid of tested mean and variance values is, since this could influence the overall quality of the estimation procedure.
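
      Two of the checks requested in points (1c) and (1d) can be sketched in a few lines. The example below is a toy illustration, not the manuscript's procedure: `coverage` computes the proportion of simulations whose 95% CI contains the true value, and `grid_mle` evaluates a likelihood over a grid so that its shape and the grid resolution can be inspected. A simple Poisson likelihood stands in for the queue-length likelihood, which is not specified in the text; all function names and values are assumptions.

```python
import numpy as np
from math import lgamma

def coverage(truths, ci_lower, ci_upper):
    """Fraction of simulations whose interval [ci_lower, ci_upper] contains the truth."""
    truths, ci_lower, ci_upper = (
        np.asarray(x, dtype=float) for x in (truths, ci_lower, ci_upper)
    )
    return float(np.mean((ci_lower <= truths) & (truths <= ci_upper)))

def poisson_loglik(rate, counts):
    """Log-likelihood of i.i.d. Poisson counts (toy stand-in for the real likelihood)."""
    counts = np.asarray(counts)
    const = sum(lgamma(k + 1) for k in counts)
    return float(counts.sum() * np.log(rate) - counts.size * rate - const)

def grid_mle(counts, grid):
    """Return the grid value with the highest log-likelihood and the full profile."""
    loglik = np.array([poisson_loglik(r, counts) for r in grid])
    return grid[np.argmax(loglik)], loglik

# Hypothetical counts; the grid step bounds how close the estimate can get to the optimum.
rng = np.random.default_rng(3)
counts = rng.poisson(3.0, size=500)
step = 0.05
grid = np.arange(0.5, 10.0, step)
rate_hat, loglik = grid_mle(counts, grid)
print(rate_hat, counts.mean())

# One of these three hypothetical intervals contains the true value 5.0.
print(coverage([5.0, 5.0, 5.0], [4.0, 5.5, 3.0], [6.0, 7.0, 4.5]))
```

      Plotting `loglik` against `grid` would show whether the surface has a single clear maximum, and rerunning with a smaller `step` would show how much the estimate depends on grid resolution.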

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. For example, it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3), it seems that this would not translate to a 4-year-old being immune-naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      b. The capacity parameter c seems to be quite important and is set at 30. However, the authors only describe trying values of 25 and 30 and claim that this does not impact FOI inference; it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why an increase in the carrying capacity will not influence the FOI inference; absent that, this should be mentioned and discussed as a limitation.
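
      To illustrate why the assumed duration of infection in point (2a) matters, the sketch below uses Little's law from queueing theory (mean number in system = arrival rate × mean time in system) as a rough back-of-the-envelope relation. This is not the two-moment approximation used in the manuscript; it assumes a steady state and ignores the capacity limit, and the numbers are illustrative rather than taken from the data.

```python
def implied_arrival_rate(mean_moi, mean_duration_days, days_per_year=365.0):
    """Arrivals per host per year implied by Little's law: rate = L / W."""
    if mean_duration_days <= 0:
        raise ValueError("mean_duration_days must be positive")
    return mean_moi / mean_duration_days * days_per_year

# Illustrative values only; neither number is taken from the manuscript.
mean_moi = 2.0
for duration in (100.0, 150.0, 200.0, 300.0):
    rate = implied_arrival_rate(mean_moi, duration)
    print(f"assumed mean duration {duration:5.0f} d -> implied rate {rate:.2f} per host per year")
```

      Under this simplification the implied rate scales inversely with the assumed mean duration, so a misjudged duration distribution would shift the estimated rate proportionally.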

    2. Reviewer #3 (Public Review):

      Summary:

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI.

      Strengths:

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics.

      Weaknesses:

      (1) I fear that the work could be easily over-interpreted, as no true validation was done: no field estimates of FOI (which I think would be considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153) and I feel it would be appropriate to differentiate this in the discussion.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone).

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval", which suggests the former; please consider clarifying.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology.

    3. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulated data and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections and use these values, along with methods from queueing theory that rely on various assumptions, to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by the authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and accompanying appendix is lacking several important details. Equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables, in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, we included in the first version of our manuscript only material on this section and not the other. We agree with the reviewer that readers would benefit from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formulas placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods, and to provide more of the technical details in one place, which will be easier to understand and follow.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. proportion of simulations where the truth is captured in the 95% CI or something like this) is important; in many cases it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion about what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained by dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
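
      The aggregation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names and the toy MOI values are hypothetical, and the bootstrap is assumed to resample sampled hosts with replacement, which the response does not specify.

```python
import random
from collections import Counter


def empirical_queue_length_distribution(moi_values, capacity):
    """Proportion of sampled hosts at each MOI value 0..capacity."""
    counts = Counter(moi_values)
    n = len(moi_values)
    return [counts.get(k, 0) / n for k in range(capacity + 1)]


def bootstrap_moi_samples(moi_values, n_boot, seed=0):
    """Resample hosts with replacement (assumed scheme); one list per replicate."""
    rng = random.Random(seed)
    n = len(moi_values)
    return [[rng.choice(moi_values) for _ in range(n)] for _ in range(n_boot)]


def true_foi_per_host_per_year(total_new_infections_per_year, n_hosts):
    """Simulation 'truth': total infections acquired per year divided by host count."""
    return total_new_infections_per_year / n_hosts


# Hypothetical MOI estimates for four sampled hosts (illustrative values only).
moi = [1, 1, 2, 4]
print(empirical_queue_length_distribution(moi, capacity=4))
print(true_foi_per_host_per_year(500, 100))
```

      Each bootstrap replicate would then be passed through the FOI inference step to build the bootstrapped distribution of estimates.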

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites whereas the remaining 1/3 receives the rest of the bites), our two methods perform similarly to how they perform in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
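
      A minimal sketch of the two summaries mentioned above, assuming one true FOI value and one bootstrap distribution per simulation. The function names are hypothetical, and the sign convention (negative means underestimation) is an assumption rather than something stated in the response.

```python
import statistics


def recovery_summary(true_foi, bootstrap_estimates):
    """Return (truth captured by the bootstrap range?, relative deviation of the median)."""
    captured = min(bootstrap_estimates) <= true_foi <= max(bootstrap_estimates)
    # Assumed convention: (median - truth) / truth, so negative values mean underestimation.
    relative_deviation = (statistics.median(bootstrap_estimates) - true_foi) / true_foi
    return captured, relative_deviation


def proportion_captured(true_values, bootstrap_sets):
    """Share of simulations whose bootstrap distribution contains the true FOI."""
    flags = [recovery_summary(t, b)[0] for t, b in zip(true_values, bootstrap_sets)]
    return sum(flags) / len(flags)


# Hypothetical values for two simulations (illustrative only).
print(recovery_summary(5.0, [4.0, 4.5, 5.5, 6.0]))
print(proportion_captured([5.0, 8.0], [[4.0, 6.0], [4.0, 6.0]]))
```

      Replacing the minimum and maximum with the 2.5th and 97.5th percentiles would give the 95% CI coverage that the reviewer suggests.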

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times are varied widely; however, it's not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method, and it's impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady state queue length distribution? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times as well as the grid of values tested in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady-state queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.
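
      The general shape of such a grid-based likelihood maximization can be sketched as below. This is not the two-moment approximation used in the manuscript: as a simplified stand-in, it assumes Poisson arrivals, in which case a loss system with capacity c has a truncated-Poisson steady-state queue length distribution that depends only on the offered load (mean duration divided by mean inter-arrival time). The grid is therefore one-dimensional, and the function names, MOI values, and mean duration are all hypothetical.

```python
import math


def truncated_poisson_logpmf(load, capacity):
    """Log-probabilities of queue lengths 0..capacity under the Poisson stand-in."""
    raw = [k * math.log(load) - math.lgamma(k + 1) for k in range(capacity + 1)]
    peak = max(raw)
    log_norm = peak + math.log(sum(math.exp(r - peak) for r in raw))
    return [r - log_norm for r in raw]


def grid_search_mle(moi_values, interarrival_grid, mean_duration, capacity=30):
    """Return the grid point with the highest log-likelihood, plus all log-likelihoods."""
    logliks = []
    for mean_gap in interarrival_grid:
        logpmf = truncated_poisson_logpmf(mean_duration / mean_gap, capacity)
        # Hosts are treated as independent draws from the steady-state distribution.
        logliks.append(sum(logpmf[m] for m in moi_values))
    best = max(range(len(logliks)), key=logliks.__getitem__)
    return interarrival_grid[best], logliks


# Hypothetical inputs: MOI estimates, mean infection duration (days), grid of mean gaps (days).
moi = [3, 4, 5, 4, 4, 2, 6, 4]
best_gap, logliks = grid_search_mle(moi, [25.0, 50.0, 100.0, 200.0], mean_duration=200.0)
print(best_gap, 365.0 / best_gap)  # mean inter-arrival time and implied FOI per host per year
```

      Plotting the returned log-likelihoods against the grid is one way to inspect the shape of the likelihood and to check whether a finer grid moves the maximum.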

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection durations and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3), it seems that this would not translate to a 4-year-old being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population.

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is being used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need to prioritize in future data collection and sampling efforts the subpopulation class who has received either no infection or a minimum number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30. However, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference; it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation.

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is more easily “overflowed”, in which case customers (or infections) often find that there is no space available in the queuing system/individual host upon their arrival. These customers (or infections) will not increment the queue length. The parameter c therefore has a small impact on the part of the grid resulting in low flows of customers/infections, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5 with most values well below 10, and only a small fraction of higher values between 15-20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
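
      The overflow argument can be illustrated numerically with a simplified stand-in. The sketch below assumes Poisson arrivals (not the two-moment approximation of the manuscript) and computes the probability that an arriving infection finds the host full, using the standard Erlang B recursion. The loads and capacities are hypothetical; a load of 5 is chosen only because the empirical MOI distribution is described as centering around 4 or 5.

```python
def erlang_b(load, capacity):
    """Probability that a loss system with Poisson arrivals is full (Erlang B)."""
    blocking = 1.0  # with zero capacity every arrival is lost
    for servers in range(1, capacity + 1):
        blocking = load * blocking / (servers + load * blocking)
    return blocking


# Low flow (load near the empirical MOI center) versus a very high flow.
for load in (5.0, 25.0):
    for capacity in (20, 25, 30, 60):
        print(f"load={load:5.1f}  c={capacity:2d}  P(full)={erlang_b(load, capacity):.3e}")
```

      Under this stand-in, the overflow probability at low loads is negligible for every capacity tried, so changing c matters only for the high-flow part of the grid.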

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI, and thus is dependent on the information contained in MOI. FOI reflects risk of infection, is associated with risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. It is possible that MOI can be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden on MOI. But MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will require however either cohort studies or more densely sampled cross-sectional surveys due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios which bound reality, well summarized by the reviewer. First, if drug treatment has had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, if the drug treatment had a significant impact on measurement, i.e., if it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. We assumed that, if those 1-5-year-olds had not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add additional information and can potentially deflate the variability of MOI distributions, compared to simply discarding or excluding those drug-treated 1-5-year-olds from the analysis. Thus, we can include in our revision FOI estimates with the drug-treated 1-5-year-olds excluded in the estimation.
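
      The three ways of handling drug-treated individuals discussed above (using their MOI as measured, imputing it from untreated individuals, or excluding them) can be sketched as follows. The function names and MOI values are hypothetical, and the imputation is assumed to be a simple draw with replacement from the untreated MOI estimates.

```python
import random


def impute_from_untreated(untreated_moi, n_treated, rng):
    """Draw one MOI value per treated host from the untreated MOI estimates."""
    return [rng.choice(untreated_moi) for _ in range(n_treated)]


def build_moi_samples(untreated_moi, treated_moi, seed=0):
    """Return the MOI sample used for FOI inference under each handling option."""
    rng = random.Random(seed)
    imputed = impute_from_untreated(untreated_moi, len(treated_moi), rng)
    return {
        "as_measured": untreated_moi + treated_moi,  # scenario 1: treatment has no effect
        "imputed": untreated_moi + imputed,          # scenario 2: treated MOI replaced
        "excluded": list(untreated_moi),             # alternative: treated hosts dropped
    }


# Hypothetical MOI estimates (illustrative only).
samples = build_moi_samples(untreated_moi=[2, 3, 5, 4, 6], treated_moi=[1, 2, 1])
for name, values in samples.items():
    print(name, len(values))
```

      The imputed and excluded samples carry the same information about the MOI distribution; they differ only in sample size, which is why imputation can make the resulting estimates look less variable than the data warrant.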

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and ongoing work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153) and I feel it would be appropriate to differentiate this in the discussion.

      We thank the reviewer for this comment, although we think there is a misunderstanding about what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others, by purely descriptive ones. All these models make their own assumptions which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity which are not represented in simpler models.

      It is not clear to us in what sense this ABM makes a set of biological and structural assumptions which are “probably similar” to those of the queuing methods we present. We agree that relying on models whose structural assumptions differ from those of a given method or model to be tested is the best approach. Our proposed methods for FOI inference based on queuing theory rely on the duration of infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. But these methods are agnostic on the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs for FOI measured directly by tracking cohort studies are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot differentiate hyper-diverse antigenic strains fully and well (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of the estimation being free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly and further benchmarked against fitted FOI values. The evaluation of these models is typically based on how well they can capture other epidemiological quantities which are more easily sampled or measured, including prevalence or incidence. This is similar to what is done in this work. We selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity EIR (Entomological Inoculation Rate), typically used in the field as a measure of transmission intensity. We check whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and the relationship between these two quantities from previous studies. We acknowledge that as for model fitting approaches for FOI inference, our validation is also indirect for the field data.

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios that bound reality, instead of simply assuming that drug treatment does not impact the infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ, but not substantially, across the two extreme scenarios, partially because drug-treated individuals’ MOI distribution is similar to that of non-treated individuals (the apparent lack of impact of drug treatment on MOI, as pointed out by the referee). We will consider adding a formal test to quantify the difference between the two MOI distributions and how significant the difference is. We will discuss which of the two extreme scenarios reality is closer to, given the result of the formal test. We will also discuss in our revision possible reasons/hypotheses underlying the impact of drug treatment on MOI from the perspective of the nature, efficiency, and duration of the drugs administered.

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions, or the moments of their distributions, perhaps by fitting models including simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It will be equally difficult to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to more narrow or concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment as it is fundamental that there is no confusion on the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models for the population dynamics of infectious diseases. (For diseases simpler than malaria, with no super-infection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of new blood-stage infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree, however, with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the non-existent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The method of varcoding and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined as specified above and in both the abstract and introduction, consistent with the epidemiological literature. We will include clarification in the introduction and discussion of this point in the revision.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1)   Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2)   Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3)   Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4)   Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5)   Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have adequately responded to all comments.

      We thank Reviewer 1 for their positive assessment of our previous round of revisions.

      Reviewer #2 (Public review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      - The use of historical clinical data is very clever in this context

      - The simulations are very sophisticated with respect to trying to capture realistic population dynamics

      - The mathematical approach is simple and elegant, and thus easy to understand

      Weakness:

      The assumptions of the approach are quite strong, and the authors have made clear that applicability is constrained to individuals with immune profiles that are similar to malaria naive patients with neurosyphilis. While the historical clinical data is a unique resource and likely directionally correct, it remains somewhat dubious to use the exact estimated values as inputs to other models without extensive sensitivity analysis.

      We thank reviewer 2 for their comments on our previous round of revisions. The statement here that “it remains somewhat dubious to use the exact estimated values as inputs to other models” suggests that we may not have been sufficiently clear on how infection duration is represented in our agent-based model (ABM) of malaria population dynamics. Because our analysis uses simulated outputs from the ABM to validate the performance of the two queuing-theory methods, we believe this point warrants clarification, which we provide below.

      When simulating with the ABM, we do not use empirical estimates of infection duration in immunologically naïve individuals from the historical clinical data as direct inputs. Instead, infection duration emerges from the within-host dynamics modeled in the ABM (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision). Briefly, each Plasmodium falciparum parasite carries approximately 50-60 var genes, each encoding a distinct variant surface antigen expressed during the blood stage of infection. Empirical evidence[1,2] indicates that these var genes are expressed largely sequentially. If a host has previously encountered the antigenic product of a given var gene and retains immunity to it, subject to waning at empirically estimated rates[3,4], the corresponding parasite subpopulation is rapidly cleared. Conversely, if the host is naïve to that gene, it takes approximately seven days for the immune system to mount an effective antibody response, resulting in a rapid decline or elimination of the expressed variant[5]. This seven-day timescale aligns with the duration of each successive parasitemia peak observed in Plasmodium falciparum infections[6,7], each arising primarily from the expression of a single var gene and occasionally from a small number of var genes.

      In our previous analyses, we therefore modeled an average expression duration of seven days per gene in naïve hosts. Specifically, the switching time to the next gene was drawn from an exponential distribution with a mean of seven days. Each var gene is represented as a linear combination of two epitopes (alleles), based on the empirical characterization of two hypervariable regions in the var tag region[8], and immunity is acquired against these alleles. Immunity to one allele of a given gene reduces its average expression duration by approximately half, whereas immunity to both alleles results in an immediate switch to another var gene within the infection. Consequently, the total duration of infection is proportional to the number of unseen alleles by the host across all var genes expressed during that infection (lines 800-816, second paragraph of the subsection Within-host dynamics in Appendix 1-Simulation data of the previous revision).
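
      The per-gene rules described above can be illustrated with a minimal sketch. This is not the ABM code itself: the function names are hypothetical, "approximately half" is taken as exactly half, and the duration under one-allele immunity is assumed to remain exponentially distributed.

```python
import random

MEAN_NAIVE_DAYS = 7.0  # mean expression duration per var gene in a naive host (from the text)


def expression_duration(n_immune_alleles, rng, mean_naive=MEAN_NAIVE_DAYS):
    """Days one var gene is expressed, given host immunity to 0, 1, or 2 of its alleles.

    Hypothetical helper: the halving for one immune allele is assumed to be exact.
    """
    if n_immune_alleles not in (0, 1, 2):
        raise ValueError("n_immune_alleles must be 0, 1, or 2")
    if n_immune_alleles == 2:
        return 0.0  # immunity to both alleles: immediate switch to another gene
    mean = mean_naive if n_immune_alleles == 0 else mean_naive / 2.0
    # random.expovariate takes the rate, i.e. 1 / mean
    return rng.expovariate(1.0 / mean)


def infection_duration(immune_allele_counts, rng, mean_naive=MEAN_NAIVE_DAYS):
    """Total infection duration as the sum of per-gene expression durations."""
    return sum(expression_duration(k, rng, mean_naive) for k in immune_allele_counts)


if __name__ == "__main__":
    rng = random.Random(0)
    naive_repertoire = [0] * 60  # a fully naive host facing 60 var genes
    print(infection_duration(naive_repertoire, rng))
```

      Under these assumptions the expected total duration is the mean per-gene duration multiplied by the number of unseen alleles, halved, which matches the proportionality stated in the text.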

      Prompted by the reviewer’s comments, in this revision we additionally tested mean expression durations of 7.5 and 8 days per var gene, applied in combination with an extension of the within-host rules (see the next paragraph for motivation and details). Although differences among the three mean expression durations are modest at the per-gene level, when aggregated across all var genes expressed within an individual parasite, the resulting total infection duration can differ on the order of several months. The resulting distributions of infection duration across immunologically naïve individuals and those aged 1-5 years, together with those generated under our previous simulation settings, span a range of means and variances that encompasses, and extends above and below, scenarios comparable to the historical clinical data from naïve neurosyphilis patients treated with P. falciparum malaria. We have provided example supplementary figures illustrating that the distributions of infection duration from the simulated outputs overlap with, and closely resemble, the empirical distribution from the historical clinical data (Appendix 1-Figure 27-32).

      We considered the following modification of the within-host rules. In our previous ABM simulations, we had assumed that an infection would clear only once the parasite had exhausted its entire var gene repertoire, that is, after every var gene had been expressed and recognized. However, biological evidence indicates that clearance can occur earlier for several reasons, including stochastic extinction before full repertoire exhaustion. Even if some var genes remain unexpressed, an infection can terminate due to demographic stochasticity once parasite densities fall to very low levels. This decline in parasite densities may result from non-variant-specific immune mechanisms or from cross-immunity among var genes that share sequence similarity or alleles[9,10,11], both of which can substantially reduce parasite numbers. To model the possibility of termination or clearance before full repertoire exhaustion, we implemented a simple scenario in which there is a small probability of clearing the current infection while a given var gene, whether non-final or final, is being expressed. This probability is a function of the host’s pre-existing immunity to the two epitopes (alleles) of that gene, thereby capturing in a parsimonious manner the effects of cross-immunity among sequence- or allele-sharing var genes in reducing parasitemia. Specifically, it is modeled as a Bernoulli draw whose success probability equals the immunity level against the gene (0 for no immunity to either epitope, 0.5 for immunity to one epitope, and 1 for immunity to both epitopes) multiplied by a constant factor of 0.025. Thus, the probability scales with pre-existing variant-specific immunity to the gene but remains small overall, while introducing additional variance into the emergent distribution of total infection duration across hosts.
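
      As a rough illustration of the early-clearance rule just described, the sketch below computes the Bernoulli success probability from the immunity level and the constant factor of 0.025. The function names are hypothetical, and the assumption that one draw is made per expressed gene is ours for the purpose of the example.

```python
import random

CLEARANCE_SCALE = 0.025  # constant factor stated in the text

# Immunity level against a gene as a function of the number of its two
# epitopes (alleles) the host is already immune to (values from the text).
IMMUNITY_LEVEL = {0: 0.0, 1: 0.5, 2: 1.0}


def clearance_probability(n_immune_epitopes, scale=CLEARANCE_SCALE):
    """Success probability of the Bernoulli draw for early clearance."""
    return IMMUNITY_LEVEL[n_immune_epitopes] * scale


def clears_during_gene(n_immune_epitopes, rng, scale=CLEARANCE_SCALE):
    """One Bernoulli draw; assumed here to happen once per expressed var gene."""
    return rng.random() < clearance_probability(n_immune_epitopes, scale)


if __name__ == "__main__":
    rng = random.Random(1)
    for k in (0, 1, 2):
        print(k, clearance_probability(k), clears_during_gene(k, rng))
```

      With these values the per-gene clearance probability never exceeds 0.025, so early clearance stays rare for any single gene while still adding variance to total infection duration.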

      We acknowledge that the ABM used to simulate malaria population dynamics cannot capture all mechanisms and complexities underlying within-host processes, many of which remain poorly understood. However, we emphasize that the resulting distributions of infection duration generated by the ABM span a broad range of means, variances, and shapes, including distributions that closely match those observed in the clinical historical data. Because the queueing-theory methods rely on only the mean and variance of infection duration to estimate the force of infection (FOI), these scenarios, which collectively span and encompass values comparable to the empirical ones, provide an appropriate basis for evaluating the performance of the methods using simulated outputs. We have added supplementary figures (see Appendix 1-Figure 16-22) illustrating the corresponding FOI inference results when we allow for clearance before the complete expression of the var repertoire, and the accuracy of FOI estimation remains comparable across all the scenarios examined.

      Finally, we emphasize that the application of the queuing-theory methods to the simulated outputs and to the Ghana field survey data involves two self-contained steps. For the simulations, FOI is inferred directly from the emergent distributions of infection duration generated by the ABM. For the Ghana surveys, FOI is inferred using the historical clinical data, which remains one of the few credible and widely used empirical sources for infection duration in immunologically naïve individuals[6]. By exploring different mean expression durations and within-host rules in the ABM, which generates distributions of infection duration that span and encompass those comparable to the empirical distribution, we demonstrate that the queueing-theory methods perform comparably across diverse scenarios and are well suited for application to the Ghana field surveys.

      We expanded the section on within-host dynamics in Appendix 1 to elaborate on this point (Lines 817-854).

      Reviewer #3 (Public review):

      I think the authors gave a robust but thorough response to our reviews and made some important changes to the manuscript which certainly clarify things for me.

      We thank Reviewer 3 for their positive feedback on our previous round of revisions.

      References

      (1) Zhang, X. & Deitsch, K. W. The mystery of persistent, asymptomatic Plasmodium falciparum infections. Curr. Opin. Microbiol. 70, 102231 (2022).

      (2) Deitsch, K. W. & Dzikowski, R. Variant gene expression and antigenic variation by malaria parasites. Annu. Rev. Microbiol. 71, 625–641 (2017).

      (3) Collins, W. E., Skinner, J. C. & Jeffery, G. M. Studies on the persistence of malarial antibody response. American journal of epidemiology, 87(3), 592–598 (1968).

      (4) Collins, W. E., Jeffery, G. M. & Skinner, J. C. Fluorescent Antibody Studies in Human Malaria. II. Development and Persistence of Antibodies to Plasmodium falciparum. The American journal of tropical medicine and hygiene, 13, 256–260 (1964).

      (5) Gatton, M. L., & Cheng, Q. Investigating antigenic variation and other parasite-host interactions in Plasmodium falciparum infections in naïve hosts. Parasitology, 128(Pt 4), 367–376 (2004).

      (6) Maire, N., Smith, T., Ross, A., Owusu-Agyei, S., Dietz, K., & Molineaux, L. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. The American journal of tropical medicine and hygiene, 75(2 Suppl), 19–31 (2006).

      (7) Chen D. S., Barry A. E., Leliwa-Sytek A., Smith T-A., Peterson I., Brown S. M., et al. A Molecular Epidemiological Study of var Gene Diversity to Characterize the Reservoir of Plasmodium falciparum in Humans in Africa. PLoS ONE 6(2): e16629 (2011).

      (8) Larremore D. B., Clauset A., & Buckee C. O. A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes. PLoS Comput Biol 9(10): e1003268 (2013).

      (9) Holding T. & Recker M. Maintenance of phenotypic diversity within a set of virulence encoding genes of the malaria parasite Plasmodium falciparum. J. R. Soc. Interface 12, 20150848 (2015).

      (10) Crompton, P. D., Moebius, J., Portugal, S., Waisberg, M., Hart, G., Garver, L. S., Miller, L. H., Barillas-Mury, C., & Pierce, S. K. Malaria immunity in man and mosquito: insights into unsolved mysteries of a deadly infectious disease. Annual review of immunology, 32, 157–187 (2014).

      (11) Langhorne, J., Ndungu, F., Sponaas, AM. et al. Immunity to malaria: more questions than answers. Nat Immunol 9, 725–732 (2008).

    1. Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2026) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      The theoretical foundations of the study are compelling, and the connection between the experimental data (Fig. 1) and the modeling work (Fig. 2-4) is convincing.

      Weaknesses:

      In the experimental data (Fig. 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I had identified three significant issues with the experiments that were addressed in the revision:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not fully link to the theory-driven work in changing environments. A full experimental investigation of this would be beyond the scope of the current work.

      (2) The temporal aspect of behavioral instability has been addressed in Figure 1F.

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). This issue has been further discussed in the revised manuscript.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In "Drift in Individual Behavioral Phenotype as a Strategy for Unpredictable Worlds," Maloney et al. (2024) investigate changes in individual responses over time, referred to as behavioral drift within the lifespan of an animal. Drift, as defined in the paper, complements stable behavioral variation (animal individuality/personality within a lifetime) over shorter timeframes, which the authors associate with an underlying bet-hedging strategy. The third timeframe of behavioral variability that the authors discuss occurs within seasons (across several generations of some insects), termed "adaptive tracking." This division of "adaptive" behavioral variability over different timeframes is intuitively logical and adds valuable depth to the theoretical framework concerning the ecological role of individual behavioral differences in animals.

      Strengths:

      While the theoretical foundations of the study are strong, the connection between the experimental data (Figure 1) and the modeling work (Figure 2-4) is less convincing.

      Weaknesses:

      In the experimental data (Figure 1), the authors describe the changes in behavioral preferences over time. While generally plausible, I identify three significant issues with the experiments:

      (1) All of the subsequent theoretical/simulation data is based on changing environments, yet all the experiments are conducted in unchanging environments. While this may suffice to demonstrate the phenomenon of behavioral instability (drift) over time, it does not properly link to the theory-driven work in changing environments. An experiment conducted in a changing environment and its effects on behavioral drift would improve the manuscript's internal consistency and clarify some points related to (3) below.

      We have added further discussion of this to the discussion section.

      (2) The temporal aspect of behavioral instability. While the analysis demonstrates behavioral instability, the temporal dynamics remain unclear. It would be helpful for the authors to clarify (based on graphs and text) whether the behavioral changes occur randomly over time or follow a pattern (e.g., initially more right turns, then more left turns). A proper temporal analysis and clearer explanations are currently missing from the manuscript.

      We have added a figure (1F) to better visualize the changes in handedness over days. We have also pointed out the connection between the power spectrum and the autoregressive model given by the Wiener-Khinchin theorem (which states that the autocorrelation function of a wide-sense stationary process has a spectral decomposition given by its power spectrum).
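
      The link between an autoregressive model and a power spectrum can be checked numerically. The sketch below is an illustration only: it assumes a first-order autoregressive process, AR(1), with coefficient phi and innovation variance sigma2, which may not be the model order used in the manuscript, and it uses the discrete-time convention without a 2π normalization. It compares the closed-form AR(1) spectrum with the Fourier sum of the AR(1) autocovariance, as the Wiener-Khinchin theorem predicts.

```python
import math


def ar1_autocovariance(lag, phi, sigma2=1.0):
    """Autocovariance of a stationary AR(1) process x[t] = phi * x[t-1] + noise."""
    return sigma2 * phi ** abs(lag) / (1.0 - phi ** 2)


def ar1_power_spectrum(omega, phi, sigma2=1.0):
    """Closed-form AR(1) power spectrum at angular frequency omega."""
    return sigma2 / (1.0 - 2.0 * phi * math.cos(omega) + phi ** 2)


def spectrum_from_autocovariance(omega, phi, sigma2=1.0, max_lag=500):
    """Truncated Fourier sum of the autocovariance.

    The autocovariance is even in the lag, so only cosine terms remain.
    """
    total = ar1_autocovariance(0, phi, sigma2)
    for k in range(1, max_lag + 1):
        total += 2.0 * ar1_autocovariance(k, phi, sigma2) * math.cos(omega * k)
    return total


if __name__ == "__main__":
    phi = 0.8  # hypothetical day-to-day persistence of the behavioral bias
    for omega in (0.1, 1.0, 3.0):
        print(omega, ar1_power_spectrum(omega, phi), spectrum_from_autocovariance(omega, phi))
```

      A larger phi concentrates power at low frequencies, which is the spectral signature of slow drift in this simplified setting.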

      (3) The temporal dimension leads directly into the third issue: distinguishing between drift and learning (e.g., line 56). In the neutral stimuli used in the experimental data, changes should either occur randomly (drift) or purposefully, as in a neutral environment, previous strategies do not yield a favorable outcome. For instance, the animal might initially employ strategy A, but if no improvement in the food situation occurs, it later adopts strategy B (learning). In changing environments, this distinction between drift and learning should be even more pronounced (e.g., if bananas are available, I prefer bananas; once they are gone, I either change my preference or face negative consequences). Alternatively, is my random choice of grapes the substrate for the learning process towards grapes in a changing environment? Further clarification is needed to resolve these potential conflicts.

      We have discussed this further in the discussion.

      Reviewer #2 (Public review):

      Summary:

      This is an inspired study that merges the concept of individuality with evolutionary processes to uncover a new strategy that diversifies individual behavior that is also potentially evolutionarily adaptive.

      The authors use a time-resolved measurement of spontaneous, innate behavior, namely handedness or turn bias in individual, isogenic flies, across several genetic backgrounds.

      They find that an individual's behavior changes over time, or drifts. This has been observed before, but what is interesting here is that by looking at multiple genotypes, the authors find the amount of drift is consistent within genotype i.e., genetically regulated, and thus not entirely stochastic. This is not in line with what is known about innate, spontaneous behaviors. Normally, fluctuations in behavior would be ascribed to a response to environmental noise. However, here, the authors go on to find what is the pattern or rule that determines the rate of change of the behavior over time within individuals. Using modeling of behavior and environment in the context of evolutionarily important timeframes such as lifespan or reproductive age, they could show when drift is favored over bet-hedging and that there is an evolutionary purpose to behavioral drift. Namely, drift diversifies behaviors across individuals of the same genotype within the timescale of lifespan, so that the genotype's chance for expressing beneficial behavior is optimally matched with potential variation of environment experienced prior to reproduction. This ultimately increases the fitness of the genotype. Because they find that behavioral drift is genetically variable, they argue it can also evolve.

      Strengths:

      Unlike most studies of individuality, in this study, the authors consider the impact of individuality on evolution. This is enabled by the use of multiple natural genetic backgrounds and an appropriately large number of individuals to come to the conclusions presented in the study. I thought it was really creative to study how individual behavior evolves over multiple timescales. And indeed this approach yielded interesting and important insight into individuality. Unlike most studies so far, this one highlights that behavioral individuality is not a static property of an individual, but it dynamically changes. Also, placing these findings in the evolutionary context was beneficial. The conclusion that individual drift and bet-hedging are differently favored over different timescales is, I think, a significant and exciting finding.

      Overall, I think this study highlights how little we know about the fundamental, general concepts behind individuality and why behavioral individuality is an important trait. They also show that with simple but elegant behavioral experiments and appropriate modeling, we could uncover fundamental rules underlying the emergence of individual behavior. These rules may not at all be apparent using classical approaches to studying individuality, using individual variation within a single genotype or within a single timeframe.

      Weaknesses:

      I am unconvinced by the claim that serotonin neuron circuits regulate behavioral drift, especially because of its bidirectional effect and lack of relative results for other neuromodulators. Without testing other neuromodulators, it will remain unclear if serotonin intervention increases behavioral noise within individuals, or if any other pharmacological or genetic intervention would do the same. Another issue is that the amount of drugs that the individuals ingested was not tracked. Variable amounts can result in variable changes in behavior that are more consistent with the interpretation of environmental plasticity, rather than behavioral drift. With the current evidence presented, individual behavior may change upon serotonin perturbation, but this does not necessarily mean that it changes or regulates drift.

      However, I think for the scope of this study, finding out whether serotonin regulates drift or not is less important. I understand that today there is a strong push to find molecular and circuit mechanisms of any behavior, and other peers may have asked for such experiments, perhaps even simply out of habit. Fortunately, the main conclusions derived from behavioral data across multiple genetic backgrounds and the modeling are anyway novel, interesting, and in fact more fundamental than showing if it is serotonin that does it or not.

      We have adjusted our wording and contextualized our claims based on previous literature.

      To this point, one thing that was unclear from the methods section is whether genotypes that were tested were raised in replicate vials and how was replication accounted for in the analyses. This is a crucial point - the conclusion that genotypes have different amounts of behavioral drift cannot be drawn without showing that the difference in behavioral drift does not stem from differences in developmental environment.

      We have reanalyzed the behavioral data in a hierarchical model to account for batch effects. After accounting for batch effects (Fig 1G, S1G), we still observe differences between genotypes and for pharmaceutical manipulations of serotonin, though our data provide more equivocal evidence for the effects of trh<sup>n</sup> on drift.

      Reviewer #3 (Public review):

      Summary:

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that while flies exhibit an individual turning bias (when averaged over time), their preferences fluctuate over slow timescales.

      To understand whether genetic or neuromodulatory mechanisms influence the drift in individual preference, the authors test different fly strains concluding that both genetic background and the neuromodulator serotonin contribute to the degree of drift.

      Finally, the authors use theoretical approaches to identify the range of environmental conditions under which drift in individual bias supports population growth.

      Strengths:

      The model provides a clear prediction of the environmental fluctuations under which a drift in bias should be beneficial for population growth.

      The approach attempts to identify genetic and neurophysiological mechanisms underlying drift in bias.

      Weaknesses:

      Different behavioral assays are used and are differently analysed, with little discussion on how these behaviors and analyses compare to each other.

      We have added text indicating that these two behavioral responses have previously been shown to be correlated to each other and that the spectral power analysis and autoregressive model are conceptually linked.

      Some of the model assumptions should be made more explicit to better understand which aspects of the behaviors are covered.

      We have added a table in the supplemental clarifying all of the parameters of modeling for each figure.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Highlights of the Consultation Session of 3 Reviewers

      In the consultation session, the reviewers discussed as particularly important the relative contribution of genotype and variable environment. Further analyses of the replicates of the genotypes were suggested to exclude the environment as the source of difference in the extent of drift between genotypes. If the difference in the extent of drift between replicates is greater than the difference in the extent of drift between genotypes, then one cannot really say that there is a genetic control over drift and that it would evolve (which is still an interesting result, but would be less exciting for a follow-up evolution experiment). If replicates differ, testing whether the relative difference in the extent of drift between genotypes is maintained across environments would also be strong evidence that the extent of behavioral drift is a property of a genotype and not a mere result of a fluctuating/variable environment. The authors do present two behavior paradigms that can serve the purpose of comparing the relative extent of drift between genotypes across the two paradigms that they already have. The authors might consider whether experimental data could be brought closer to theory by including an experiment in a variable environment (e.g., temperature or diet changes).

      Reviewers also agreed in the consultation session that methods and definitions were somewhat cryptic, and it would be very helpful if they were more detailed. For example, linking the free walking analysis to the Ymaze and then the model1 to the model2 was not straightforward.

      We have added text to make more explicit the theoretical connection between the free-walking analysis, the Y-maze analysis, and the model. We have also added text and a supplemental table to clarify the methods.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 161: The authors state in the supplement that they used DGRP strains, which are inbred and not isogenic. According to the original authors, they possess 99.3% genetic identity. The isoD1 strain has no known crossing scheme, so complete chromosome isogeneity remains questionable, especially after 12 or more years since its creation. The authors should refer to the strains as "near-isogenic" or a similar term.

      We have adjusted the language as suggested to be more accurate.

      (2) Lines 276, 338: The manuscript contains some unfinished sentences or remnants from the drafting process (e.g., "REFREF"). A thorough editorial review is recommended to eliminate such errors.

      We have cleaned up all references and made additional passes to adjust text.

      Reviewer #2 (Recommendations for the authors):

      (1) If the authors want to claim that serotonin is a regulator of drift, they should provide a negative control experiment, using equivalent perturbations of another neuromodulator and non-modulator. Alternatively, they could simply soften the claims revolving around serotonin and its putative direct role in modulating drift.

      We have softened the claims as suggested to avoid claiming our results show a specific role for serotonin.

      (2) I would suggest always using "behavioral drift" when referring to drift, especially in the context of modeling, because it can be easily confused with genetic drift and cause confusion when reading.

      We have adjusted the language throughout the manuscript per this suggestion.

      (3) It would be good to see in the methods if the 2-hour assays were always done at the same time of the fly's subjective day and when (e.g. how many hours after lights on).

      We have clarified this.

      (4) I understand that many experiments use methodology replicated from the group's previous work, but I would recommend elaborating the experimental methods section in the supplementary such that the reader can understand and reproduce the methods without having to sift through and look for them in previous papers.

      We have expanded on our discussion of the methodology in the methods section.

      Reviewer #3 (Recommendations for the authors):

      The paper begins by analyzing the drift in individual behavior over time. Specifically, it quantifies the circling direction of freely walking flies in an arena. The main takeaway from this dataset is that flies exhibit an individual turning bias (when averaged over time), yet their preferences fluctuate over slow timescales. However, it's unclear why the authors chose to switch to a different assay to compare strains. In particular, it's ambiguous whether the behavioral measure in one setup is comparable to that in the other; specifically, whether a bias in one setup reflects the same type of bias in the other. The behavior is also sampled differently across setups (though the details are unclear; see comments below) and analyzed using different methods. Consequently, it remains uncertain whether the slow fluctuations observed in the arena setup are also present in the Y maze. It appears that the analysis of the Y maze data only addresses individual behavioral variance or, at most, day-to-day changes, without accounting for longer-term correlations in bias-which I understood to be the primary interest in the arena setup. Some clarification is needed here (see specific comments below).

      In Figure 2, the authors attempt to show the potential advantage of individual drift for survival in unpredictable, fluctuating environments. They demonstrate that while bet-hedging provides an advantage over timescales matching the generation time (since reproduction is required), it offers less benefit on shorter timescales, where an increased individual drift could be advantageous. This approach is well-conceived, and the findings are convincing, though the model would benefit from further clarification and additional explanation in the text.

      Here are some more specific comments:

      PART 1:

      (1) L 223 one probably cannot see a circadian peak at 24h if the data were filtered at 24h, did they look with another low pass cutoff?

      We clarified in the text that the power spectrum analysis was performed on unfiltered data.

      (2) L 243 the spread in standard deviation is said to be consistent with drifting bias, however, I do not agree with this. The variation could be stochastic but independent across days, and show no temporal correlation. As done with the circular arena, a drift should be estimated as a temporal correlation in the behavior.

      It is consistent insofar as seeing a non-zero standard deviation is a necessary condition for drift. While it does not show that there is any consistency over time, this can be inferred from the autoregressive model (as well as previous work). We have added text to make this clearer.

      (3) In the autoregressive model this temporal aspect seems to be incorporated only to the first order (from day to day). Therefore, from what I understand, the drift term is not correlated over time. This seems very different from the spectral analysis done in the circular assay, and I wonder if it fits at all the initial definition of drift. For example, is the model compatible with a fixed mean and a similar power spectrum as in Figure 1C? The text should clarify that.

      In an AR(1) process, datapoints day to day are correlated as long as ϕ ≠ 0. This can be made clear in the case of σ = 0 and ϕ = 1, where values would be perfectly correlated with each other across time. The AR(1) model and the PSD of circling can be related via the Wiener-Khinchin theorem. We have added text to make this connection clear.
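      The point about day-to-day correlation can be illustrated with a minimal sketch. It assumes the generic AR(1) form x[t] = ϕ·x[t−1] + ε[t] with ε[t] drawn from a normal distribution with standard deviation σ; the function names, the parameter values (ϕ = 0.8, σ = 1) and the series length are illustrative choices, not taken from the manuscript. For such a process the lag-k autocorrelation is ϕ^k, so a first-order model still carries correlations over many days, and by the Wiener-Khinchin theorem the power spectrum is the Fourier transform of that autocovariance, which concentrates power at low frequencies when ϕ is close to 1.

```python
import random


def simulate_ar1(phi, sigma, n_days, x0=0.0, seed=0):
    """Simulate x[t] = phi * x[t-1] + eps[t], with eps[t] ~ Normal(0, sigma)."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n_days - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def autocorrelation(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


# Illustrative parameters (assumptions, not the fitted values from the paper).
phi = 0.8
series = simulate_ar1(phi=phi, sigma=1.0, n_days=20000)

# Compare the sample autocorrelation with the theoretical value phi ** lag.
for lag in (1, 2, 5):
    print(lag, round(autocorrelation(series, lag), 3), round(phi ** lag, 3))

# Degenerate case mentioned in the response: sigma = 0 and phi = 1
# leaves every value equal to the starting value.
constant = simulate_ar1(phi=1.0, sigma=0.0, n_days=10, x0=0.3)
print(constant)
```

      With ϕ = 0 the same simulation yields values that are independent from day to day, which is the scenario raised in comment (2).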

      (4) Did serotonin have no role in turning bias? My understanding of previous work was that serotonin should affect the bet-hedging variance as well - the authors should discuss what is expected or not, especially given that the pharmacological and genetic approaches do not have the same effect on bet-hedging (Figure 1H-I).

      As the pharmacological methods were only applied after eclosion, we do not find it surprising that we do not measure differences in the initially measured distribution of handedness in that case. We do see more evidence of it in the mutations, though the trhⁿ experiments provide a less clear effect after our adjustments to account for batch effects.

      (5) Methods: It is unclear how flies were handled across days; e.g. in Y mazes: 2h each day for how many days? In the arena flies were imaged either twice daily for 2h per session, or continuously for 24h (L138) - but which data are used where?

      We will make this clearer, but all data in Figure 1 were the continuous 24h data.

      This part of the methods is not well explained and I think it should be described in more detail.

      (6) How many flies per genotype were tested in fig 1E?

      Information was added to the caption to duplicate information in the table.

      PART 2:

      (7) In Figure 2B I do not understand the formulation N(50−ϕ: 50, σ), N(phi-et: et, σ) or in general N(x: m, s): does this mean that the variable x has normal distribution with mean m and variance s? Usually this would be written as N(x|m, s) or N(x; m, s)

      If so then: N(50−ϕ: 50, σ) = N(ϕ: 0, σ) which has mean=0 while the figure caption says "from a normal distribution centred on the long term environmental mean" - what is the long term environmental mean?

      If this is correct, and, therefore, we are just centering the mean, what about N(et-phi: et, σ)?

      Et is the environment at the time, not the mean of the environment (which is 50). We have added more detail in supplementary methods to address this.

      (8) Should ϕ vary between 1-100? And is the environmental parameter in Figure 2C also varying between 1-100? These ranges should be written somewhere.

      While implied in the sigma notation, we have added more detail in supplementary methods to explain the situation.

      (9) As far as I understand the bounding envelope in Figure 2B is necessary to contain the drift model. In Figure 1F, a bounding effect was generated by the "tendency to revert to no bias." It is unclear to me whether these two formulations are equivalent. Moreover, none of these two models might be able to recapitulate the correlations observed in the circular arena and analyzed spectrally in Figure 1C. It would be necessary that the author make an effort to relate these models/quantifications one to another. My understanding of Figure 1B is that there are slow fluctuations around the mean. Is the bounded drift model in 2B not returning to the same mean? And do these models generate slow fluctuations? Further explanation could help clarify these points.

      We have added further explanation of the connection between the power spectrum and the two methods of establishing stationarity (ϕ and the bounding envelope).

      (10) Expanding on the above: I thought that the definition of individuality is based on some degree of stability over days. However, both models assume drift to occur from day to day (and also the analysis of the DGRP lines assumes so). Some clarification here could help: is the initial bet-hedging variation maintained in the population? And is the mean individual bias still a thing or is it just drifting away all the time?

      The initial bet-hedging is maintained to some degree, based on the parameter of phi and the bounding envelope. We have added text to make this clearer.

      (11) In both Figures 2C and 2E the populations are always shrinking, is that correct? And if so, is it expected? Does the model allow growth in a constant environment?

      As the plotted values are the log, the optimal environments do allow growth (visible more clearly in 2D). We have added some text to make this clearer.

      (12) Growth is quantified only across 100 days (Figure 2D) but at day 100 there is not something like a steady state, how is 100 chosen? Would it make sense to check longer times to see if the system eventually takes off? And if not, why?

      (13) Related to the above: what is the growth range achieved in Figure 3A-B? Is the heatmap normalized to the same value across conditions? I think it would be important to consider the absolute range of variation of growth or at least the upper value across conditions.

      Moreover: is growth quantified at day 100? What happens at longer times? Does the temporal profile of the growth curve differ across environmental conditions? (I'm referring to a Figure as 2D).

      As we are plotting the log change, we are ultimately showing the growth rate. While a more realistic model would involve carrying capacity, we believe a simplified model showing growth or no growth captures the difference in growth rate between different strategies. We have added some text to make this clearer.

      (14) Suddenly at line 502, sexual maturity is introduced as a parameter, which was never mentioned before, called a_min in the figure legend of panel 3a, but it is unclear where this is in the model. And please also clarify if sex maturity is the same as generation time.

      Sexual maturity is the same as generation time, we have standardized terminology throughout the paper.

      (15) Regarding lines 505-508, could one simply conclude that in this model formulation, the generation time has the effect of a low pass filter on environmental fluctuation? The question is: is this filtering effect the only effect of generation time?

      While this seems to capture the high-frequency effect we see, it does not explain the shift from bet-hedging to drift that we see at lower-frequency environmental fluctuations.

      (16) What reproductive rate is used for the PCA analysis? Is the variance associated with the drift so low because of choosing a fast reproductive rate? A comment in the main text would be helpful.

      We have clarified that these plots were done at 10 days.

    1. Reviewer #3 (Public review):

      Summary:

      This manuscript is focused on studying the spatial frequency selectivity of individual neurons in the mouse primary visual cortex (V1) in the anesthetized and awake brain states using 2-photon calcium imaging. Although previous studies have demonstrated that anesthesia decreases both size tuning and spatial selectivity in V1 neurons, the strength of this study is its focus on characterization of the same neurons in awake and anesthetized states in combination with transgenic mouse lines selectively labeling pan-inhibitory neurons and also more specific neuronal subtypes, including parvalbumin-positive (PV+) or somatostatin-positive (SOM+) interneurons. A combination of these methodologies allows for a more in-depth mechanistic study of the properties of different types of neurons. The main findings suggest that in excitatory neurons, anesthesia leads to a shift in preferred SF and broadening of SF tuning, with no changes in orientation and direction selectivity. The downward shift in preferred SF was more pronounced in both SOM+ and PV+ interneurons.

      Strengths:

      (1) 2-photon calcium imaging with single-cell resolution.

      (2) Characterization of excitatory and two types of inhibitory neurons.

      Weaknesses:

      (1) VIP interneurons are critical to the neural circuit, and their characterization would be critical to the mechanistic understanding of this process, but is missing.

      (2) Unfortunately, the manuscript does not lead to an additional insight into the nature of this anesthesia-induced shift in SF preference.

      (3) Furthermore, it also doesn't help understand how SF preference is encoded in V1.

      (4) Finally, some critical histological controls are missing.

    2. Author response:

      Thank you for the eLife assessment and the constructive reviews. We appreciate the reviewers’ valuable insights and the time they dedicated to providing such thoughtful feedback on our manuscript. The reviewers highlighted the technical rigor of our study, specifically the tracking of individual neurons across both anesthetized and awake states using two-photon imaging. They also emphasized the importance of our cell-type-specific analysis (excitatory, PV, and SOM neurons) and noted that the study provides solid evidence for isoflurane-induced shifts in preferred spatial frequency (SF).

      Based on our team's evaluation of the reviewers' comments, we would like to outline our planned revisions.

      (1) Expanded Population and Single-Neuron Analysis

      We will re-analyze our dataset to include all neurons that were responsive under anesthesia, in the awake state, or both. This will ensure our findings accurately represent the entire population of visually responsive neurons. We will also provide examples of individual tuning curves to clarify the relationship between tuning shape and SF shifts in individual neurons.

      (2) Addressing Methodological Scope and Behavioral Metrics

      Receptive Field Size and Dynamics: While we did not utilize a stimulus set specifically designed to map receptive field (RF) sizes, we intend to examine how other functional parameters co-varied with the shift in preferred SF within each cell type. Furthermore, although characterizing the precise temporal dynamics during anesthesia onset presents technical challenges, we will attempt to analyze the time-dependence of the observed changes to provide deeper insight into the transition between states.

      Behavioral Metrics: While pupil size is a well-established proxy for brain state, we will explore the inclusion of other available behavioral parameters.

      (3) Cell-type Specificity (SOM, PV, and VIP)

      SOM vs. PV Comparison: We will perform a detailed comparison of preferred SFs between SOM and PV interneurons, including those responsive only under anesthesia or only in the awake state.

      VIP Neurons: While VIP neurons are known to play critical roles in cortical circuits, such as disinhibition, we have decided not to conduct new recordings for VIP interneurons in the present study. Based on existing literature, the proportion of visually responsive VIP cells is too low to yield statistically reliable conclusions for this specific study (de Vries et al., Nature Neuroscience 23, 138-151, 2020). Additionally, we intend to focus our analysis on inhibitory interneuron subtypes that provide direct input to pyramidal cells.

      Histology: We will provide additional histological validation.

      (4) Refined Framing

      As suggested, we will focus the manuscript strictly on isoflurane anesthesia. This includes updating the title and abstract to reflect this specificity and discussing how our results compare with other anesthetics like urethane. Furthermore, we will substantially deepen our discussion on the potential mechanisms by which anesthesia induces a downward shift in preferred spatial frequency.

      We believe these additions will significantly strengthen the manuscript.

    1. Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Evaluation of Antibiotic and Peptide Vaccine Strategies for Mirror Bacterial Infections" addresses a topic that is well established in the literature. The authors investigate the activity of enantiomeric (D-form) antibiotics against bacteria and the immunogenicity of D-form peptides, proposing that D-enantiomers are ineffective both as antibacterial agents and as vaccine candidates. While the subject matter is relevant, the concepts explored are already well known, and the manuscript offers limited novelty.

      The authors demonstrate that D-enantiomeric antibiotics lack antibacterial activity compared to their naturally occurring L-forms and that D-form peptides fail to elicit detectable immune responses. These observations are consistent with existing knowledge regarding molecular chirality in biological systems. However, the manuscript relies on a limited experimental dataset while extrapolating the findings broadly, which weakens the strength of the conclusions.

      Strengths:

      The manuscript introduces the topic of Mirror Bacterial Infections, likely to occur if no regulations or restrictions are placed immediately.

      The manuscript addresses a relevant topic and has potential value, particularly in framing discussions around chirality and pathogen interactions. With a more cautious interpretation of the results, the manuscript could better justify its conceptual framework and strengthen its contribution to the field.

      Weaknesses:

      (1) Several sections of the manuscript are overly descriptive and would benefit from deeper comparative analysis and critical synthesis. In multiple instances, the discussion relies on hypothetical scenarios supported primarily by selective citations rather than robust experimental evidence. The introduction of the term "mirror microbiology" or "mirror bacteria" appears largely conceptual and is used to unify what are essentially two separate lines of investigation, enantioselective antibiotic activity and peptide chirality in immune recognition, without sufficient mechanistic integration.

      (2) To the best of this reviewer's understanding, the manuscript does not present substantial novelty. The pronounced differences in biological activity between L- and D-forms of small molecules and peptides are well documented, including their implications for antimicrobial efficacy and immune recognition. While the manuscript is written in clear and accessible language suitable for both specialists and interdisciplinary readers, novelty remains limited.

      The manuscript reiterates well-established principles of stereochemistry and biological recognition. Given the extensive existing literature demonstrating that enantiomeric antibiotics are typically inactive due to stereospecific target interactions, the failure of D-form antibiotics is expected and does not constitute a novel finding.

      (3) Critical experimental details are lacking, particularly regarding the peptide design. It is unclear whether the peptides were synthesized entirely in the D-configuration or whether only select amino acids were substituted. This distinction is essential for interpreting immunogenicity results and for comparison with prior studies.

      (4) The authors conclude that D-form peptides are poorly recognized by the immune system. However, the data presented indicate that neither the L- nor the D-form peptides tested elicited a measurable immune response. Without demonstrating immunogenicity of the corresponding L-form peptides, the conclusion that immune non-recognition is specific to the D-form is not sufficiently supported.

    2. Reviewer #2 (Public review):

      This paper by Kleinman et al. tackles an increasingly discussed biosecurity scenario, namely the possibility that "mirror bacteria" could evade key elements of host immunity and therefore demand bespoke medical countermeasures. The authors experimentally probe two such countermeasure concepts: (1) whether existing chiral antibiotics might still work against mirror bacteria (this is tested indirectly by measuring the activity of antibiotic enantiomers against natural-chirality bacteria), and (2) whether D-peptide antigens can be made immunogenic. Briefly, the authors show that enantiomers of four approved antibiotics have little to no activity in MIC assays, argue this implies the parent drugs would likely fail against mirror bacteria, report limited single-dose tolerability data for the enantiomers in mice, and show that selected bacterially derived D-peptides can elicit strong binding antibody titers when conjugated to a carrier protein and given with adjuvant.

      Overall, the study is quite interesting but constrained by the fact that D-peptide immunogens and related ideas have been explored for decades, by prior literature showing that D-enantiomeric peptides can themselves be strongly antimicrobial vs conventional bacteria, and by a number of conceptual and experimental limitations outlined below.

      (1) A blanket statement indicating that flipping chirality makes antibiotics ineffective cannot be true across all classes. Indeed, there is extensive precedent for "mirror" (D-amino-acid) peptides that retain, or even improve, antimicrobial activity against natural bacteria.

      (2) The paper's key claim ("parent antibiotics won't work on mirror bacteria") is based on the observation that the enantiomers of chloramphenicol/linezolid/tedizolid/aztreonam largely lose activity against natural bacteria. This is a reasonable proxy experiment given the absence of mirror organisms, but it remains an inference and should be described as such.

      (3) The chiral purity needs to be documented more rigorously. The methods mention structural confirmation by NMR and >95% purity by LC-MS/HPLC for enantiomeric compounds, but this is not the same as demonstrating high enantiomeric excess or excluding low-level contamination by the active parent enantiomer.

      (4) The residual activity of ent-aztreonam is quite interesting. The authors report slight activity for ent-aztreonam (MIC of 32-128 µg/mL in a subset), still far weaker than aztreonam but nonzero.

      (5) For antibiotics, MIC is a starting point, but further experiments are needed. To justify countermeasure relevance, it would help to include at least one additional pharmacodynamic readout (time-kill kinetics, post-antibiotic effect, inoculum effect, or activity in the presence of human serum).

      (6) The acute toxicity study is limited (single-dose, short follow-up, small n, one sex/strain, and no histopathology).

      (7) The Discussion leans on human equivalent dosing logic to reassure feasibility. Given the lack of PK, bioavailability, metabolism, and repeat-dose data, these comparisons risk overreach.

      (8) The readout is ELISA endpoint binding (IgG; and IgA in BALF for one antigen), which is fine for an initial immunogenicity screen. But the manuscript then drifts toward "vaccine strategy" claims without showing any antibody functionality (opsonophagocytosis, complement deposition, neutralization, blocking adhesion, and so on) or even binding to a more native-like antigen format (e.g., D-peptide displayed on particles; D-protein fragments; or any surrogate that goes beyond plate-bound peptide).

      (9) The methods report peptide conjugates containing ~10-200 EU/mL endotoxin. That is not trivial and could materially amplify immunogenicity, and should be discussed.

      (10) The authors should report how many technical/biological replicates were performed for MIC determinations and for ELISAs.

    1. Reviewer #1 (Public review):

      Summary:

      The authors provide in vivo and in vitro evidence for an interaction between AIRE and AID. This has implications for the dynamics of the germinal center response and autoimmunity related to the APSI disease.

      The manuscript describes an unexpected function of AIRE, which is more well known for its function to regulate negative selection of T cells in the thymus. Here, the gene has also been shown to be expressed by B cells (Immunity 2015: 26070482). They describe that AIRE interacts with AID, and in its absence, B cells acquire more hypermutations and also produce auto-antibodies against IL-17. These autoantibodies have been described previously.

      Strengths:

      The study is interesting and provides some additional information about how AIRE regulates immune cell function. Several biochemical and in vivo experiments show the interaction and the function of AIRE in the regulation of AID activity in the GC response.

      Weaknesses:

      Some of the hypothetical consequences of this regulation are not investigated. This includes responses to model antigens and dynamics of the germinal center related to kinetics.

      Major Comments:

      (1) AID regulates both switch and somatic hypermutation. Switch is easier to achieve, so which of these processes does AIRE influence the most? Also, the switch is thought to occur before the B cell enters the GC. Looking at the histology, is AIRE also expressed at the early proliferative stage that has been described by Ann Haberman?

      (2) In experiments determining anti-CD40-dependent upregulation of AIRE, naïve resting B cells were used from mice. A proportion of the B-cells got activated. Are these MZB or FOB cells as MZBs are more easily activated?

      (3) In the BM chimeric experiments in Figure 3, do the AIRE+ and AIRE- populations distribute equally among B cell subpopulations?

      (4) Furthermore, in the NP-KLH experiments, one would expect that B cells with increased affinity would leave the GC earlier and become plasma cells. Thus, the kinetics of the AIRE+ vs AIRE- B cells within the GC would be different? Also, would they maybe take over at some point, as the increased affinity would favor help from Tfh cells that are known to be limited?

      (5) Given the previous studies on AIRE's function in regulating transcription (PMID: 34518235), how does this interaction fit into this picture?

      (6) In the uracil experiments, the readout for AID to induce double-stranded breaks could be tested.

      (7) The candida experiments are a nice connection to the situation in patients. However, why is it mostly auto-antibodies against IL-17? How about other immune responses, as well as T cell-independent type I and II responses?

    1. Reviewer #1 (Public review):

      Summary:

      Knowing that small pupil-size variations accompany brightness variations (even when these are illusory), the authors asked whether pupil constrictions would accompany the synesthetic perception of a brighter color (compared with a darker one), induced by the presentation of a black-white character. This grapheme-colour synesthesia is only experienced by a few participants, sixteen of whom were enrolled in this study. The results reliably showed that a relative pupil constriction would "betray" the perception of a brighter color in these participants, while no such effect would be observed in control participants who were asked to report a color in association with each grapheme, even though they did not perceive any.

      Strengths:

      The main strength of the study lies in its combination of psychophysics (brightness ratings) and pupillometry, which allowed for showing clear-cut results.

      Weaknesses:

      Some relatively minor weaknesses concern the ancillary analyses, which tackle secondary questions and are not entirely convincing.

      (1) The linear mixed model approach is a powerful way to identify important variables, but it does not clarify whether the key factors are between-subject or between-trial variations. Some variables are inherently defined at a subject level (e.g., PA scores), others are not. I would strongly recommend an alternative visualisation of the results to examine inter-individual variability.

      (2) It is not clear why taking the first derivative of pupil size in Figure 5 would isolate the effect of arousal, eliminating those of luminance and contrast changes (in fact, one could argue for the opposite, since arousal effects are generally constant for extended periods of time while contrast effects are typically more local and transient).

      (3) It is a pity that responses to physical brightness modulations were only measured in the synesthete group, not in controls, as this would have allowed for ruling out differences in pupil reactivity across the two populations.

      (4) Another concern is with the visualisation of the pupil traces in Figure 3 (main results); these were heavily pre-processed (per-participant demeaned), losing any feature besides the effect of interest and generating the unrealistic expectation that perception of dark/bright colors generate a net dilation/constriction of the pupil - whereas perception-related modulations of pupil size are always relative and generally small compared to the numerous other effects registered in pupil size. It would be far better to see the actual profiles, preserving the unfolding of dilations and constrictions over time, especially since these are further analysed in Figures 4 and 5.

      Impact:

      Despite these weaknesses, and especially if they are adequately addressed in the review, this work is likely to improve our understanding of synesthesia, providing a new tool to quantify the subjective sensations; an interesting potential extension would be using pupillometry for tracking changes over time of the synesthetic experiences, opening up the possibility to evaluate the importance of learning for this peculiar experience.

    1. a. Draw products at random one at a time for inspection until a counterfeit product is encountered, then stop. Compute the expected number of genuine products inspected.

      In probability theory, the variant you mention is still the geometric distribution, but it counts the number of failures before the first success. The specific differences are as follows:

      1. Definition of the random variable: Standard variant ($X$): the total number of trials up to and including the first success. Values: $X \in \{1, 2, 3, \dots\}$. Your variant ($Y$): the total number of failures before the first success. Values: $Y \in \{0, 1, 2, \dots\}$. Relationship: $Y = X - 1$.

      2. Probability mass function (PMF): If $p$ denotes the probability of success, the formula is:

      $$P(Y = k) = (1 - p)^k \, p, \qquad k = 0, 1, 2, \dots$$

      (Explanation: you have $k$ consecutive failures, followed by 1 success.)

      3. Basic characteristics: Expectation (mean): $E[Y] = \frac{1 - p}{p}$ (instead of $\frac{1}{p}$ as in the standard variant). Variance: $\operatorname{Var}(Y) = \frac{1 - p}{p^2}$ (unchanged from the standard variant).

      Quick example: If you shoot a basketball with a success rate of $p = 0.2$ (20%): on average you need 5 shots to make the first basket ($E[X] = 1/0.2 = 5$). But on average you will miss 4 times before seeing the ball go in ($E[Y] = 0.8/0.2 = 4$).
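      The two conventions of the geometric distribution can be checked numerically. The following is a minimal sketch, assuming a success probability of 1/5 as in the basketball example; the function names `failures_pmf` and `truncated_mean_failures` are illustrative and not taken from the note.

```python
from fractions import Fraction


def failures_pmf(k: int, p: Fraction) -> Fraction:
    """P(Y = k): k consecutive failures followed by one success."""
    return (1 - p) ** k * p


def truncated_mean_failures(p: Fraction, terms: int) -> Fraction:
    """Partial sum of k * P(Y = k) for k = 0 .. terms - 1."""
    return sum(k * failures_pmf(k, p) for k in range(terms))


p = Fraction(1, 5)             # success probability from the example (20%)
mean_trials = 1 / p            # expected number of trials, success included
mean_failures = (1 - p) / p    # expected number of failures before the success
approx_mean_failures = float(truncated_mean_failures(p, 200))
```

      The closed-form mean of the failure count is one less than the mean of the trial count, and the truncated series converges to the same value, which is a quick way to confirm which convention a formula uses.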

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work, Huang et al. revealed the complex regulatory functions and transcription network of 172 previously uncharacterized transcription factors (TFs) in Pseudomonas aeruginosa PAO1. They built a global TF-DNA binding landscape and elucidated the binding preferences and functional roles of these TFs. More specifically, the authors established a hierarchical regulatory network and identified ternary regulatory motifs and co-association modules. Since P. aeruginosa is a well-known pathogen, the authors also identified key TFs associated with virulence pathways (e.g., quorum sensing [QS], motility, biofilm formation), which could be potential drug targets for future development. The authors further explored TF conservation and functional evolution through pan-genome and phylogenetic analyses. To make the data easy for other researchers to search, the authors developed a publicly accessible database (PATF_Net) integrating ChIP-seq and HT-SELEX data.

      Strengths:

      (1) The authors performed ChIP-seq analysis of 172 TFs (nearly half of the 373 predicted TFs in P. aeruginosa) and identified 81,009 significant binding peaks, representing one of the largest TF-DNA interaction studies in the field. Also, the integration of HT-SELEX, pan-genome, and phylogenetic analyses provided multi-dimensional insights into TF conservation and function.

      (2) The authors provided an informative analytical framework for presenting the TFs, in which a hierarchical network model based on the "hierarchy index (h)" classified TFs into top, middle, and bottom levels. They identified 13 ternary regulatory motifs and co-association clusters, which deepened our understanding of complex regulatory interactions.

      (3) The PATF_Net database provides TF-target network visualization and data-sharing capabilities, offering practical utility for researchers especially for the P. aeruginosa field.

      Thank you for your positive feedback!

      Weaknesses:

      (1) There is very limited experimental validation for this study. Although 24 virulence-related master regulators (e.g., PA0815 regulating motility, biofilm, and QS) were identified, functional validation (e.g., gene knockout or phenotypic assays) is lacking, leaving some conclusions reliant on bioinformatic predictions. Another approach for validation is checking the mutations of these TFs from clinical strains of P. aeruginosa, where chronically adapted isolates often gain mutations in virulence regulators.

      Thank you for this valuable suggestion. We have performed EMSA experiments to validate the binding results and also constructed mutants for further functional validation. The details can be found in Figure S5.

      (2) ChIP-seq in bacteria may suffer from low-abundance TF signals and off-target effects. The functional implications of non-promoter binding peaks (e.g., coding regions) were not discussed.

      Thank you for this insightful comment regarding ChIP-seq data quality and non-promoter binding events. While we acknowledge that completely eliminating all non-specific binding signals is technically challenging in bacterial ChIP-seq experiments, we implemented stringent quality control measures including replicates, negative controls, and FDR cutoffs to minimize false positives.

      Although the coding binding peaks represent a smaller fraction of total binding events, they are functionally significant rather than mere technical artifacts. Our previous work systematically demonstrated that bacterial TFs can bind to coding sequences and regulate gene expression through multiple mechanisms, including modulating cryptic promoter activity and antisense RNA transcription, hindering transcriptional elongation, and influencing translational efficiency[1]. We have now expanded the Discussion section to address these regulatory mechanisms.

      (3) PATF_Net currently supports basic queries but lacks advanced tools (e.g., dynamic network modeling or cross-species comparisons). User experience and accessibility remain underevaluated. But this could be improved in the future.

      Thank you for this constructive feedback on PATF_Net. We acknowledge that more advanced features would further enhance the platform’s utility. To enhance the utility of PATF_Net, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query.

      Achievement of Aims and Support for Conclusions

      (1) The authors successfully mapped global P. aeruginosa TF binding sites, constructed hierarchical networks and co-association modules, and identified virulence-related TFs, fulfilling the primary objectives. The database and pan-genome analysis provide foundational resources for future studies.

      (2) The hierarchical model aligns with known virulence mechanisms (e.g., LasR and ExsA at the bottom level directly regulating virulence genes). Co-association findings (e.g., PA2417 and PA2718 co-regulating pqsH) resonate with prior studies, though experimental confirmation of synergy is needed.

      Thank you for your positive feedback! We have added experimental validation in the Results section.

      Impact on the Field and Utility of Data/Methods

      (1) This study fills critical gaps in TF functional annotation in P. aeruginosa, offering new insights into pathogenicity mechanisms (e.g., antibiotic resistance, host adaptation). The hierarchical and co-association frameworks are transferable to other pathogens, advancing comparative studies of bacterial regulatory networks.

      (2) PATF_Net enables rapid exploration of TF-target interactions, accelerating candidate regulator discovery.

      Thank you for your positive feedback!

      Reviewer #3 (Public review):

      Summary:

      The authors utilized ChIP-seq on strains containing tagged transcription factor (TF)-overexpression plasmids to identify binding sites for 172 transcription factors in P. aeruginosa. High-quality binding site data provides a rich resource for understanding regulation in this critical pathogen. These TFs were selected to fill gaps in prior studies measuring TF binding sites in P. aeruginosa. The authors further perform a structured analysis of the resulting transcriptional regulatory network, focusing on regulators of virulence and metabolism, in addition to performing a pangenomic analysis of the TFs. The resulting dataset has been made available through an online database. While the implemented approach to determining functional TF binding sites has limitations, the resulting dataset still has substantial value to P. aeruginosa research.

      Strengths:

      The generated TF binding site database fills an important gap in regulatory data in the key pathogen P. aeruginosa. Key analyses of this dataset presented include an analysis of TF interactions and regulators of virulence and metabolism, which should provide important context for future studies into these processes. The online database containing this data is well organized and easy to access. As a data resource, this work should be of significant value to the infectious disease community.

      Thank you for your positive feedback!

      Weaknesses:

      Drawbacks of the study include 1) challenges interpreting binding site data obtained from TF overexpression due to the unknown activity state of the TFs under the measured conditions, 2) limited practical value of the presented TRN topological analysis, and 3) lack of independent experimental validation of the proposed master regulators of virulence and metabolism.

      We thank the reviewer for summarizing these key concerns. We acknowledge the limitations raised regarding TF overexpression, TRN topological analysis interpretation, and experimental validation. We provide detailed point-by-point responses to each of these concerns in our replies to the specific comments below, where we explain our rationale, the measures taken to address these limitations, and our plans for improvement.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Future Directions for the authors to consider for next steps:

      (1) Key TFs (e.g., PA1380, PA5428) should be validated via gene knockout experiments, fluorescent reporter assays, or animal models to confirm roles in virulence pathways.

      Thank you for this important suggestion. We agree that experimental validation is essential to confirm their regulatory roles and biological functions.

      First, we selected a subset of key TFs, including PA0167, PA1380, PA0815, and PA3094, and performed Electrophoretic Mobility Shift Assay (EMSA) experiments to validate their direct binding to target promoters. These results confirmed the ChIP-seq-identified interactions and are now included as Figure S5A-F.

      We also constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) and their complemented strains (ΔPA1380/p and ΔPA3094/p). We then performed RT-qPCR analysis to validate their regulatory effects on key target genes. We found that PA1380 positively regulates the expression of the cupB1 and cupB3 genes (Figure S5F). The CupB cluster is known to be less important than the CupA cluster in biofilm formation; consistent with this, we did not find a significant difference in biofilm formation between WT and ΔPA1380. Additionally, we found that PA3094 positively regulates lecA expression, as shown in Figure S5G.

      We agree that comprehensive functional validation, including animal model studies, would further strengthen the biological significance of these findings. Such experiments are currently underway in our laboratory and will be the subject of follow-up studies.

      We have revised the Results and Methods sections to include these validation experiments and their implications. Please see Figure S5 and Lines 283-300.

      “To experimentally validate the regulatory interactions identified by ChIP-seq, we performed biochemical and genetic analyses on selected TFs. First, we conducted Electrophoretic Mobility Shift Assays (EMSA) for four TFs, including PA0167, PA0815, PA1380, and PA3094, using DNA fragments containing their predicted binding sites from target gene promoters. These TFs showed specific binding to their cognate DNA sequences (Figure S5A-D), confirming the direct binding of the ChIP-seq-identified interactions.

      To further validate the functional regulatory roles of these TFs, we constructed clean deletion mutants of PA1380 and PA3094 (ΔPA1380 and ΔPA3094) along with their complemented strains (ΔPA1380/p and ΔPA3094/p). RT-qPCR analysis revealed that PA1380 positively regulates the expression of cupB1 and cupB3 (Figure S5E), two genes within the CupB fimbrial cluster identified as ChIP-seq targets. Similarly, PA3094 was confirmed to positively regulate lecA expression (Figure S5F), which encodes a lectin involved in biofilm formation and host interactions[2]. Expression of these target genes was restored to wild-type (WT) levels in the complemented strains, validating the regulatory relationships predicted by ChIP-seq. These combined biochemical and genetic validations demonstrate the accuracy and biological relevance of our TF binding data.”

      (2) Non-promoter binding events (e.g., coding regions) may regulate RNA stability, warranting integration with translatomics or epigenomics data.

      Thank you for this suggestion. We have now expanded the Discussion section to address this comment. Please see Lines 478-482.

      “Our analysis revealed that TF binding events occur within coding regions, which is consistent with our previous study demonstrating that bacterial TFs possess binding capabilities for coding regions and can regulate transcription through multiple mechanisms [1]. Besides, it may also regulate RNA stability, warranting integration with translatomics or epigenomics data.”

      (3) Incorporate strain-specific TF data (e.g., clinical isolates) and dynamic visualization tools to broaden PATF_Net's applicability.

      Thank you for this constructive suggestion. To enhance the utility of PATF_Net, we have implemented two new features: (1) a virulence pathway browser that allows users to explore TF binding across curated gene sets for key virulence pathways (quorum sensing, secretion systems, biofilm, motility, etc.), and (2) a target gene search function that enables rapid identification of all TFs regulating any gene of interest by locus tag query. These features are now live on the database and described in the revised manuscript.

      Regarding strain-specific TF data, we agree this would be valuable for understanding regulatory diversity in clinical isolates. However, such an expansion would require ChIP-seq profiling across multiple strains. The current dataset is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa research and allows direct comparison with existing genomic and functional studies. We have added a statement in the revised manuscript acknowledging this limitation and highlighting strain-specific TF analysis as an important future direction for the field. Please see Lines 372-390.

      “The database offers multiple search modalities to facilitate data exploration: users can perform TF-centric searches to query binding sites, target genes, and regulatory networks for individual TFs, or utilize the target gene search function to identify all TFs that regulate any gene of interest by entering its locus tag. To connect regulatory data with biological function, we have implemented a virulence pathway browser that allows users to explore TF binding patterns across curated gene sets for major P. aeruginosa virulence pathways. Interactive visualization tools, including network graphs and binding profile plots, facilitate intuitive exploration of regulatory relationships. The primary purpose of PATF_Net is to store, search, and mine valuable information on P. aeruginosa TFs for researchers investigating P. aeruginosa infection. The current resource is based on the reference strain PAO1, which serves as the foundation for most P. aeruginosa molecular studies and allows direct integration with existing genomic annotations and functional data. However, P. aeruginosa exhibits substantial genomic diversity across clinical isolates, and strain-specific differences in TF binding patterns may contribute to phenotypic variation in virulence, antibiotic resistance, and host adaptation. Extension of this resource to include strain-specific regulatory maps from diverse clinical isolates would provide valuable insights into the regulatory basis and represents an important direction for future investigation.”
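      The target gene search described above amounts to inverting a TF-to-targets mapping. The sketch below is purely illustrative and is not the PATF_Net implementation: the function names are hypothetical, the toy data contain only TF-target pairs mentioned in this response, and gene names are used as keys where the database itself accepts locus tags.

```python
from collections import defaultdict


def build_target_index(tf_to_targets):
    """Invert a TF -> targets mapping into a target -> set-of-TFs index."""
    index = defaultdict(set)
    for tf, targets in tf_to_targets.items():
        for target in targets:
            index[target].add(tf)
    return index


def tfs_regulating(index, target):
    """Return the sorted list of TFs recorded as binding the given target."""
    return sorted(index.get(target, ()))


# Toy data: only TF-target pairs named in this response (assumed keys are gene names).
tf_to_targets = {
    "PA1380": ["cupB1", "cupB3"],
    "PA3094": ["lecA"],
    "PA0797": ["pqsH"],
    "PA2417": ["pqsH"],
    "PA2718": ["pqsH"],
}

index = build_target_index(tf_to_targets)
pqsH_regulators = tfs_regulating(index, "pqsH")
```

      Building the inverted index once makes each gene-centric query a single dictionary lookup, which is one plausible design for a search box of this kind.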

      (4) Phylogenetic analysis highlights TF conservation in bacteria; future work could explore functional homology in other Gram-negative pathogens (e.g., E. coli).

      Thank you for this insightful suggestion. Our phylogenetic analysis revealed that P. aeruginosa TFs exhibit varying degrees of conservation across bacterial species, with some showing broad distribution across Gram-negative pathogens while others are lineage-specific.

      We agree that exploring functional homology of orthologous TFs across species would be highly valuable. Such comparative studies could address whether conserved TFs regulate similar target genes and biological processes across species, or whether regulatory networks have been rewired during evolution. For example, comparative ChIP-seq analysis of P. aeruginosa TFs and their orthologs in Klebsiella pneumoniae, or even in a Gram-positive pathogen such as Bacillus cereus, could reveal conserved regulatory modules governing universal virulence or metabolic strategies versus species-specific adaptations. This represents an important direction for future investigation and would be facilitated by the comprehensive TF binding dataset we provide here. We have expanded the Discussion section to highlight this future direction. Please see Lines 539-550.

      “While our phylogenetic analysis reveals varying degrees of TF conservation across bacterial species, the functional implications of this conservation remain to be fully explored. Many P. aeruginosa TFs have clear orthologs in both Gram-negative (e.g., Klebsiella pneumoniae) and Gram-positive pathogens (e.g., Bacillus cereus), yet whether these orthologs regulate similar target genes and biological processes is largely unknown. Future comparative ChIP-seq profiling of orthologous TFs could reveal the extent to which regulatory network architecture is conserved versus rewired during bacterial evolution, potentially identifying core regulatory modules governing universal bacterial strategies versus species-specific innovations. Such cross-species comparisons would enhance our understanding of regulatory network evolution and enable functional prediction in less well-characterized pathogens based on homology to experimentally validated P. aeruginosa regulators.”

      Reviewer #3 (Recommendations for the authors):

      Major comments

      - Limitations of the ChIP-seq approach: With overexpression plasmids as an approach to TRN elucidation, there is always a set of concerns. First, TF expression is not enough to ensure regulatory activity - metabolite effects must be such that the TF is active, which requires growing the cells in activating conditions. Second, the presence of a binding event does not mean that the binding has a regulatory effect - the authors are clearly aware of this as they specify binding sites in promoter regions, which should be helpful, but they also mention the possibility of regulatory binding events in coding regions. These issues should be listed as weaknesses of the approach in the Discussion.

      Thank you for these important suggestions. We agree that these limitations should be explicitly discussed. We have now added a dedicated paragraph in the Discussion section addressing these concerns. Please see Lines 492-501.

      “However, several limitations of the ChIP-seq approach should be acknowledged. Firstly, TF overexpression ensures sufficient protein levels for ChIP-seq signal detection but does not guarantee that all TFs are in their active conformational states, as many bacterial TFs require allosteric activation by metabolites, cofactors, or post-translational modifications. The cells under standard laboratory conditions which may not activate all TFs to their maximal regulatory states, potentially leading to underestimation of condition-specific binding peaks. Secondly, while we observed TF binding at thousands of genomic sites, binding per se does not equate to functional regulation, as chromatin context, cofactor availability, and competitive binding all influence regulatory outcomes.”

      - Lack of independent validation: The study seems to lack substantial independent validation of either the functional nature of the binding sites as well as the proposed physiological regulatory role of the TFs. For example, for the 103 identified TF motifs, do any of these agree with existing motifs in motif databases that may be homologous to P. aeruginosa TFs? The authors claim to have discovered master regulators of virulence and associated core regulatory clusters - but there does not seem to be any independent validation of the proposed associations. The authors selected the TF targets to cover TFs that had not yet been characterized; however, it would have been nice to have some overlap with previous studies so that consistency and data quality could be assessed.

      Thank you for raising these critical points about validation.

      As for motif validation, we compared our motifs with the existing motifs in the RegPrecise database[3] and found that the motif of PA3587 shows significant similarity to those of homologous TFs in Pseudomonadaceae. We have added a related description to the Results section. Please see Figure S3B and Lines 228-231.

      As for the validation of master regulators, we performed EMSA experiments to validate the binding events and constructed mutants for functional validation. We have added the related content to the Results section. Please see Figure S5 and Lines 283-300.

      We have discussed the overlap between our results and previous studies in the Discussion section. Please see Lines 530-538.

      “PA0797 is known to regulate the pqs system and pyocyanin production[4]. In the present study, it was also found to bind to the pqsH promoter region and its motif was visualised. PA5428 was found to bind to the promoter regions of aceA and glcB genes[5], which was also demonstrated in our ChIP-seq results. PA4381 (CloR) was found to be associated with polymyxin resistance in a previous study[6] and to be possibly related to ROS resistance in the present study. Furthermore, PA5032 plays a putative role in biofilm regulation and also forms an operon with PA5033, an HP associated with biofilm formation[7].”

      - Uncertain value of TRN topology analysis: The relationship between ternary motifs and pathogenicity of P. aeruginosa, and why the authors argue these results motivated TF-targeting drugs (the topic of the last paragraph of the Discussion), are unclear to me. The authors allude to possible connections between pathogenicity, growth, and drug resistance, but I don't see concrete examples here of related TF interactions that clearly represent these relationships. The sections "Hierarchical networks of TFs based on pairwise interactions" and "Ternary regulatory motifs show flexible relationships among TFs in P. aeruginosa" seem to not say much in terms of results that are actionable or possible to validate. A topological graph is constructed based on observed TF-TF connections in measured binding sites - however, it's unclear if any of these connections are physiologically meaningful. Line 178 - Why would there be any connection between the structural family of TF and its location in the proposed TRN hierarchy?

      Thank you for this valuable comment on the TRN topology analysis. It is hard to quantify precisely how much this resource will accelerate P. aeruginosa research or drug development, but we believe that providing this foundational network architecture has inherent value for the community because it enables hypothesis generation even before comprehensive functional validation. We would like to clarify our perspective on these findings and have added a discussion to the revised manuscript to better describe their nature and value. Please see Lines 517-528.

      “Additionally, although the TRN analysis revealed organizational patterns in P. aeruginosa regulatory network, the functional significance these topological features, including their specific contributions to pathogenicity, metabolic adaptation, and antibiotic resistance remains to be experimentally determined in the future work. The hierarchical structure and regulatory motifs we identified represent objective network properties derived from our binding data, but translating these structural observations into mechanistic understanding will require condition-specific functional studies, genetic validation, and phenotypic characterization. Our analysis provided a systematic framework and generating testable hypotheses rather than definitive functional conclusions. Nevertheless, these network-level organizational principles provided value to the community as a foundational reference, similar to other regulatory network maps[8] that were useful even before comprehensive validation.”

      - Identification of "master" regulators: Line 527 on virulence regulators: "We first generated gene lists associated with nine pathways" - is this not somewhat circular, i.e. using gene lists generated from (I assume) co-regulated gene sets to identify regulators of those gene lists? I can't tell from the cited reference (80), which is their own prior review article, what the original source of these gene lists was. Somewhat related to this point - Line 32: 24 "master regulators" - if there are that many, is it still considered a master regulator? Line 270: This term "master regulator" would seem to require some quantitative justification. Identifying 24 (a large number of) "master" regulators of virulence would seem to dilute the implied power of the term.

      We apologize for the lack of clarity regarding the virulence pathway gene lists. We have provided complete gene lists for virulence-related pathways, which were compiled from functional annotations, in our online PATF_Net database.

      Additionally, we appreciate your concern about the use of the term “master regulator”. Our usage is based on previous studies[9,10]: in the development of multicellular organisms, the term commonly denotes a subset of TFs that control the expression of multiple downstream genes and govern lineage commitment or key biological processes. We employed the term "master regulator" in an analogous manner to specify a class of functionally crucial TFs that participate in a pathway or biological event by regulating multiple downstream genes statistically enriched in that pathway. In line with this definition, we identified TFs whose targets were significantly enriched in genes associated with specific virulence pathways (hypergeometric test, P < 0.05).

      We understand the concern that identifying 24 master regulators might seem to dilute the term. However, we would like to clarify that each of these 24 TFs is a "master regulator" with respect to specific virulence pathways based on statistical criteria, not necessarily a global master regulator of multiple pathways of P. aeruginosa. We have revised the Methods section. Please see Lines 604-612.
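      For readers unfamiliar with the enrichment criterion mentioned above, a one-sided hypergeometric test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical and the gene counts in the example are invented placeholders rather than values from the study.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, pathway_size, n_targets, overlap):
    """Upper-tail probability P(X >= overlap) when n_targets genes are drawn
    without replacement from genome_size genes, pathway_size of which belong
    to the pathway of interest."""
    total = comb(genome_size, n_targets)
    upper = min(pathway_size, n_targets)
    tail = sum(
        comb(pathway_size, k) * comb(genome_size - pathway_size, n_targets - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts for illustration only (not from the manuscript):
# 5570 genes in the genome, 50 genes in a pathway, 200 targets for one TF,
# 8 of which fall in the pathway.
p_value = hypergeom_enrichment_p(5570, 50, 200, 8)
is_enriched = p_value < 0.05   # threshold quoted in the response
```

      Under this criterion a TF is flagged for a pathway whenever its targets overlap the pathway gene set more than chance would predict, so the number of flagged TFs depends on the number of pathways tested and on whether any multiple-testing correction is applied.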

      - Line 234: "Genome-wide synergistic co-association of TFs in P. aeruginosa." This section was an interesting analysis. As I mention above, the weakness of an overexpression approach is not knowing whether the TF is active under the examined conditions. By looking at shared binding peaks across overexpression of different TFs, it should indeed be possible to glean some regulatory connections across TFs. Furthermore, the authors discuss specific examples that appear physiologically reasonable, which is appreciated.

      We thank the reviewer for this positive assessment of our co-association analysis. We agree with the limitations of the overexpression approach, which are discussed in the Discussion section. We are pleased that the reviewer found the approach and specific examples valuable.

      Minor comments

      - Line 35 - "high-throughput systematic evolution of ligands by exponential enrichment" - no idea what this means. Is this related to the web-based database, or why is it mentioned in the same sentence?

      We apologize for the unclear presentation. To clarify: “High-throughput systematic evolution of ligands by exponential enrichment” (HT-SELEX) is an in vitro technique for determining TF DNA-binding motifs, which our group previously applied to a subset of P. aeruginosa TFs in a prior publication[11]. In the current study, we performed ChIP-seq for 172 TFs, which represent the majority of TFs not covered by the previous HT-SELEX study. Together, these two complementary approaches (HT-SELEX for in vitro binding motifs, ChIP-seq for in vivo genomic binding sites) provide near-complete coverage of the P. aeruginosa TF repertoire. Both datasets are integrated into our PATF_Net database.

      Due to space constraints in the abstract, we could not provide a detailed explanation of HT-SELEX, but we have now improved the clarity of the Introduction to better explain the relationship between our previous HT-SELEX work and the current ChIP-seq study, and why both are mentioned together in the context of the database. Please see Lines 99-105.

      - Line 193 - Only 9 auto-regulating TFs seems like a low number, given the frequency of negative auto-regulation in other organisms like E. coli. Could the authors comment on their expectations based on well-curated TRNs?

      Thank you for this comment. We agree that 9 auto-regulating TFs is lower than might be expected based on E. coli, where auto-regulation is more prevalent. This likely reflects a technical limitation of the ChIP-seq approach: our detection was limited to standard growth conditions rather than the diverse physiological states in which auto-regulation often occurs. Therefore, the 9 TFs we report represent a high-confidence subset, and the true frequency of auto-regulation in P. aeruginosa is likely higher. We have added this point to the revised manuscript. Please see Lines 193-196.

      “This number likely represents a conservative estimate, as experiments may not optimally capture auto-regulatory events that depend on native expression levels or specific physiological conditions.”

      - Line 230 - "This conservation suggests that TFs within the same cluster co-regulate similar sets of genes." - Why would clustering of TF binding site motifs need to be done to make this assessment? Couldn't the shared set of regulated genes be identified directly from the binding site data? Computing TF binding site motifs has obvious value, but I am struggling to understand the point of clustering the motifs. Is there some implied evolutionary or physiological connection here? No specific physiological roles or hypotheses are discussed in this section.

      Thank you for this important question. We agree that shared target genes can be identified directly from ChIP-seq binding data, which we also analyzed (co-association analysis). The motif clustering analysis serves a complementary and distinct purpose, providing information not directly obtainable from overlapping targets alone. Specifically, target overlap is inherently condition-dependent, whereas motif clustering captures intrinsic binding specificity, which reflects the structural similarity of DBDs, evolutionary relationships, and the potential for functional redundancy or cooperativity under specific conditions. We have revised the related content in the manuscript; please see Lines 236-242.

      “Clustering of TF binding motifs identified groups of TFs with similar intrinsic DNA-binding specificities. As expected, many clusters contained TFs from the same DBD families, reflecting evolutionary conservation and potential functional redundancy or competitive binding at shared regulatory elements. Notably, the clustering also uncovered associations between TFs from different DBD families, suggesting convergent evolution of binding specificity or novel regulatory interactions that warrant further investigation.”

      - Line 284 - should "metabolomic" be "metabolic"? I didn't see metabolomic data

      Yes, we have revised this. Please see Line 311.

      - Several of the figures are too small (e.g. Fig S4A) or complex (Fig 2A) to see clearly or glean information from.

      Thank you for this comment. We acknowledge that Figure 2A and Figure S4A contain dense information due to the comprehensive nature of the regulatory network and the large number of TFs analyzed. We believe these overview figures serve an important purpose in conveying the scale and organization of the regulatory network, while the tables (Table S6 for Fig. S4A and Table S3 for Fig. 2A) provide the granular data needed for specific inquiries. We have also made the figures available in higher resolution and increased font sizes where possible without compromising the overall layout.

      - I don't understand the organization of the "Ternary regulatory motifs" in Supplementary Data File 4 - A table of contents explaining the tabs and columns would be welcome (for this as well as other supplementary files, some of which are more straightforward than others).

      Thank you for this suggestion. We have now revised all supplementary data files to include a header and the necessary annotations in the first row. Specifically, for Supplementary Data File 4, the three columns (Top, Middle, Bottom) represent the left, middle, and right nodes, respectively, in each ternary regulatory motif.

      - I would have expected genomic locations of TF binding sites would have been one of the Supplementary Tables, to increase the accessibility of the data. However, the data is made available through their website, https://jiadhuang0417.shinyapps.io/PATF_Net/, which was easy to access and download the full dataset, so this is a minor issue.

      Thank you for accessing our PA_TFNet database and for the positive feedback on data accessibility. We agree that providing the genomic locations of TF binding sites is crucial. These data are fully available and downloadable through the web interface, which allows flexible searching, filtering, and batch download of binding sites. Given the large scale of this dataset, we felt that the interactive database format provides more functionality than static supplementary tables (e.g., dynamic filtering by TF, genomic region, or binding strength).

      References

      (1) Hua, C., Huang, J., Wang, T., Sun, Y., Liu, J., Huang, L. et al. Bacterial Transcription Factors Bind to Coding Regions and Regulate Internal Cryptic Promoters. mBio 13, e0164322 (2022).

      (2) Chemani, C., Imberty, A., de Bentzmann, S., Pierre, M., Wimmerová, M., Guery, B. P. et al. Role of LecA and LecB lectins in Pseudomonas aeruginosa-induced lung injury and effect of carbohydrate ligands. Infect Immun 77, 2065-2075 (2009).

      (3) Novichkov, P. S., Kazakov, A. E., Ravcheev, D. A., Leyn, S. A., Kovaleva, G. Y., Sutormin, R. A. et al. RegPrecise 3.0–a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 14, 745 (2013).

      (4) Cui, G. Y., Zhang, Y. X., Xu, X. J., Liu, Y. Y., Li, Z., Wu, M. et al. PmiR senses 2-methylisocitrate levels to regulate bacterial virulence in Pseudomonas aeruginosa. Sci Adv 8 (2022).

      (5) Hwang, W., Yong, J. H., Min, K. B., Lee, K.-M., Pascoe, B., Sheppard, S. K. et al. Genome-wide association study of signature genetic alterations among Pseudomonas aeruginosa cystic fibrosis isolates. PLoS Pathog 17, e1009681 (2021).

      (6) Gutu, A. D., Sgambati, N., Strasbourger, P., Brannon, M. K., Jacobs, M. A., Haugen, E. et al. Polymyxin resistance of Pseudomonas aeruginosa phoQ mutants is dependent on additional two-component regulatory systems. Antimicrob Agents Chemother 57, 2204-2215 (2013).

      (7) Zhang, L., Fritsch, M., Hammond, L., Landreville, R., Slatculescu, C., Colavita, A. et al. Identification of genes involved in Pseudomonas aeruginosa biofilm-specific resistance to antibiotics. PLoS One 8, e61625 (2013).

      (8) Galan-Vasquez, E., Luna, B. & Martinez-Antonio, A. The Regulatory Network of Pseudomonas aeruginosa. Microb Inform Exp 1, 3 (2011).

      (9) Fan, L. G., Wang, T. T., Hua, C. F., Sun, W. J., Li, X. Y., Grunwald, L. et al. A compendium of DNA-binding specificities of transcription factors in Pseudomonas syringae. Nat Commun 11 (2020).

      (10) Chan, S. S.-K. & Kyba, M. What is a master regulator? Journal of stem cell research & therapy 3, 114 (2013).

      (11) Wang, T. T., Sun, W. J., Fan, L. G., Hua, C. F., Wu, N., Fan, S. R. et al. An atlas of the binding specificities of transcription factors in Pseudomonas aeruginosa directs prediction of novel regulators in virulence. eLife 10 (2021).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      There are a few remaining issues:

      (1) The manuscript quantifies changes over learning in prefrontal goal-selective cells (equated to "splitter" place cells in hippocampus) and task-phase selective cells (similar to non-splitter place cells that are not goal modulated). A subset of these task cells remain stable throughout learning, and are equated to schema representations in the study. In the memory literature, schemas are generally described as relational networks of abstract and generalized information, that enable adapting to novel context and inference by enabling retrieval of related information from previous contexts. The task-phase selective cells that stay stable throughout learning clearly will have a role in organizing task representations, but to this reviewer, denoting them as forming a schema is an unwarranted interpretation. By this definition, hippocampal non-splitter place cells that emerge early in learning and are stable over days would also form a schema. Therefore, schema notation cannot just be based on stability, it requires further evidence of abstraction such as cross-condition generalization.

      We agree with the reviewer that task-phase-selective cells (“non-splitter cells”) alone do not fulfill the “relationality” criterion of schemas. We found only a few of them, so we cannot say anything about how they covary. We would, however, like to stress that our finding that task-phase-selective cells have stable firing fields across learned (task) and habituation (no-task) conditions can be considered “cross-condition generalization.” We have further specified our discussion of schemas, with particular emphasis on interpreting the generalizing task-phase cells as “potential building blocks of schemas.”

      (2) The quantification of prefrontal replay sequences during reward is useful, but it is still unconvincing that the distinction between existence of sequences in the odor sampling phase and reward phase is not trivially expected based on prior literature. This is an odor-guided task, not a spatial exploration task with no cues, and it is very well-established (as noted in citations in the previous review) that during odor sampling, animals will sniff in an exploratory stage, resulting in strong beta and respiratory rhythms in prefrontal cortex. Not having LFP recordings in this task does not preclude considering prior literature that clearly shows that odor sampling results in a unique internal network state, when animals are retrieving the odor-associated goal, vastly different from a reward sampling phase. The authors argue that this is not trivial since they see some sequences during sampling, although they also argue the opposite in response to a question from Reviewer 2 about shuffling controls for sequences, that 'not' seeing these sequences in the sampling phase is an internal control. The bigger issue here is equating these sequences during sampling to replay/preplay or reactivation sequences similar to the reward phase, since the prefrontal network dynamics are engaged in odor-driven retrieval of associated goals during sampling, as has been shown in previous studies.

      We agree with the reviewer that the sampling and reward phases represent two very different behavioral states. Nevertheless, correlations on short time scales could in principle be similar in both; we show that this is not the case, and therefore do not consider the result trivial. Regarding the interpretation of sequences, we apologize for not having distinguished replay from pure sequences clearly enough. While we find such sequences in the sampling phase (indicative of a fast temporal correlation structure beyond the cofiring quantified in Figure 3), they are NOT pre/replaying any task-related information. Otherwise, our results are fully in line with the previous literature on oscillations, which we included in the previous round of revisions. We have added a similar explanation at multiple places in the Results and Discussion sections.

      Reviewer #2 (Public review):

      Comments on revisions:

      Further changes are needed to improve the description of the methods and the discussion needs to be extended to contrast the results with previously published results of the group. Some control figures would also be needed to quantitatively demonstrate, across the entire dataset, that sequence detection did not identify random events as sequences, even if the detection method was designed to exclude such sequences. For example, showing that sequences are not detected in randomised data with the current method would better convince readers of the method's validity.

      We have added control quantifications from time-randomized sequences, which yield far fewer detected sequences. See the response below.

      Although differences in the classification scheme relative to the Muysers et al. (2025) paper have been explained, the similarity (perhaps equivalence of results) is not sufficiently acknowledged - e.g., at the beginning of the discussion.

      We have added a paragraph at the beginning of the Discussion on how our results align with the Muysers et al. 2025 paper.

      Although the control of spurious sequences may have been built into the method, this is not sufficiently explained in the method. It is also not clear what kind of randomization was performed. Importantly, I do not see a quantification that shows that the detected sequences are significantly better than the sequence quality measure on randomized events. Or that randomized data do not lead to sequence clusters.

      In response to this question, we have added the requested shuffling control (Supplement 1B to Figure 4). In the shuffled data, the number of detected recurring sequence clusters is only about half of that in the original data. The number of bursts assigned to clusters in the shuffled data is, on average, only 46% of the number assigned in the original data, clearly indicating that the sequences detected in the non-randomized data cannot be explained without assuming a stable temporal order.

      Some clusters are nevertheless still detected in the randomized data. This is expected if the participation of cells is heterogeneous, with some highly active cells occurring in more than half of the bursts: random sequences then occur spuriously above chance level and form clusters that reflect the random order of a few highly active cells. In line with this interpretation, we see that

      (1) Bursts that were removed after shuffling have exactly 0 high-firing cells

      (2) Clusters derived from shuffled sequences have a less sparse contribution of high-firing cells, i.e., high-firing cells contribute to significantly more clusters in randomized data than in non-randomized data.

      The difference in the distribution of high firing cells further indicates that sequences obtained with and without randomization are of different quality.

      The spurious (false positive) clusters detected after randomization may nevertheless have a physiological meaning, as they identify rate coactivation patterns that were also picked up by the analysis in Figure 3.
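
      The logic of such a shuffling control can be sketched as follows. This is an illustration only, not the authors' code: the response does not specify the randomization beyond "time randomized", so the sketch assumes that activation times are permuted across the cells participating in each burst. The function name and the burst representation (a mapping from neuron ID to activation time) are hypothetical.

```python
import numpy as np

def shuffle_within_burst(burst, rng):
    """Permute activation times across the neurons of one burst.

    burst: dict mapping neuron_id -> activation time (hypothetical format).
    The set of participating neurons and the set of times are preserved;
    only the temporal order of the neurons is randomized.
    """
    neurons = list(burst)
    times = rng.permutation([burst[n] for n in neurons])
    return dict(zip(neurons, times.tolist()))

rng = np.random.default_rng(1)
bursts = [
    {0: 0.01, 1: 0.02, 2: 0.03},
    {0: 0.00, 2: 0.05, 5: 0.07},
]
shuffled = [shuffle_within_burst(b, rng) for b in bursts]
```

      Because participation is preserved under this kind of shuffle, cells that are active in most bursts still co-occur frequently after randomization, which is consistent with the residual clusters described above.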

      Also, it is still not clear how the number of clusters was established. I understand that the previously published paper may have covered these questions; these should be explained here as well.

      The Methods section states: “The [cluster merging] procedure was repeated until no pair [of clusters] satisfied the merging criterion.” The number of clusters is thus determined by the merging criterion rather than set in advance.
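
      A stopping rule of this kind can be sketched generically as below. The similarity function, the threshold, and the choice to merge the most similar pair first are all assumptions made for illustration; the actual merging criterion is defined in the manuscript's Methods and is not reproduced here.

```python
def merge_until_stable(clusters, similarity, threshold):
    """Merge the most similar pair of clusters until no pair reaches `threshold`.

    clusters:   list of lists of members (hypothetical representation)
    similarity: function (cluster_a, cluster_b) -> float, higher = more similar
    """
    clusters = [list(c) for c in clusters]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i], clusters[j])
                if s >= threshold and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:
            # No pair satisfies the merging criterion: the procedure stops.
            return clusters
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

def centroid_similarity(a, b):
    """Toy similarity: negative distance between cluster means."""
    return -abs(sum(a) / len(a) - sum(b) / len(b))

initial = [[0.0], [0.4], [5.0], [5.3], [10.0]]
final = merge_until_stable(initial, centroid_similarity, threshold=-1.0)
n_clusters = len(final)
```

      In this toy example the final number of clusters follows from the data and the threshold; it is not passed in as a parameter.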

      Also, the sequence similarity description is still confusing in the method; please correct this sentence "Only the l neurons active in both sequences of a pair were taken into account."

      We do not see what is wrong with this sentence. To avoid confusion, we have replaced the lower-case l with an upper-case L for the sequence length.
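
      To make the quoted sentence concrete, the following sketch compares two sequences using only the L neurons active in both. It assumes a rank correlation of activation order as the similarity measure, which may differ from the measure used in the manuscript; the function name, the `min_shared` cutoff, and the data format are hypothetical.

```python
import numpy as np

def shared_rank_similarity(seq_a, seq_b, min_shared=3):
    """Rank correlation of activation order over neurons active in both sequences.

    seq_a, seq_b: dict mapping neuron_id -> activation time (hypothetical format).
    Returns (similarity, L), where L is the number of shared neurons.
    """
    shared = sorted(set(seq_a) & set(seq_b))
    L = len(shared)
    if L < min_shared:
        return float("nan"), L
    times_a = np.array([seq_a[n] for n in shared], dtype=float)
    times_b = np.array([seq_b[n] for n in shared], dtype=float)
    # Convert times to ranks; ties are not treated specially in this sketch.
    ranks_a = np.argsort(np.argsort(times_a))
    ranks_b = np.argsort(np.argsort(times_b))
    return float(np.corrcoef(ranks_a, ranks_b)[0, 1]), L

seq_1 = {1: 0.00, 2: 0.10, 3: 0.20, 4: 0.30}
seq_2 = {2: 0.50, 3: 0.60, 4: 0.90, 7: 0.10}   # same order on shared neurons 2, 3, 4
seq_3 = {2: 0.90, 3: 0.60, 4: 0.50}            # reversed order on shared neurons

same_order, L_same = shared_rank_similarity(seq_1, seq_2)
reversed_order, L_rev = shared_rank_similarity(seq_1, seq_3)
```

      Neurons 1 and 7 are each active in only one sequence of the first pair and therefore do not enter the comparison.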

      Reviewer #3 (Public review):

      One comment is that the threshold for extracting burst events (0.5 standard deviations, presumably above the mean) seems lower than what one usually sees as a threshold for population burst detection, and the authors show (in Supplementary Fig 1) that this means bursts cover ~20-40% of the data. However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

      We have added further specifications following the Reviewer’s suggestion and now mention that the threshold is permissive and “capturing large amount cofiring structure.”
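
      For readers unfamiliar with this kind of thresholding, a minimal sketch is given below. It assumes that bursts are contiguous runs of samples in which the population rate exceeds the mean plus 0.5 standard deviations; the function name and the toy rate vector are hypothetical, and the manuscript's actual detection pipeline may include further steps (e.g. smoothing or minimum durations).

```python
import numpy as np

def detect_bursts(pop_rate, n_sd=0.5):
    """Threshold a population rate at mean + n_sd * SD.

    Returns a boolean mask of supra-threshold samples and a list of
    (start, stop) index pairs, with stop exclusive.
    """
    pop_rate = np.asarray(pop_rate, dtype=float)
    threshold = pop_rate.mean() + n_sd * pop_rate.std()
    above = pop_rate > threshold
    # Pad with False so that every burst has a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    return above, list(zip(starts.tolist(), stops.tolist()))

rate = np.array([0, 0, 10, 10, 0, 0, 10, 0])
mask, bursts = detect_bursts(rate, n_sd=0.5)
coverage = mask.mean()   # fraction of samples that fall inside bursts
```

      Lowering `n_sd` increases `coverage`, which is the trade-off the reviewer refers to when calling the threshold permissive.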

    1. Author response:

      The following is the authors’ response to the original reviews.

      Most importantly, in accordance with questions raised by Reviewer 1, we now include a detailed comparison of the cell type frequencies between the two examined time points as well as comparison of the pseudotimes along those lineages. This is detailed in the new section “Many cell types are shared between day 8 and day 16 EBs” and illustrated in Supplementary Figure 6c and Supplementary Figures 7-8.

      Besides this new section and its accompanying Methods part, we mainly edited the language to clarify methods and assumptions according to the Reviewers' suggestions.

      The main concern of Reviewer 2 was our use of the Liftoff gene annotation. We explained our reasoning for this choice extensively in our public response to the Reviewer, but did not incorporate this into our manuscript because, even though this is an important subject, it is not within the main scope of our paper.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Jocher, Janssen, et al examine the robustness of comparative functional genomics studies in primates that make use of induced pluripotent stem cell-derived cells. Comparative studies in primates, especially amongst the great apes, are generally hindered by the very limited availability of samples, and iPSCs, which can be maintained in the laboratory indefinitely and defined into other cell types, have emerged as promising model systems because they allow the generation of data from tissues and cells that would otherwise be unobservable.

      Undirected differentiation of iPSCs into many cell types at once, using a method known as embryoid body differentiation, requires researchers to manually assign all cell types in the dataset so they can be correctly analysed. Typically, this is done using marker genes associated with a specific cell type. These are defined a priori, and have historically tended to be characterised in mice and humans and then employed to annotate other species. Jocher, Janssen, et al ask if the marker genes and features used to define a given cell type in one species are suitable for use in a second species, and then quantify the degree of usefulness of these markers. They find that genes that are informative and cell type specific in a given species are less valuable for cell type identification in other species, and that this value, or transferability, drops off as the evolutionary distance between species increases.

      This paper will help guide future comparative studies of gene expression in primates (and more broadly) as well as add to the growing literature on the broader challenges of selecting powerful and reliable marker genes for use in single-cell transcriptomics.

      Strengths:

      Marker gene selection and cell type annotation is a challenging problem in scRNA studies, and successful classification of cells often requires manual expert input. This can be hard to reproduce across studies, as, despite general agreement on the identity of many cell types, different methods for identifying marker genes will return different sets of genes. The rise of comparative functional genomics complicates this even further, as a robust marker gene in one species need not always be as useful in a different taxon. The finding that so many marker genes have poor transferability is striking, and by interrogating the assumption of transferability in a thorough and systematic fashion, this paper reminds us of the importance of systematically validating analytical choices. The focus on identifying how transferability varies across different types of marker genes (especially when comparing TFs to lncRNAs), and on exploring different methods to identify marker genes, also suggests additional criteria by which future researchers could select robust marker genes in their own data.

      The paper is built on a substantial amount of clearly reported and thoroughly considered data, including EBs and cells from four different primate species - humans, orangutans, and two macaque species. The authors go to great lengths to ensure the EBs are as comparable as possible across species, and take similar care with their computational analyses, always erring on the side of drawing conservative conclusions that are robustly supported by their data over more tenuously supported ones that could be impacted by data processing artefacts such as differences in mappability, etc. For example, I like the approach of using liftoff to robustly identify genes in non-human species that can be mapped to and compared across species confidently, rather than relying on the likely incomplete annotation of the non-human primate genomes. The authors also provide an interactive data visualisation website that allows users to explore the dataset in depth, examine expression patterns of their own favourite marker genes and perform the same kinds of analyses on their own data if desired, facilitating consistency between comparative primate studies.

      We thank the Reviewer for their kind assessment of our work.

      Weaknesses and recommendations:

      (1) Embryoid body generation is known to be highly variable from one replicate to the next for both technical and biological reasons, and the authors do their best to account for this, both by their testing of different ways of generating EBs, and by including multiple technical replicates/clones per species. However, there is still some variability that could be worth exploring in more depth. For example, the orangutan seems to have differentiated preferentially towards cardiac mesoderm whereas the other species seemed to prefer ectoderm fates, as shown in Figure 2C. Likewise, Supplementary Figure 2C suggests a significant unbalance in the contributions across replicates within a species, which is not surprising given the nature of EBs, while Supplementary Figure 6 suggests that despite including three different clones from a single rhesus macaque, most of the data came from a single clone. The manuscript would be strengthened by a more thorough exploration of the intra-species patterns of variability, especially for the taxa with multiple biological replicates, and how they impact the number of cell types detected across taxa, etc.

      You are absolutely correct in pointing out that the large clonal variability in cell type composition is a challenge for our analysis. We also noted the odd behavior of the orangutan EBs and their underrepresentation of ectoderm. There are many possible sources for these variable differentiation propensities: clone, sample origin (in this case urine), and individual. Unfortunately, for the orangutan we have only one individual and one sample origin, and thus cannot say whether this germ layer preference says something about the species or is due to our specific sample. Because of this high variability from multiple sources, obtaining enough cell types with appreciable overlap between species was a limiting factor for our analyses. To derive meaningful conclusions from intra-species analyses and about the impact of different sources of variation on cell type propensity, we would need to sequence many more EBs with an experimental design that balances the possible sources of variation. This would go beyond the scope of this study.

      Instead, here we control for intra-species variation in our analyses as much as possible: For the analysis of cell type specificity and conservation the comparison is relative for the different specificity degrees (Figure 3C). For the analysis of marker gene conservation, we explicitly take intra-species variation into account (Figure 4D).

      The same holds for the temporal aspect of the data, which is not really discussed in depth despite being a strength of the design. Instead, days 8 and 16 are analysed jointly, without much attention being paid to the possible differences between them.

      Concerning the temporal aspect, we indeed deliberately omitted an explicit comparison of day 8 and day 16 EBs, because we felt that it was not directly relevant to our main message. Our pseudotime analysis showed that the differences between the two time points were a matter of degree rather than of kind. All major lineages were already present at day 8, and even though day 8 cells had on average earlier pseudotimes, there was a large overlap in the pseudotime distributions between the two sampling time points (Author response image 1). That is why we decided to analyse the data together.

      Are EBs at day 16 more variable between species than at day 8? Is day 8 too soon to do these kinds of analyses?

      When we started the experiment, we simply did not know what to expect. We were worried that cell types at day 8 might be too transient, but longer culture can also introduce biases. That is why we wanted to look at two time points; however, as mentioned above, the differences are a matter of degree.

      Concerning the cell type composition: yes, day 16 EBs are more heterogeneous than day 8 EBs. Firstly, older EBs have more distinguishable cell types and hence even if all EBs had identical composition, the sampling variance would be higher given that we sampled a similar number of cells from both time points. Secondly, in order to grow EBs for a longer time, we moved them from floating to attached culture on day 8 and it is unclear how much variance is added by this extra handling step.

      Are markers for earlier developmental progenitors better/more transferable than those for more derived cell types?

      We did not see any differences in the marker conservation between early and late cell types, but we have too little data to say whether this carries biological meaning.

      Author response image 1.

      Pseudotime analysis for a differentiation trajectory towards neurons. Single cells were first aggregated into metacells per species using SEACells (Persad et al. 2023). Pluripotent and ectoderm metacells were then integrated across all four species using Harmony, and a combined pseudotime was inferred with Slingshot (Street et al. 2018), specifying iPSCs as the starting cluster. Here, lineage 3 is shown, illustrating a differentiation towards neurons. (A) PHATE embedding colored by pseudotime (Moon et al. 2019). (B) PHATE embedding colored by cell type. (C) Pseudotime distribution across the sampling time points (day 8 and day 16) in different species.

      (2) Closely tied to the point above, by necessity the authors collapse their data into seven fairly coarse cell types and then examine the performance of canonical marker genes (as well as those discovered de novo) across the species. However some of the clusters they use are somewhat broad, and so it is worth asking whether the lack of specificity exhibited by some marker genes and driving their conclusions is driven by inter-species heterogeneity within a given cluster.

      Author response image 2.

      UMAP visualization for the Harmony-integrated dataset across all four species for the seven shared cell types, colored by cell type identity (A) and species (B).

      Good point. If we understand correctly, the concern is that in our relatively broadly defined cell types, species are not well mixed, and that this in turn is partly responsible for marker gene divergence. This problem is indeed difficult to address, because most approaches to evaluating it require integration across species, which might lead to questionable results (see our Discussion).

      Nevertheless, we attempted an integration across all four species. To this end, we subset the cells for the 7 cell types that we found in all four species and visualized cell types and species in the UMAPs above (Author response image 2).

      We see that cardiac fibroblasts appear poorly integrated in the UMAP, but they still have very transferable marker genes across species. We quantified integration quality using the cell-specific mixing score (cms) (Lütge et al. 2021) and indeed found that the proportion of well integrated cells is lowest for cardiac fibroblasts (Author response image 3A). On the other end of the cms spectrum, neural crest cells appear to have the best integration across species, but their marker transferability between species is rather worse than for cardiac fibroblasts (Supplementary Figure 9). Cell-type wise calculated rank-biased overlap scores that we use for marker gene conservation show the same trends (Author response image 3B) as the F1 scores for marker gene transferability. Hence, given our current dataset we do not see any indication that the low marker gene conservation is a result of too broadly defined cell types.

      Author response image 3.

      (A) Evaluation of species mixing per cell type in the Harmony-integrated dataset, quantified by the fraction of cells with an adjusted cell-specific mixing score (cms) above 0.05. (B) Summary of rank-biased overlap (RBO) scores per cell type to assess concordance of marker gene rankings for all species pairs.
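
      For orientation, rank-biased overlap compares two ranked lists while weighting agreement near the top more heavily. The sketch below implements only the truncated sum of the standard RBO definition up to the length of the shorter list; the persistence parameter `p = 0.9`, the function name, and the example gene rankings are assumptions for illustration, and the response does not state which RBO variant or parameters were used.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two ranked lists without duplicates.

    Computes (1 - p) * sum_{d=1..k} p**(d-1) * |A[:d] & B[:d]| / d,
    where k is the length of the shorter list. For two identical lists
    of length k this equals 1 - p**k rather than 1, because the infinite
    tail of the full RBO sum is not included.
    """
    k = min(len(ranking_a), len(ranking_b))
    total = 0.0
    for d in range(1, k + 1):
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total

# Hypothetical marker rankings for one cell type in two species.
markers_species_1 = ["SOX2", "PAX6", "NES", "DCX"]
markers_species_2 = ["PAX6", "SOX2", "NES", "TUBB3"]

score = rbo_truncated(markers_species_1, markers_species_2)
identical = rbo_truncated(markers_species_1, markers_species_1)
disjoint = rbo_truncated(["A", "B"], ["C", "D"])
```

      Because the weights decay geometrically with depth, a swap among the top-ranked markers lowers the score more than the same swap further down the list.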

      Reviewer #2 (Public review):

      Summary:

      The authors present an important study on identifying and comparing orthologous cell types across multiple species. This manuscript focuses on characterizing cell types in embryoid bodies (EBs) derived from induced pluripotent stem cells (iPSCs) of four primate species, humans, orangutans, cynomolgus macaques, and rhesus macaques, providing valuable insights into cross-species comparisons.

      Strengths:

      To achieve this, the authors developed a semi-automated computational pipeline that integrates classification and marker-based cluster annotation to identify orthologous cell types across primates. This study makes a significant contribution to the field by advancing cross-species cell type identification.

      We thank the reviewer for their positive and thoughtful feedback.

      Weaknesses:

      However, several critical points need to be addressed.

      (1) Use of Liftoff for GTF Annotation

      The authors used Liftoff to generate GTF files for Pongo abelii, Macaca fascicularis, and Macaca mulatta by transferring the hg38 annotation to the corresponding primate genomes. However, it is unclear why they did not use species-specific GTF files, as all these genomes have existing annotations. Why did the authors choose not to follow this approach?

      As Reviewer 1 also points out, we have observed that the annotations of non-human primates often have truncated 3’UTRs. This is especially problematic for 3’ UMI transcriptome data such as the 10x dataset that we present here. To illustrate this, we compared the Liftoff annotation derived from Gencode v32, which we also used throughout our manuscript, to the Ensembl gene annotation Macaca_fascicularis_6.0.111. We used transcriptomes from human and cynomolgus iPSC bulk RNA-seq (Kliesmete et al. 2024) generated with the Prime-seq protocol (Janjic et al. 2022), which is very similar to 10x in that it also uses 3’ UMIs. On average, using Liftoff produces higher counts than the Ensembl annotation (Author response image 4A). Moreover, when comparing across species, using Ensembl for the macaque leads to an asymmetry in differentially expressed genes, with apparently many more genes up-regulated in humans. In contrast, when we use the Liftoff annotation, we detect fewer DE genes, and a similar number of genes is up-regulated in macaques as in humans (Author response image 4B). We think that the many additional DE genes are artifacts of mismatched annotations in humans and cynomolgus macaques. We illustrate this for the transcription factor SALL4 in Author response image 4C, D. The Ensembl annotation reports 2 transcripts, while Liftoff from Gencode v32 suggests 5 transcripts, one of which has a longer 3’UTR. This longer transcript is also supported by Nanopore data from macaque iPSCs. The truncation of the 3’UTR in this case leads to an underestimation of SALL4 expression in macaques, and hence SALL4 is detected as up-regulated in humans (DESeq2: LFC=1.34, p-adj<2e-9). In contrast, when using the Liftoff annotation, SALL4 does not appear to be DE between humans and macaques (LFC=0.33, p-adj=0.20).

      Author response image 4.

      (A) UMI counts per gene for the same cynomolgus macaque iPSC samples. On the x-axis, the GTF file from Ensembl Macaca_fascicularis_6.0.111 was used for counting; on the y-axis, we used our filtered Liftoff annotation, which transfers the human gene models from Gencode v32. (B) The number of DE genes between human and cynomolgus iPSCs detected with DESeq2. For Liftoff, we counted human samples using Gencode v32 and compared them to macaque samples counted with the Liftoff annotation of the same human gene models on macFas6. For Ensembl, we used Gencode v32 for the human and Ensembl Macaca_fascicularis_6.0.111 for the macaque. For both comparisons, we subset the genes to one-to-one orthologs as annotated in BioMart. Up- and down-regulation is relative to human expression. (C) Read counts for one example gene, SALL4. In addition to the Liftoff and Ensembl annotations, we also used transcripts derived from Nanopore cDNA sequencing of cynomolgus iPSCs. (D) Gene models for SALL4 in the coordinate space of macFas6 and coverage from iPSC Prime-seq bulk RNA sequencing.

      (2) Transcript Filtering and Potential Biases

      The authors excluded transcripts with partial mapping (<50%), low sequence identity (<50%), or excessive length differences (>100 bp and >2× length ratio). Such filtering may introduce biases in read alignment. Did the authors evaluate the impact of these filtering choices on alignment rates?

      We excluded those transcripts from the analysis in both species because they represent a convolution of sequence-annotation differences and expression. The focus of our study is on regulatory evolution, and we knowingly omit marker differences that are due to a marker being mutated away. We will make this clearer in the text of a revised version.
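
      The three exclusion criteria named by the reviewer can be written as a simple predicate, sketched below. The record type and field names are hypothetical and do not correspond to a specific Liftoff output format; the sketch also assumes that the length ratio is evaluated symmetrically (longer over shorter) and that the length criterion requires both conditions, as the reviewer's wording suggests.

```python
from dataclasses import dataclass

@dataclass
class LiftedTranscript:
    """Hypothetical summary of one transcript transferred to another genome."""
    transcript_id: str
    coverage: float            # fraction of the human transcript that mapped (0-1)
    sequence_identity: float   # fraction of identical bases (0-1)
    human_length: int          # length in bp in the human annotation
    lifted_length: int         # length in bp after transfer

def passes_filters(t: LiftedTranscript) -> bool:
    """Return False for partial mapping, low identity, or excessive length change."""
    if t.coverage < 0.5 or t.sequence_identity < 0.5:
        return False
    shorter = min(t.human_length, t.lifted_length)
    longer = max(t.human_length, t.lifted_length)
    if shorter == 0:
        return False
    # Excluded only if BOTH the absolute and the relative difference are large.
    if (longer - shorter) > 100 and (longer / shorter) > 2:
        return False
    return True

transcripts = [
    LiftedTranscript("tx_ok", 0.95, 0.97, 2000, 2050),
    LiftedTranscript("tx_partial", 0.40, 0.97, 2000, 900),
    LiftedTranscript("tx_long", 0.90, 0.90, 300, 700),
    LiftedTranscript("tx_small", 0.90, 0.90, 40, 100),
]
kept = [t.transcript_id for t in transcripts if passes_filters(t)]
```

      Requiring both length conditions keeps very short transcripts such as `tx_small`, whose length ratio is large but whose absolute difference is below 100 bp.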

      (3) Data Integration with Harmony

      The methods section does not specify the parameters used for data integration with Harmony. Including these details would clarify how cross-species integration was performed.

      We want to stress that none of our conservation and marker gene analyses relies on cross-species integration. We only used the Harmony integrated data for visualisation in Figure 1 and the rough germ-layer check up in Supplementary Figure S3. We will add a better description in the revised version.

      Reference

      Janjic, Aleksandar, Lucas E. Wange, Johannes W. Bagnoli, Johanna Geuder, Phong Nguyen, Daniel Richter, Beate Vieth, et al. 2022. “Prime-Seq, Efficient and Powerful Bulk RNA Sequencing.” Genome Biology 23 (1): 88.

      Kliesmete, Zane, Peter Orchard, Victor Yan Kin Lee, Johanna Geuder, Simon M. Krauß, Mari Ohnuki, Jessica Jocher, Beate Vieth, Wolfgang Enard, and Ines Hellmann. 2024. “Evidence for Compensatory Evolution within Pleiotropic Regulatory Elements.” Genome Research 34 (10): 1528–39.

      Lütge, Almut, Joanna Zyprych-Walczak, Urszula Brykczynska Kunzmann, Helena L. Crowell, Daniela Calini, Dheeraj Malhotra, Charlotte Soneson, and Mark D. Robinson. 2021. “CellMixS: Quantifying and Visualizing Batch Effects in Single-Cell RNA-Seq Data.” Life Science Alliance 4 (6): e202001004.

      Moon, Kevin R., David van Dijk, Zheng Wang, Scott Gigante, Daniel B. Burkhardt, William S. Chen, Kristina Yim, et al. 2019. “Visualizing Structure and Transitions in High-Dimensional Biological Data.” Nature Biotechnology 37 (12): 1482–92.

      Persad, Sitara, Zi-Ning Choo, Christine Dien, Noor Sohail, Ignas Masilionis, Ronan Chaligné, Tal Nawy, et al. 2023. “SEACells Infers Transcriptional and Epigenomic Cellular States from Single-Cell Genomics Data.” Nature Biotechnology 41 (12): 1746–57.

      Street, Kelly, Davide Risso, Russell B. Fletcher, Diya Das, John Ngai, Nir Yosef, Elizabeth Purdom, and Sandrine Dudoit. 2018. “Slingshot: Cell Lineage and Pseudotime Inference for Single-Cell Transcriptomics.” BMC Genomics 19 (1): 477.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1B: the orangutan tubulin stain looks a bit unusual - just confirming that this is indeed the right image the authors want to include here.

      We agree, this unfortunately also reflects the findings from the scRNA-seq analysis in that we found hardly any cells that we would classify as proper neurons.

      (2) Typo on line 90: 'loosing' should be 'losing'.

      Fixed

      (3) Line 118: why do the authors believe that using singleR will give better results than MetaNeighbour? This certainly seems supported by the data in S4 and S5, but the reasoning is not clear.

      We think that this might depend on the signal to noise ratio, which is a property specific to each dataset. Here we just wanted to state that our approach seems to work better for our developmental data, but we didn’t test out other data and thus cannot generalize.

      (4) Figure 2B: there are some coloured lines on the first filled black bar from the left - do they mean anything? I couldn't work it out from looking at the figure.

      Indeed, this was a bit misleading. The colors on the left represent the species identity; this was meant to illustrate the mixing of species for each cell type. The legend now reads: “Each line represents a cell which are colored by their species of origin on the left and by their current cell type assignment during the annotation procedure on the right.”

      (5) Figure 3: I did not understand how the seven bins of the cell type specificity metric were derived until much later - it is just the number of cell types in which a gene is expressed, yes? Might be worth making this clearer earlier in the text.

      We made this more explicit in the legend. “Boxplot of expression conservation of genes according to the number of different cell types in which a gene is expressed in humans (cell type specificity).”

      (6) It would be great to provide a bit more thorough documentation for the shiny app, so it can serve as a stand-alone resource and not require going back and forth with the paper to make sure one knows what one is doing at every point.

      Agree, this would be a good idea. We are on it.

      (7) Line 477: I think this is unclear - the authors retain over 11000 cells per species but then set the maximum number of cells in a cluster for pairwise comparison to 250... which is a lot fewer. What happens to all the other cells? This probably needs some rewriting to clarify it.

      We did this to minimize the power differences due to cell numbers and thus make the results more comparable across species. We added this explanation to the methods section for Marker gene detection.
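
      The capping step described above can be sketched as a per-cluster random downsampling. This is a minimal illustration, not the authors' implementation: the function name, the input layout (a mapping from cell ID to cluster label for one species) and the fixed seed are assumptions; only the cap of 250 cells comes from the text.

```python
import random
from collections import defaultdict


def cap_cells_per_cluster(cell_clusters, max_cells=250, seed=0):
    """Randomly keep at most `max_cells` cells from each cluster.

    cell_clusters: dict mapping a cell ID to its cluster label (hypothetical layout).
    Returns the list of retained cell IDs.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_cluster = defaultdict(list)
    for cell_id, cluster in cell_clusters.items():
        by_cluster[cluster].append(cell_id)

    kept = []
    for cells in by_cluster.values():
        if len(cells) > max_cells:
            cells = rng.sample(cells, max_cells)  # downsample large clusters only
        kept.extend(cells)
    return kept


# Toy example: one large and one small cluster.
cells = {f"cell{i}": "large" for i in range(1000)}
cells.update({f"rare{i}": "small" for i in range(100)})
kept = cap_cells_per_cluster(cells)
print(len(kept))
```

      Clusters below the cap are left untouched, so the cap only equalizes statistical power where cell numbers would otherwise differ strongly between species.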

      Reviewer #2 (Recommendations for the authors):

      How was the clustering resolution (0.1) determined?

      This resolution was only used for the initial rough check up of the germ layers as reported in Figure 1 and Supplementary Figure S3. We chose this resolution because it yielded roughly the same number of clusters as the number of cell types that we got from classification with the Rhodes et al. data.

    1. Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that: 1) cerebellothalamic connections are important for learning motor skills, 2) cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning, 3) cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and 4) that once a skill is acquired, cerebellothalamic connections become important for online task performance. The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between on-line learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      (4) The retrograde tracing experiments (Supplementary Figure 5) demonstrate convincingly that the CN-VAL and CN-CL projections are almost entirely segregated.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is (as they acknowledge in the Discussion) impaired rotarod performance at fixed higher speeds in Supplementary Figure 4f for CN-VAL projections, suggesting that there could be subtle changes in motor performance below the level of detection of their assays. There is also a trend in the same direction that did not pass significance for CN-CL at higher speeds, suggesting that part of the effects could be related to subtle deficits in performance.

    2. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides evidence that cerebellar projections to the thalamus are required for learning and execution of motor skills in the accelerating rotarod task. This important study adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The data presentation is generally sound, especially the main observations, with some limitations in describing the statistical methods and a lack of support for two separate cerebello-thalamic pathways, which is incomplete in supporting the overall claim.

      We completed the MS by adding a double retrograde labelling study showing that the two pathways have limited overlap and by addressing the other concerns.

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critiques.

      (1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      We thank the reviewer for pointing out this weakness of description. The description of the Methods has thus been expanded and better justified in the “Quantification and statistical analysis” section.

      We agree with the reviewer that comparisons between Deming regressions would be fragile due to the weakness of these regressions in the treatment groups (while they are quite robust for control groups), and such comparisons are not included in the MS, although Deming regression coefficients with their confidence intervals are now provided for all groups in the statistical tables. As now more clearly explained in the Methods, the comparisons between groups are based on the distribution of residuals around the control regression lines. If we understand the reviewer’s request correctly, the control groups are all included.
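
      For readers unfamiliar with the approach, the following sketch shows a textbook Deming regression and the computation of residuals of another group around the fitted control line. It is an illustration under assumptions, not the authors' code: the error-variance ratio `delta` is set to 1 (orthogonal regression), residuals are taken as vertical distances, and all function names and toy values are hypothetical.

```python
import math


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x.

    delta is the assumed ratio of the error variance in y to the error
    variance in x; delta=1 gives orthogonal regression.
    Returns (slope, intercept).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x) / n
    syy = sum((yi - y_mean) ** 2 for yi in y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / n
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the Deming slope is undefined")
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def residuals_to_line(x, y, slope, intercept):
    """Vertical residuals of the points (x, y) around a given line."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]


# Toy data: fit the line on controls, then look at treated animals around it.
control_x, control_y = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
treated_x, treated_y = [0.5, 1.5, 2.5], [1.0, 2.5, 4.0]
slope, intercept = deming_fit(control_x, control_y)
print(slope, intercept)
print(residuals_to_line(treated_x, treated_y, slope, intercept))
```

      Comparing the residuals of each group against the control line avoids having to compare regression lines that are themselves weakly constrained in the treatment groups.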

      (2) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex is functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is broader, however, than some confusion about citation.

      The references are not cited in the context of collaterals from the DCN but for the output channels of the basal ganglia and cerebellum: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in Proville et al. 2014, we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). Hintzen et al. have indeed performed an extensive review indicating the limited overlap between cerebellar- and basal ganglia-recipient territories. The sentence has been corrected to clarify what “They” referred to.

      The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei? how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we actually know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but as the reviewer says, it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added more experiments to support our finding, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We thus feel confident that, while some CN neurons may project to both structures, retrograde infection strategies differentially infect CN populations.

      (3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      The recordings were not extended to the wash period, but examination of the firing rate before CNO on successive days did not reveal major changes in the population firing rate (this is now shown in a new Supplementary Figure 6).

      (4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range, "Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems unlikely to be uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      Since reference to these time windows is repeatedly used in the text we have shifted to “Early” and “Late” phase terminology.

      (5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task." I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This has been corrected to: “suggesting the cerebellar contribution to the consolidation of the task is critical early in the learning process and cannot be easily reinstated later”

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the test of the treatment on the fixed speed rotarod, which provides the closest conditions to the ones found in the accelerating rotarod (the main difference between the protocols being the slow steady acceleration of rod rotation in the accelerating version). Indeed, small but measurable deficits are found at the highest speed in the fixed speed rotarod in the CN-VAL group, while there was no measurable effect on the CN-CL group, which actually shows lower performance from the second day of learning; we believe this supports our claim that the CN-CL inhibition impacted the learning process more than the motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4, consistent with intact learning abilities. Yet, under CNO, CN-VAL mice could stay for more than a minute and a half at 20rpm, while on average they fell from the accelerating rotarod as soon as the rotarod reached the speed of ~19rpm (130s). Overall, we focused our argument on the first days of learning, where the differences between the groups are more pronounced. We clarified the discussion (section “A specific impact on learning of CL-projecting CN neurons.”).

      Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel. The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      There is indeed a paragraph devoted to the discussion of this point (last part of the section “A specific impact on learning of CL-projecting CN neurons.”). Briefly, we actually know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL infections recruit somewhat different populations of neurons. We have now added more experiments to support our finding, using retrograde infections with two rAAV viruses expressing red and green fluorescent reporters. These experiments confirm the limited overlap of the two populations of interest obtained by retrograde infection. We thus feel confident that, while some CN neurons may project to both structures, retrograde infection strategies differentially infect CN populations.

      While we agree that after 3-4 days of learning the difference between the groups becomes elusive, we respectfully disagree with the reviewer that in the early stages these differences are negligible.

      Reviewer #3 (Public review):

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) Cerebellothalamic connections are important for learning motor skills

      (2) Cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) Cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) That once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      (1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is now better acknowledged in the discussion in the section “A specific impact on learning of CL-projecting CN neurons.” However, we want to underline that the strongest deficit in learning is found in animals with CN->CL inhibition, whose latency to fall saturates at about 100s on the rotarod; this indicates that mice fall as soon as the accelerating rotarod speed reaches about 16rpm. In the fixed speed rotarod, the inhibition of CN->CL neurons shows not even a trend of difference with control mice at 15rpm, and the animals run 2 minutes without falling at this speed. This makes us confident that inhibition of the CN->CL pathway interferes more with the learning than with the actual locomotor function on the rotarod.

      (2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      This issue is treated in the discussion (see also replies to reviewers 1 and 2 above). We added experiments with simultaneous retro-AAV infections in CL and VAL and the data are presented in Supplementary Figure 5. We found that retrograde infection targeted different populations of CN neurons; although collaterals in both CL and VAL may be present for (some of) these two populations of neurons, they are likely strongly biased toward one or the other thalamic region, explaining the differential retrograde labelling in the CN. We hope these experiments will answer the reviewer’s concern.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Multiple studies have reported on the effect of cerebellar nuclei (CN) manipulation on locomotion. Here the authors perform several controls and careful analysis to rule out gross motor deficits caused by DREADD-mediated CN silencing. As the authors point out in the discussion, part of the difference from prior studies could be the mild degree of inhibition here. However, it is possible that the CN inhibition here induces a subtle motor deficit and the accelerating rotarod task is challenging and more readily reveals this motor deficit, rather than a deficit in motor learning per se. Two pieces of data seem to suggest this:

      (a) under CN inhibition during the task (Figure 1i), mice could never achieve the level of performance as mice under CN inhibition after the task, even after several days of training, which suggests the CN inhibition is interfering with task performance;

      (b) in highly trained mice (after learning), applying the CN inhibition impaired performance to a similar extent as mice in Figure 1i (Figure 4).

      Can the authors rule out the possibility that CN inhibition during the task is impairing motor execution rather than motor learning?

      We do not rule out a contribution of impaired motor coordination at the highest speed (last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”). Indeed, our argument in favor of a deficit in learning rests primarily on the first days (Early phase), particularly for the CN->CL CNO group (Fig 3h). A crucial control in our work is the use of the fixed speed rotarod, where no deficit is observed. The difference between the fixed and accelerating rotarod is rather minimal since the acceleration of the rotarod is rather small (0.12rpm/s for speed up to >20 rpm).
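
      The correspondence between latency to fall and rod speed used in these replies can be checked with a short calculation. The acceleration of 0.12 rpm/s is taken from the text; the starting speed of 4 rpm is an assumption that is not stated in this excerpt, chosen because it reproduces the quoted values of about 16 rpm at 100 s and about 19 rpm at 130 s. The function name is hypothetical.

```python
def rotarod_speed_rpm(latency_s, start_rpm=4.0, accel_rpm_per_s=0.12):
    """Rod speed at the moment of the fall on a linearly accelerating rotarod.

    start_rpm is an ASSUMED initial speed (not given in the text);
    accel_rpm_per_s is the acceleration quoted in the response.
    """
    return start_rpm + accel_rpm_per_s * latency_s


# Latencies quoted in the responses for the CN->CL and CN-VAL groups.
for group, latency in [("CN->CL", 100.0), ("CN-VAL", 130.0)]:
    print(f"{group}: fall at {latency:.0f} s corresponds to {rotarod_speed_rpm(latency):.1f} rpm")
```

      With these assumptions, the speeds at which the mice fall in the accelerating protocol are at or below speeds they sustain in the fixed speed protocol, which is the comparison the argument relies on.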

      Interpreting the effect of treatment reversal is challenging. If the only effect of CNO was a motor deficit, the animals who learned under CNO should rapidly regain higher performance under saline, which is not observed. When switching from CNO to saline after 7 days of training, it is difficult to disentangle which part is due to a crude motor deficit (which would not show in fixed speed rotarod), and which part is due to an inability to resume motor learning after the task has been (mis-)consolidated.

      (2) The separation of the cerebellar pathways to the intralaminar thalamus (IL) and ventral thalamus (VAL) is not clear to me. It is not clear the CN neurons projecting to these nuclei are distinct. In addition, although IL projects to the striatum and VAL does not, both IL and VAL project to motor cortex. It is unclear to what extent these pathways can be separated. The argument for distinct pathways (as laid out in the discussion) is the distinct behavior deficits when manipulating these two pathways, but this difference seems subtle (point 3).

      We now clarify that the CN populations are different, thanks to retrograde labelling experiments (new Suppl Fig 5). The differences in IL and VAL projections are now discussed in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.” Briefly, we argue that, despite some overlap of their targets, the profiles of the CL and VAL differ substantially.

      (3) The pattern of behavioral deficits induced by CN->CL and CN->VAL neurons appear similar in Figure 3b-c and e-f. I have difficulty seeing how these data lead to the differences in the regression fits in panels 3g-k, which seem to show distinct patterns of performance change within and across sessions. One notable difference in Figure 3b-c and e-f seems to be that CN->VAL CNO treated mice exhibit lower performance on the very first trial for most days. Somehow, this pattern is present even after the CNO treatment is switched to saline (Figure 3f). I wonder if this data point is driving the difference. One control analysis the authors could do is to exclude the 1st trial and test if the effects are preserved.

      Since the learning is cumulative and involves varying degrees of consolidation, it is indeed difficult to substantiate the difference from the average performance: a performance on day 3 may be limited by slow learning and perfect consolidation or good learning and imperfect consolidation. That is why we designed an analysis which takes into account the observed relationships between initial performance, within-session gain of performance and across-session carry-over of this gain of performance (Fig 2). This analysis focuses on the first days of learning, before the performance plateau is reached in the CNO groups. While a clear deficit in consolidation is observed with full CN inhibition, this is not the case for the CN→CL CNO groups, despite their weaker performance after 3 days, similar to that seen with full CN inhibition. In contrast, normal learning is observed in the CN→VAL CNO group during these three days. The consolidation deficit in the CN→VAL CNO group is more subtle than in the CN CNO group and is indeed largely driven by the first data point. This is consistent with the idea that CN→VAL inhibition only partially impairs consolidation (compared to full CN inhibition), leaving some “savings” that allow rapid reacquisition.

      (4) The quantification of locomotion in Figure S2 needs more information. What is linear movement? What is sigma? What is the alternation coefficient? These are not defined in the legends or the Methods as far as I can tell. Related to point 1 above, the authors should provide some analysis of the stride length and hindlimb to forelimb distance as measures of locomotion execution.

      These measures were taken from Simon J Neurosci 2004 24(8):1987-1995 which is now cited and their description is now provided in the Methods.

      Minor:

      (5) To help readers follow the logic of experimental design, please explain why CNO was switched to saline after day 4 in Figures 1j, 3c, and f. Specifically, is the saline manipulation meant to test something as opposed to applying CNO throughout the entire course of the behavioral test?

      Since we had no difference between the groups at the end of the Early phase, we decided to test whether the skill consolidated under CNO remained available when the CNO was removed (and it indeed was). This is now more clearly stated in the Results.

      (6) I have difficulty understanding what is plotted in Figure 4b and d. The legend says the change in performance is calculated the same way as in Figure 2a, so the changes are presumably the regression slopes. But how are the regression slopes calculated for daily start (1st trial) and daily end (last trial)?

      Skill levels at the beginning and end of each day correspond to the values of the regression line for abscissae values of trial 1 and trial 7 (green points). This has been added to the figure legend.
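
      Reading the start and end skill levels off a regression line can be sketched as follows. This is an illustration only: an ordinary least-squares line is used here as a stand-in, which may differ from the regression the authors actually fitted, and the function name and toy latencies are hypothetical. Only the use of trial 1 and trial 7 as abscissae comes from the text.

```python
def daily_start_end(latencies):
    """Fit a line to latency vs. trial number and evaluate it at the first
    and last trial of the day.

    latencies: latencies to fall for trials 1..n of one day (n >= 2).
    Returns (start_level, end_level).
    """
    n = len(latencies)
    if n < 2:
        raise ValueError("at least two trials are needed to fit a line")
    trials = list(range(1, n + 1))
    t_mean = sum(trials) / n
    l_mean = sum(latencies) / n
    sxx = sum((t - t_mean) ** 2 for t in trials)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(trials, latencies))
    slope = sxy / sxx                      # ordinary least squares (stand-in)
    intercept = l_mean - slope * t_mean
    start_level = intercept + slope * trials[0]   # value of the line at trial 1
    end_level = intercept + slope * trials[-1]    # value of the line at the last trial
    return start_level, end_level


# Toy day with 7 trials (latencies in seconds).
print(daily_start_end([62.0, 70.0, 81.0, 85.0, 97.0, 101.0, 110.0]))
```

      Using the fitted values rather than the raw first and last latencies makes the start and end estimates less sensitive to a single unusually good or bad trial.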

      (7) Do CN-CL and CN-VAL neurons also project to other brain regions besides the thalamus? Might these pathways also contribute to learning and consolidation of the accelerating rotarod task? Please discuss.

      This is now discussed in more detail in the last paragraph of the section “A specific impact on learning of CL-projecting CN neurons.”

      Reviewer #3 (Recommendations for the authors):

      (1) Please check the anatomic evidence for the strict dichotomy between intralaminar (specifically central lateral nucleus) nuclei projecting to the striatum and the ventral anterior-lateral (VAL) complex projecting to the cortex. For example, while the Chen et al paper shows that there are cerebellar-intralaminar-striatal projections, it does not exclude intralaminar-cortex projections, which have at least been demonstrated in rats. Similarly, VAL has projections to striatum (see, e.g., Smith et al, "The thalamostriatal system in normal and diseased states", Frontiers in Systems Neuroscience, 2014). It may be that some of these projections are stronger, but I don't think it's true that these pathways are as well-separated as the authors suggest. I also don't think this changes the fundamental conclusions but is important for potential mechanisms by which differential learning could occur and necessitate modification of Figure 5.

      We have toned down the interpretation of CL and VAL relaying specifically to different brain structures and mostly put forward the duality of the pathways. The connections with the cortex are now discussed at the end of the section “A specific impact on learning of CL-projecting CN neurons.”

      (2) Please provide more details on the spike sorting. By what metrics were single units declared to be well-separated? How many units were identified under each condition? What was the distribution of firing rates with and without CNO treatment? Are the units shown in panel 1f from before and after CNO as in panel E or are just 2 examples of isolated units? The units by themselves are not very helpful to the reader. Showing sample auto- and/or cross-correlograms for units recorded on the same electrode would be more helpful to show how well-isolated the units are.

      Single units were considered well-isolated based on quantitative quality metrics computed after MountainSort 4 spike sorting (Python 3.8). Units were required to have a signal-to-noise ratio (SNR) greater than 5, inter-spike interval (ISI) violations less than 1%, an amplitude cutoff below 0.1, a presence ratio above 0.9, a firing rate greater than 0.1 Hz, and at least 50 detected spikes. In addition, units were assessed for temporal stability across the recording using autocorrelograms and presence over the recording, ensuring there were no prolonged periods of total inactivity. Units meeting these criteria were deemed well-separated and reliable for further analysis. This has been added to the Methods.
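
      The threshold criteria listed above can be sketched as a simple filter. This is only an illustration: the `UnitMetrics` container and its field names are hypothetical, the metrics are assumed to have been computed beforehand by the sorting pipeline, and the additional stability check based on autocorrelograms is not represented.

```python
from dataclasses import dataclass

@dataclass
class UnitMetrics:
    # Hypothetical container; values are assumed to be precomputed per unit
    snr: float
    isi_violation_fraction: float  # fraction of spikes, 0.01 corresponds to 1%
    amplitude_cutoff: float
    presence_ratio: float
    firing_rate_hz: float
    n_spikes: int

def is_well_isolated(u: UnitMetrics) -> bool:
    """Apply the threshold criteria quoted in the response to one unit."""
    return (
        u.snr > 5
        and u.isi_violation_fraction < 0.01
        and u.amplitude_cutoff < 0.1
        and u.presence_ratio > 0.9
        and u.firing_rate_hz > 0.1
        and u.n_spikes >= 50
    )

# Two made-up units: the second fails on SNR and ISI violations
good = UnitMetrics(8.2, 0.002, 0.03, 0.98, 12.5, 4300)
noisy = UnitMetrics(3.1, 0.04, 0.03, 0.98, 12.5, 4300)
kept = [u for u in (good, noisy) if is_well_isolated(u)]
```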

      Cell numbers are provided with the statistics in the supplementary table for Fig. 1g. The panels show the same unit before and after CNO. Examples of auto- and cross-correlograms are provided in the new Supplementary Figure 6.

      (3) Panel 2g - "firing rate modulation" is unclear. I think the authors are showing the mean firing rate with DREADD+CNO treatment divided by the mean firing rate in the pre-CNO condition for the same group (I couldn't find that in the Methods, my apologies if I missed it)? However, firing rate modulation to me means variability in firing rate within a recording. Perhaps "relative firing rate" or "% pre-CNO firing rate" would be clearer?

      The definition has been added to the Methods and the axis label has been changed to ‘Change in FR induced by SAL/CNO’.

      (4) Figure 3f - why does consolidation appear to be impaired after the transition from CNO to saline between sessions, when in panel 1j suppressing the CN does not have a similar effect once CNO is switched to saline? Could this be driven by a small number of mice? Since a central conclusion of the paper is that CN-VAL connections are uniquely important for post-training consolidation, this discrepancy is important to explain - if the results post-saline are spurious, how do we know that the results post-CNO aren't also spurious? Panels similar to Figure 4b and d showing all the data from the last/first trial of each session I think would be convincing.

      Our results overall indicate that the overnight consolidation of the improvement in performance seems to be effective only in the early phase (as pointed out in the summary Figure 5). We therefore do not believe that the saline results are spurious.

      This can indeed be seen in the control groups of Figure 1; to make it more visible, we plot in Author response image 1 the difference between trial 7 and trial 1 of the next day. An overnight drop in performance becomes visible in the late phase.

      Author response image 1.

      The decrement on the first trial in the first 3 days is visible for the majority of the mice. The plot requested by the reviewer is shown in Author response image 2.

      Author response image 2.

      Minor points:

      (5) In panel 1a, the solid yellow line obscures a lot of the image and I don't think adds anything.

      We assume this referred to a line in Fig. 1d, which has been removed.

      (6) Panel 2a - color selection could present problems for those with red-green color blindness.

      This has been fixed.

      (7) Supplementary Figure 3 - what are the arrows and arrowheads indicating?

      These have been removed.

      (8) In the Discussion: "Studies of cerebellar synaptic plasticity provide clearly support the involvement of cerebellum in rotarod learning..." Delete the word "provide"

      This has been fixed.

      (9) "This indicates that either the distinct functional roles of VAL-projecting or CL-projecting." The second "of" should be "or", I think.

      This has been fixed.

    1. Reviewer #2 (Public review):

      The authors hypothesized that chemostat propagated viromes could modulate the GM and reduce NEC lesions while avoiding potential side effects, such as the earlier onset of diarrhea. This is interesting.

      Major revision

      (1) As the authors said, the aim of the research is 'We hypothesized that chemostat propagated viromes could modulate the GM and reduce NEC lesions while avoiding potential side effects, such as earlier onset of diarrhea'.

      (a) For the efficacy, in Fig 5, there is no significant difference in stomach pathology and enterocolitis between groups, even between the control group and the experimental groups. Is this because of the low incidence of NEC? This may affect the statistical power of the conclusions. And how can you draw the conclusion that chemostat can reduce NEC lesions?

      (b) Lack of gross view pictures of animal tissues or any other pathological pictures is not convincing.

      (c) For the safety, such as body weight development, FVT did not differ significantly from control, CVT and CVT-MO, so how can you draw the conclusion that chemostat can avoid potential side effects?

      (d) The evidence for a decrease in eukaryotic viruses is insufficient and not quantitative.

      (2) Fig 3F,

      (a) How can a medium have 'the baseline viral content' ?

      (b) The statistical significance of the relative abundance of specific eukaryotic viral contigs between different time points is unknown.

      (c) The hosts of some of the listed eukaryotic viruses are not pigs, piglets or even humans, so what is the meaning of a decrease in these eukaryotic viruses?

      (3) In this study, pH 6.5 was selected as the pH value for chemostat cultivation, but considering the different adaptability of different bacteria to pH, it is recommended to further explore the effect of pH on bacteria and virus groups. In particular, it could be optimized to maintain the growth of beneficial bacteria such as Lactobacillaceae and Bacteroides in order to improve the effect of chemostat cultivation.

      (4) In some charts, the annotation of error bars, statistical significance markers (even 'ns' should be marked), etc., should be more standardized and clearer. In the results section, the arrangement of the figure panels is messy, so some reorganization may be warranted.

      Comments on revisions:

      (1) At the design level, the study posited "reduction of necrotizing enterocolitis (NEC)" as the primary hypothesis and endpoint. Yet neither of the two in-vivo experiments demonstrated any NEC-protective signal; Experiment 2 even showed a trend toward more severe gastric lesions. Although delayed onset of diarrhea can be listed as a secondary endpoint, its clinical significance is limited. The work remains a safety proof-of-concept and falls short of efficacy validation, yielding insufficient scientific value for publication.

      (2) The manuscript postulates a link between the loss of Lactobacillaceae phages and the absence of NEC protection, but no reverse verification (e.g., re-introducing these phages or optimizing culture to retain them) was performed within the study.

      (3) Culturing intestinal microbiota ex vivo is inherently challenging, owing to oxygen sensitivity, pH drift, nutrient depletion, and other factors. This study not only failed to demonstrate stable congruence between the cultured community and the original fecal inoculum, but also documented a marked loss of Lactobacillaceae and a 75 % drop in viral diversity. In the absence of any NEC-protective efficacy, the authors likewise provide no functional validation of phage viability (lysis assays, MOI determination, etc.). Consequently, the data are inadequate to support expectations of therapeutic benefit in vivo.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fecal virome transfer (FVT) has the potential to take advantage of microbiome-associated phages to treat diseases such as NEC. However, FVT is also associated with toxicity due to the presence of eukaryotic viruses in the mixture, which are difficult to filter out. The authors use a chemostat propagation system to reduce the presence of eukaryotic viruses (these become lost over time during culture). They show in pig models of NEC that chemostat propagation reduces the incidence of diarrhea induced by FVTs.

      Strengths:

      The authors report an innovative yet simple approach that has the potential to be useful for future applications. Most of the experiments are easy to follow and performed well.

      Weaknesses:

      The biggest weakness is that the authors show that their technique addresses safety, but they are unable to demonstrate that they retain efficacy in their NEC model. This could be due to technical issues or perhaps the efficacy of FVT reported in the literature is not robust. If they cannot demonstrate efficacy of the chemostat propagated virome mixture, the value of the study is compromised.

      We appreciate the reviewer’s assessment and fully acknowledge that our inability to demonstrate NEC protection by FVT is a limitation to the study. If technical issues cover the variability in disease phenotype in our animal model, which is of a spontaneous nature, then yes we fully agree. Issues with FVT preparation are however unlikely, as this is performed per protocol. The effect of FVT on NEC has hitherto only been demonstrated by our research group in two individual studies using separate donor fecal material, so it is indeed too early to speculate about robustness in FVT response. We have briefly mentioned this in the results (lines 563-565) and discussion (lines 777-779), but agree that it needs further elaboration. We have now revised the discussion and conclusion to better emphasize the extent and consequences of this limitation (lines 793-797 + lines 817-818). Importantly, we show that inclusion of specific nutrients, such as milk oligosaccharides, impacts the resulting propagated fecal-derived virome. One can argue that this is not surprising, but it has nevertheless not been shown before – and it opens up possibilities for future “tailor-made” fecal-derived viromes with predictable profiles and effects.

      Even though we do not demonstrate an effect of the chemostat-propagated virome, we still believe that the study provides valuable insights as a proof-of-concept. Specifically, we demonstrate that in vitro chemostat propagation can significantly modulate the safety profile of FVT, while still driving changes in the microbiome, e.g., by decreasing C. perfringens.

      The above issue is especially concerning because the chemostat propagation selected for bacteria that may not necessarily be the ones that harbor the beneficial phages. Without an understanding of exactly how FVT works, is it possible to make any conclusion about the usefulness of the chemostat approach?

      The chemostat work was based on the idea that if we culture a fecal inoculum under suitable conditions, then the phageome would propagate alongside and allow for a scalable production method for standardized donor-independent FVT. We are cognizant that the chemostat end-culture diverged quite markedly from the fecal inoculum. In reality, such divergence is unavoidable when performing in vitro simulation of intestinal growth conditions. On the positive side, we showed that we could drive an expansion of Bacteroides spp. by supplementing the media with human milk oligosaccharides. We have previously shown that Bacteroides spp. engraft in FMT recipients that are in turn protected from NEC. However, there is much room for refinement of the chemostat culture conditions, for example to preserve the rich repertoire of lactobacilli from the inoculum by lowering the pH. Moreover, the loss of viral diversity in the chemostat end-culture also needs to be addressed, potentially by lowering the chemostat dilution rate to allow time for phage propagation. Based on these insights, we will in the near future invest heavily in improving the chemostat procedure to end up with a propagated fecal virome that better resembles the fecal inoculum.

      Finally, can the authors rule out that their observations in THP-1 cells are driven by LPS or some other bacterial product in the media?

      We thank the reviewer for raising this point. To minimize the influence of bacterial contaminants such as LPS or other small bacterial products, we implemented several steps during sample preparation. Specifically, we performed ultrafiltration using a 300 kDa molecular weight cut-off, which should remove small molecules, including LPS, bacterial metabolites, and other potential soluble immunomodulators. Thereafter, all viral preparations underwent endotoxin removal procedures prior to cell exposure. These precautions reduce the likelihood that our observed effects in THP-1 cells are attributable to bacterial products rather than viral components. This is explained in the referenced article (20), but we have now added the clarification to the Methods section of the revised manuscript (lines 222 and 227). The immune expression profile differs markedly between the viral preparations and the E. coli control (e.g. IFNG, TLR3, TLR8), making it highly likely that viral epitopes are the major drivers of the response to the viral preparations, with less impact from any potential bacterial epitope contaminant. This is now mentioned in the results section (lines 541-543).

      Reviewer #2 (Public review):

      Major revision

      (1) The authors state that the aim of the research is 'We hypothesized that chemostat propagated viromes could modulate the GM and reduce NEC lesions while avoiding potential side effects, such as earlier onset of diarrhea'.

      (a) For the efficacy, in Fig 5, there is no significant difference in stomach pathology and enterocolitis between groups, even between the control group and the experimental groups. Is this because of the low incidence of NEC? This may affect the statistical power of the conclusions. Therefore, it is unclear how one can draw the conclusion that chemostat can reduce NEC lesions.

      Thank you for highlighting this important point. We fully agree and would like to clarify that it is not our intention to conclude that chemostat propagation reduces NEC lesions under the experimental settings within this paper. Rather, this was our initial hypothesis, which could not be confirmed. The unexpectedly low incidence of NEC across groups in Piglet Experiment 1 did not allow for a clear conclusion, whereas Piglet Experiment 2 failed to show a NEC-reducing effect. We have stated this important point in the following sections:

      - Abstract (line 42-44): “However, these signatures were lost in recipients of chemostat-propagated viromes, and only minor microbiome effects and no NEC prevention were observed.”

      - Results (line 699): “This highlights that while chemostat propagation effectively mitigates virus-associated diarrhea, the method needs further optimization to target NEC.”

      - Discussion (lines 773–775): “However, the MO-propagated chemostat virome did not increase Bacteroides or Parabacteroides spp. in the recipient’s gut, nor did it provide NEC protection.”

      - We have rephrased this to emphasize the importance of Experiment 2.

      - To avoid any potential misinterpretation, we have rephrased line 598 to reflect that we observed “a difference in the clinical side effect pattern” rather than implying efficacy.

      - Furthermore, we have updated the summary title for Figure 8 (line 704) to clearly state: “MO-propagated virome modestly exacerbates gastric injury and fails to improve NEC.”

      - Also, we have added the following section to the discussion (lines 793-797): “However, we acknowledge that the absence of demonstrated NEC prevention by the native donor virome is a significant limitation to conclusions regarding efficacy. Without a protective baseline, we cannot assess whether the virome efficacy was lost during chemostat propagation. Consequently, we cannot confirm or dismiss the hypothesis that chemostats can preserve a phage community capable of preventing NEC.”

      - Lastly, we have updated the conclusion (lines 817-818): “However, as neither the chemostat-propagated viromes nor the native donor virome demonstrated NEC prevention, the efficacy of the chemostat approach remains inconclusive.”

      - These changes should clarify that while the study demonstrates improved safety via reduced diarrhea, NEC efficacy was not obtained.

      (b) More convincing pathology images would be helpful.

      Since we did not observe a protective effect against NEC with either of the treatments, we opted not to include pathology images. However, extensive examples can be found in the cited paper (reference 37), which describes our NEC scoring methodology in the Methods section (lines 268-271): https://doi.org/10.1016/j.yexmp.2024.104936.

      (c) For the safety, such as body weight development, FVT showed no statistically significant difference from control, CVT, and CVT-MO, so how can you draw the conclusion that chemostat can avoid potential side effects?

      We appreciate the reviewer’s observation. To clarify, we do not claim that chemostat propagation completely avoids all potential side effects, but rather that it mitigates them. As shown in Fig. 5G, FVT recipients exhibited significantly reduced body weight gain compared to controls, CVT, and CVT-MO specifically on day 4, but not on day 5. This transient effect suggests that side effects such as reduced growth and early-onset diarrhea are delayed, not entirely prevented, by chemostat propagation. This is stated in the results section in lines 593-595. We also believe that this is consistent with the paper title and the conclusion that the chemostat process minimizes the adverse effects associated with native FVT (line 813).

      (d) There is a lack of evidence to convince the reader that there is a decrease in eukaryotic viruses. More quantitative data here would be useful.

      Apart from the fact that eukaryotic viruses cannot be shed in a system devoid of eukaryotic cells, and that the chemostat continuously exchanges the culture, thereby diluting any substance incapable of propagation, we agree that quantitative data demonstrating a reduction of the eukaryotic virus load are lacking.

      However, in this case we believe the relative viral abundance data are almost as convincing. To make this even clearer, we have produced new graphs showing 1) the eukaryotic viral abundance relative to total viral abundance and 2) the observed eukaryotic viral species, both after medium subtraction. Eukaryotic viral relative abundance decreases from around 0.4% to approach zero already in the batch phase, and similarly, the number of eukaryotic viral species decreases from around 10 in the fecal inoculum to zero midway through the chemostat phase. These new graphs are now part of Supplementary figure S3 B-C. Moreover, we have corrected an error in the eukaryotic viral heatmaps presented in Figure 3F, so that the relative abundance of each sample (column) now sums to 100%. Please also note from the lower heatmap (where the virome signature of the medium is subtracted) that no eukaryotic viruses are identified in the sequencing data of the chemostat samples from 50 hours onwards.
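
      The two summary measures described above can be sketched as follows. This is an illustration only: the function name, the count matrix and the contig labels are hypothetical, and medium subtraction is assumed here to mean subtracting the read counts of the medium-only sample from each sample and clipping at zero, which may differ from the procedure actually used.

```python
import numpy as np

def eukaryotic_summary(counts, medium, is_eukaryotic):
    """Relative abundance (%) and richness of eukaryotic viral contigs per sample.

    counts: array (n_contigs, n_samples) of read counts
    medium: array (n_contigs,) of read counts in the medium-only sample
    is_eukaryotic: boolean array (n_contigs,) marking eukaryotic viral contigs
    Medium subtraction by clipped count difference is an assumption.
    """
    counts = np.asarray(counts, dtype=float)
    medium = np.asarray(medium, dtype=float)
    mask = np.asarray(is_eukaryotic, dtype=bool)
    corrected = np.clip(counts - medium[:, None], 0, None)
    total = corrected.sum(axis=0)
    euk = corrected[mask].sum(axis=0)
    # Avoid division by zero for samples with no reads left after subtraction
    rel_pct = 100 * np.divide(euk, total, out=np.zeros_like(euk), where=total > 0)
    richness = (corrected[mask] > 0).sum(axis=0)
    return rel_pct, richness

# Made-up counts: rows are a phage contig, an astrovirus contig and a
# Dicistroviridae contig; columns are the inoculum and a late chemostat sample
counts = [[996, 1000], [4, 0], [5, 5]]
medium = [0, 0, 5]               # the Dicistroviridae signal comes from the medium
is_euk = [False, True, True]
rel_pct, richness = eukaryotic_summary(counts, medium, is_euk)
```

      Because these are relative measures, they describe the composition of the sequenced virome and not the absolute viral load.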

      However, for future experiments we will consider adding a known quantity of a marker virus to the inoculum and monitoring its concentration (e.g., by qPCR) throughout the culture process. Importantly, if the resulting virome is meant for in vivo testing, this marker virus should be inert to the receiving organism.
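
      The dilution argument, and what a spiked-in inert marker virus would be expected to show, can be sketched with the standard washout relation for a well-mixed, constant-volume continuous culture, in which a non-propagating component decays as C(t) = C0 · exp(−D·t). The dilution rate used below is a hypothetical value chosen for illustration and is not taken from the study.

```python
import math

def fraction_remaining(dilution_rate_per_h, hours):
    """Fraction of a non-propagating component left after continuous dilution.

    Assumes an ideal, well-mixed chemostat at constant volume:
    C(t) / C0 = exp(-D * t).
    """
    return math.exp(-dilution_rate_per_h * hours)

def hours_to_fold_reduction(dilution_rate_per_h, fold):
    """Time needed for dilution alone to reduce a component by a given fold."""
    return math.log(fold) / dilution_rate_per_h

D = 0.1  # hypothetical dilution rate in 1/h, not a value from the study
remaining_after_50h = fraction_remaining(D, 50)
time_for_1000_fold = hours_to_fold_reduction(D, 1000)
```

      Comparing qPCR measurements of a marker virus against such a curve would indicate whether depletion follows passive washout or deviates from it, for instance through particle decay or adsorption.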

      (2) Questions regarding Fig 3F,

      (a) How can the medium have 'the baseline viral content' ?

      As we have previously seen persistent eukaryotic viral signals in metagenomics sequencing data from chemostat experiments, we sampled and sequenced the culture medium. As is seen from Figure 3F, this only concerns Dicistroviridae, as the patterns of the remaining eukaryotic viral signals before and after medium subtraction are virtually identical. For some reason, a component of the culture medium contains a genetic signal from this entity. Since all culture components are sterilized, these are most likely genomic traces that are continuously supplied with the medium and therefore appear in all culture samples. As the signal is unlikely to derive from intact viruses, the in vivo implications are deemed minimal.

      (b) What is the statistical significance of relative abundance of specific eukaryotic viruses?

      It is the same as for any statistical comparison at the single-OTU level in a nucleotide sequencing dataset. As commented above, it does not prove a quantitative depletion of eukaryotic viruses throughout the chemostat process, but given the context, a reduction in relative abundance supports the notion that eukaryotic viruses are indeed depleted when the culture medium is exchanged. The relevant question to us is: what is the magnitude of depletion? This is particularly relevant since the clinical data indicate a delay and not a prevention of side effects after transplantation. Hence, as proposed above, the use of a marker virus would provide us with that answer.

      (c) The hosts of some of the listed eukaryotic viruses are neither pigs nor humans; as such, the significance of a decrease in these viruses to humans is unclear.

      Dicistroviridae is not present in the inoculum and shows up only when medium is added. Picobirnavirus and Astrovirus are relevant mammalian intestinal viruses, whereas Smacoviridae is less well described (dois: 10.3389/fvets.2020.615293 and 10.3390/v8020042). Genomoviridae, as a fungal virus, indeed appears to be less relevant in the case of the mammalian intestine. At any given time point, any given individual, be it a pig or a human, carries several viral species that are incapable of infecting it, most likely transiting after being ingested with food, or in the case of pigs through rummaging. It is no secret that we have been searching for a causative agent responsible for the clinical side effect patterns associated with FVT, but there seems to be no consistent viral agent that is overabundant in diarrheal piglets. Hence, in this study, we are mostly interested in the proof-of-concept for overall eukaryotic virus reduction through chemostat propagation, and we believe we have presented data in support of this.

      (3) In this study, pH 6.5 was selected as the pH value for chemostat cultivation, but considering the different adaptability of different bacteria to pH, it is recommended to further explore the effect of pH on bacteria and virus groups. In particular, it could be optimized to maintain the growth of beneficial bacteria such as Lactobacillaceae and Bacteroides in order to improve the effect of chemostat cultivation.

      We agree that pH is a key parameter in shaping microbial communities during chemostat cultivation. As noted, we selected pH 6.5 to balance physiological relevance and bacterial viability, but we acknowledge that this pH may not be optimal for supporting the growth of certain potentially beneficial taxa such as Lactobacillaceae. We explicitly address this in the discussion (lines 736–741), where we state that the selected pH may have limited engraftment and that future studies should investigate pH optimization to better support bacterial groups and improve the overall effectiveness of the cultivation system.

      (4) Please improve the quality of the images, charts, error bars and statistical significance markers throughout and mark the n's used in each experiment.

      We have carefully reviewed all figures and could not identify any general image quality issues. If some specific images or panels appear unclear or problematic, we would appreciate it if the reviewer could point them out so we can address them directly.

      Regarding sample sizes, the number of animals (n) is indicated in Fig. 5A and its legend, as well as in Fig. 8A. We have now also added this information to the legend of Fig. 8 for clarity.

      To improve the clarity of statistical findings, we have added asterisks to denote significance in panels 6A, 6F, and 7A, as requested.

      To improve the clarity of Fig. 3B, we have added a dashed line to separate LAC and LAC-MO.

      Reviewer #3 (Public review):

      Major revisions

      This study investigated the in vitro amplification of donor fecal virus using chemostat culturing technology, aiming to reduce eukaryotic virus load while preserving bacteriophage community diversity, thereby optimizing the safety and efficacy of FVT. The research employed a preterm pig model to evaluate the effects of chemostat-propagated viromes (CVT) in preventing necrotizing enterocolitis (NEC) and mitigating adverse effects such as diarrhea.

      Strengths:

      Enhanced Safety Profile: Chemostat cultivation effectively reduced eukaryotic virus load, thereby minimizing the potential infection risks associated with virome transplantation and offering a safer virome preparation method for clinical applications.

      Process Reproducibility: The chemostat system achieved stable amplification of bacteriophage communities (Bray-Curtis similarity >70%), mitigating the impact of donor fecal variability on therapeutic efficacy.
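
      For readers unfamiliar with the metric cited above, Bray-Curtis similarity between two abundance profiles is one minus the Bray-Curtis dissimilarity, sum|a − b| / sum(a + b). The sketch below is a minimal illustration with made-up abundance vectors; it does not reproduce how the similarity values in the study were obtained.

```python
import numpy as np

def bray_curtis_similarity(a, b):
    """Return 1 - Bray-Curtis dissimilarity for two non-negative abundance vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = (a + b).sum()
    if denom == 0:
        raise ValueError("both profiles are empty")
    return 1.0 - np.abs(a - b).sum() / denom

# Made-up abundances of three viral contigs in two chemostat runs
run_1 = [10, 20, 30]
run_2 = [10, 10, 40]
similarity = bray_curtis_similarity(run_1, run_2)
```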

      Weaknesses:

      Loss of Phage Functionality: The chemostat cultivation resulted in a reduction in phage diversity (e.g., the loss of Lactobacillaceae phages), which may compromise their protective effects against NEC (potentially linked to the immunomodulatory functions of Lactobacilli). The authors should explicitly address this limitation in the discussion section, particularly if additional experiments cannot be conducted to resolve it within the current study.

      We appreciate the reviewer’s concern and agree that the loss of phage diversity during chemostat cultivation, especially phages targeting Lactobacillaceae, is an important limitation with potential implications for NEC protection.

      We already described the depletion of Lactobacillaceae in the chemostat and its implications in the discussion (lines 742-751 + 787-793), along with our plans to address this in future work by adjusting culture pH. However, we acknowledge that the significance of losing phage diversity deserves more explicit attention. Accordingly, we have expanded the discussion to highlight the possible consequences of this loss and its impact on phage functionality (see lines 758–762), as suggested by the reviewer.

      Limitations in Experimental Design: The low incidence of NEC lesions in the control group reduced the statistical power of the study. This limitation undermines the ability to conclusively evaluate the efficacy and safety of the chemostat-propagated virome as a novel intervention for NEC. Future studies should optimize experimental conditions (e.g., using a more NEC-susceptible model or diet) to ensure adequate disease incidence for robust statistical comparisons.

      We agree that the low NEC incidence in Experiment 1 limited the statistical power to evaluate efficacy. To address this, we designed Experiment 2 using a more NEC-inducing diet (formula 2), which resulted in a higher level of baseline lesions. This allowed for a more conclusive assessment, demonstrating that the MO-propagated chemostat virome did not provide NEC protection when using the donor feces and culture conditions applied in this experiment.

      We acknowledge that this was too unclear in the original manuscript. Please see the response to the first comment by Reviewer 2, where we have highlighted several revisions to improve clarity.

      However, we do believe the data are robust enough to conclude that the level of diarrhea — and thereby safety — was improved in the piglet model, which is why we chose to focus on this aspect in the paper’s title.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The manuscript presents a well-structured study investigating the feasibility of using chemostat-based culturing of the fecal virome to reduce the transfer of eukaryotic viruses during fecal virome transfer (FVT). Utilizing both in vitro fermentation systems and a preterm piglet model, the authors explore whether this method could be a safer and equally effective alternative to raw FVT for treating neonatal intestinal diseases, such as necrotizing enterocolitis (NEC). This study introduces a novel mitigation strategy for FVT through chemostat fermentation. However, a significant revision is recommended before the manuscript can be considered for publication.

      Major Changes:

      - A central aim of the study was to assess whether chemostat-cultured viromes maintain protective effects against NEC. However, this key outcome remains "unresolved" due to the low incidence of NEC in the control group. The discussion should address this limitation.

      We fully acknowledge this limitation and agree that our study cannot conclude whether the NEC effect of FVT was maintained without demonstrating an effect of this native virome. Please see our response to a similar concern raised by Reviewer 1, where we describe the revisions made to the discussion (lines 793-797) and conclusion (lines 817-818).

      - The section on viral particle enrichment should be expanded and discussed in more detail. It would be beneficial to examine its efficiency in separating bacteria from viral-like particles (VLPs) compared to findings from previously reported studies. The authors should clarify the rationale behind the selected dose of VLPs used in the experiments and their role in virus engraftment results.

      We selected the virome isolation method based on previous experiments within our lab, demonstrating efficient separation of bacteria and virus particles using a 0.45 µm syringe filter. Filtrates were quality assessed by fluorescence microscopy, showing absence of intact bacteria. Using a diverse mock virus community, we also showed a high degree of preservation of infective viruses in the FVT following the isolation procedures. We have now expanded the description of the separation method in the results section with a reference to this work (lines 188-190). We did, however, choose to increase the molecular weight cut-off (MWCO) to enhance the exclusion of non-viral components.

      We acknowledge that the rationale and importance of the VLP dose were lacking in the discussion. This has now been added (lines 758-762).

      - The viral richness of chemostat viromes was significantly lower than that of native feces. The authors should discuss how this may impact microbiome and virome outcomes.

      We have included this point in the new section about VLP dose in the discussion. Please see lines 758-762.

      - The immune response was assessed through THP-1 cells and a limited piglet cytokine panel. These may not fully represent the intestinal epithelial or mucosal immune responses. Thus, authors should acknowledge these limitations in the discussion section.

      Thank you for the comment. The limitation of using THP-1 cells as an in vitro model is already acknowledged in the results section (line 545): “Since fecal-derived eukaryotic viruses mainly infect intestinal cells, an in vivo stimulation may reveal a different response pattern.”

      The limited panel of porcine cytokines was not intended as a comprehensive assessment of the mucosal immune response, but rather as supportive data for NEC-associated inflammation, as we have previously demonstrated (reference 37: https://doi.org/10.1016/j.yexmp.2024.104936). To obtain a comprehensive view of the immune response, a few days after diarrhoea onset, we additionally performed RNA-Seq analyses of the intestinal lymph node.

      - While the manuscript is comprehensive, it is also lengthy and text-heavy. Some sections could be condensed for clarity.

      The manuscript has been through multiple revisions by authors. While it is indeed lengthy, we have removed non-essential information and redundancies and now feel that the balance between data, text, figures, and supplementary information is acceptable.

      - Several figures (e.g., Figs. 1-5) contain significant data but need clearer summaries in their captions.

      We appreciate the suggestion and have revised the captions for Figs. 1-8 to provide clearer, more informative summaries of the data they present.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely thank the reviewer for the thorough and constructive evaluation of our manuscript. We greatly appreciate the recognition of our work's strengths, particularly the integration of experiments and mathematical modeling, the stochastic framework for describing sloughing events, and the insights into pressure-driven detachment dynamics.

      We have carefully considered each point raised and provide detailed responses below. In response to the reviewer's comments, we have revised the Methods section to better clarify our approach to three-dimensional assessment. We believe these revisions have improved the clarity of the manuscript.

      Below, we address each of the specific concerns raised by the reviewer:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The study achieves its primary goal of integrating experiments and modeling to understand the coupling between flow and biofilm growth and detachment in a microfluidic channel, but it should have highlighted the weaknesses of the methods. I list the ones that, in my opinion, are the main ones:

      The study does not consider biofilm porosity, which could significantly affect the flow and forces exerted on the biofilm. Porosity could impact the boundary conditions, such as the no-slip condition, which should be validated experimentally.

      Porosity is indeed a key component of biofilm structures, resulting from the polymeric nature of the EPS matrix, mechanical forces, and biological processes such as cell death or predation. When considering flow-biofilm interactions, this porosity may allow fluid flow through the biofilm, with reported permeability values spanning an extremely broad range from 10⁻¹⁵ to 10⁻⁷ m² (Kurz et al., 2023).

      However, we argue that biofilm permeability is not the primary driver in our system:

      (1) In microscopy visualization, our biofilms form dense structures where flow around the biofilm through narrow channels dominates over flow through the porous biofilm matrix.

      (2) We performed microrheology experiments in these biofilms by imaging the Brownian motion of nanoparticles in the biofilm. Their trajectories indicate that, in our conditions, the viscoelastic flow of the biofilm itself largely dominates over the flow of culture medium through the biofilm matrix.

      (3) We argue that the extreme variability in reported permeability values (spanning several orders of magnitude, Kurz et al., 2023) reflects not only differences in experimental systems, but also fundamental challenges in defining and measuring permeability for viscoelastoplastic biofilms (the biofilm itself is actually flowing). Given this uncertainty, incorporating permeability into our model would introduce parameters that cannot be reliably constrained from literature or independently measured in our setup. Our approach (i.e. treating the biofilm as impermeable and focusing on flow obstruction) avoids this parametrization complexity while successfully capturing the observed dynamics.

      (4) Our model successfully predicts the observed scaling laws (φmax ∝ Q^(1/2), Fig. 7f) and hydraulic resistance dynamics (Fig. 3) without invoking permeability, suggesting that flow obstruction rather than flow penetration is the dominant mechanism.

      Reference: Kurz, D. L.; Secchi, E.; Stocker, R.; Jimenez-Martinez, J. Morphogenesis of biofilms in porous media and control on hydrodynamics. Environ. Sci. Technol. 2023, 57 (14), 5666−5677.

      The research suggests EPS development as a stage in biofilm growth but does not probe it using lectin staining. This makes it impossible to accurately assess the role of EPS in biofilm development and detachment processes.

      We respectfully disagree that lectin staining is necessary to assess the role of EPS in our system, and we argue that our approach using genetic mutants is superior for the following reasons. Lectin staining has significant limitations. While widely used, lectin staining (e.g., concanavalin A) is non-specific (binding not only to EPS polysaccharides but also to bacterial cell surfaces) and is non-quantitative. It can confirm the presence of polysaccharides but cannot establish causal relationships between specific EPS components and mechanical properties or detachment dynamics. We performed preliminary experiments with ConA-rhodamine (data not shown), which showed widespread presence of polysaccharides. However, this provided limited insight beyond confirming EPS production, which is well-established for P. aeruginosa PAO1 biofilms. We employed a more rigorous genetic approach to directly assess the role of EPS composition. We used Δpel and Δpsl mutants (strains lacking key exopolysaccharides that are the primary structural components of the PAO1 matrix). Our results demonstrate that both mutants show significantly reduced maximum clogging compared to wild-type. The Δpsl mutant is particularly affected, with near-complete detachment at certain flow rates. These differences directly link EPS composition to mechanical stability and detachment dynamics. This genetic approach provides causal, quantitative evidence for the role of specific EPS components in biofilm development and detachment, information that lectin staining cannot provide. We believe this addresses the reviewer's concern more rigorously than lectin staining would.

      While the force and flow are three-dimensional, the images are taken in two dimensions. The paper does not clearly explain how the 2D images are extrapolated to make 3D assessments, which could lead to inaccuracies.

      We thank the reviewer for this important observation. We would like to clarify our methodological approach. Our primary three-dimensional measurement is the hydraulic resistance R(t), obtained from pressure drop measurements across the biofilm-containing channel section. This pressure-based measurement inherently captures the three-dimensional flow obstruction caused by the biofilm. We then employ a geometric model (uniform biofilm layer on all channel walls) to convert R(t) into volume fraction φ(t).

      The two-dimensional fluorescence imaging serves to validate this model-based approach rather than being the basis for three-dimensional extrapolation. The uniform layer assumption is supported by three independent lines of evidence: (i) the excellent quantitative agreement between predicted and measured scaling laws (φmax ∝ Q^(1/2), Fig. 7f), obtained without adjustable parameters; (ii) the high reproducibility of φmax values across different flow rates and replicates; and (iii) the strong correlation between model-derived φ(t) from pressure measurements and integrated fluorescence intensity (Fig. 3b-d).

      We have added clarifying text in the Methods section (subsection "Data analysis for the calculation of the hydraulic resistance and volume fraction") to better explain this approach and emphasize that pressure measurements provide the three-dimensional information, with the geometric model serving as the link to volume fraction.

      Although the findings are tested using polysaccharide-deficient mutants, the results could have been analyzed in greater detail. A more thorough analysis would help to better understand the role of matrix composition on the stochastic model of detachment.

      We thank the reviewer for this suggestion. Our mutant analysis demonstrates that Δpsl and Δpel strains have significantly reduced φmax and altered detachment dynamics compared to wild-type (Fig. 8), directly linking EPS composition to mechanical stability as predicted by our model. A rigorous quantitative connection between matrix composition and the stochastic parameters (interevent times, jump amplitudes) would require: (i) substantially more sloughing events for statistical power, (ii) independent mechanical characterization of each mutant, and (iii) a mechanistic model linking EPS composition to detachment parameters. We are currently developing microrheology approaches to characterize mutant mechanical properties, which could enable such refinement in future work.

      However, this represents a substantial study beyond the scope of the current manuscript, which establishes the self-sustained sloughing-regrowth cycle and its stochastic nature. The mutant results serve their intended purpose: demonstrating that EPS composition affects detachment, consistent with our model's framework.

      Reviewer #2 (Public review):

      This manuscript develops well-controlled microfluidic experiments and mathematical modelling to resolve how the temporal development of P. aeruginosa biofilms is shaped by ambient flow. The experiment considers a simple rectangular channel on which a constant flow rate is applied and UV LEDs are used to confine the biofilm to a relatively small length of the device. While there is often considerable geometrical complexity in confined environments and feedback between biofilm/flow (e.g. in porous media), these simplified conditions are much more amenable to analysis. A non-dimensional mathematical model that considers nutrient transport, biofilm growth and detachment is developed and used to interpret experimental data. Regimes with both gradual detachment and catastrophic sloughing are considered. The concentration of nutrients in the media is altered to resolve the effect of nutrient limitation. In addition, the role of a couple of major polysaccharide EPS components is explored with mutants, which leads to results in line with previous studies.

      There has been a vast amount of experimental and modelling work done on biofilms, but relatively rarely are the two linked together so tightly as in this paper. Predictions on the influence of the non-dimensional Damköhler number on the longitudinal distribution of biofilm and the functional dependence of flow on the maximum amount of biofilm (𝜙max) are demonstrated. The study reconfirms a number of previous works that showed the gradual detachment rate of biofilms scales with the square root of the shear stress. More challenging are the rapid biofilm detachment events where a large amount of biofilm is detached at once. These events are identified experimentally using an automated analysis pipeline and are fitted with probability distributions. The time between detachment events was fitted with a Gamma distribution and the amplitude of the detachment events was fitted with a log-normal distribution; however, it is not clear how good these fits are. Experimental data was then used as an input for a stochastic differential equation, but the output of this model is compared only qualitatively to that of the experiments. Overall, this paper does an admirable job of developing well-constrained experiments and a tightly integrated mathematical framework through which to interpret them. However, the new insights this provides the underlying physical/biological mechanisms are relatively limited.

      We thank the reviewer for the thorough evaluation of our work and for highlighting the tight integration between experiments and modeling. We appreciate the constructive feedback regarding the goodness-of-fit for the probability distributions.

      To address the concern that "it is not clear how good these fits are," we have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes.

      Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.
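
      As an illustration of the kind of Q-Q check described above, the following is a minimal sketch and not the analysis code used for Supplementary Figure S20. It assumes a method-of-moments Gamma fit (the fitting method is not stated in this response), approximates the theoretical quantiles from a large reference draw because the Python standard library has no Gamma inverse CDF, and uses hypothetical function names throughout.

```python
import random
import statistics


def fit_gamma_moments(samples):
    """Method-of-moments Gamma fit; returns (shape k, scale theta).

    Assumption: moments are used here for simplicity; a maximum-likelihood
    fit would be an equally valid choice.
    """
    mean = statistics.fmean(samples)
    var = statistics.variance(samples)
    return mean * mean / var, var / mean


def gamma_qq_pairs(samples, n_reference=100_000, seed=0):
    """Return (theoretical quantile, sample quantile) pairs for a Q-Q plot.

    Theoretical quantiles are approximated by the empirical quantiles of a
    large reference draw from the fitted Gamma distribution.
    """
    k, theta = fit_gamma_moments(samples)
    rng = random.Random(seed)
    reference = sorted(rng.gammavariate(k, theta) for _ in range(n_reference))
    ordered = sorted(samples)
    n = len(ordered)
    pairs = []
    for i, value in enumerate(ordered):
        p = (i + 0.5) / n  # plotting position, strictly between 0 and 1
        pairs.append((reference[int(p * n_reference)], value))
    return pairs
```

      Points lying close to the diagonal of the resulting pairs indicate that the fitted Gamma distribution tracks the sample quantiles; systematic curvature in the tails would indicate the kind of mismatch that formal tests pick up.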

      We respectfully disagree that “new insights this provides the underlying physical/biological mechanisms are relatively limited.” Beyond confirming previous findings (e.g., scaling for gradual detachment), we believe our work provides several novel mechanistic insights. First, the Pe/Da criterion enables quantitative prediction of nutrient limitation regimes, allowing systematic decoupling of nutrient effects from other phenomena in biofilm studies. Second, we demonstrate that pressure, not shear, drives sloughing detachment events, a mechanism overlooked in previous studies where the notion of “shear-induced detachment” clearly dominates. Third, we show that sloughing-regrowth cycles occur even in single channels, establishing pressure-driven fluctuations as a signature of confined biofilm growth, independent of geometric complexity. Finally, the stochastic description of sloughing demonstrates that, while instantaneous biofilm states are irreproducible, the underlying randomness is predictable, therefore addressing a fundamental challenge in biofilm research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the abstract, I suggest clarifying the term "bacteria development." It is unclear if it refers to bacterial growth, biofilm formation, or biofilm detachment. The concept is expressed more clearly at the end of the Introduction.

      We have modified the entire abstract to make it clearer. The abstract now explicitly establishes the key processes - growth ('nutrients necessary for growth', 'growing bacteria obstruct flow paths') and detachment ('mechanical stresses that cause detachment', 'flow-induced detachment', 'sloughing') - before using 'bacterial development' as a collective term to refer to these coupled spatiotemporal dynamics. We believe the abstract is now clear as written.

      (2) Findings from Sanfilippo et al. (2019) were slightly questioned by Padron et al. (PNAS, 2023), who discovered that H2O2 transport is responsible for fro operon upregulation.

      Thanks for the clarification, which is indeed significant. The new sentence now reads: Pseudomonas aeruginosa has been found to regulate the fro operon in response to flow-modulated H2O2 concentrations (Sanfilippo et al. 2019, Padron et al. 2023).

      (3) Additionally, Kurz et al. (2022) account for pressure buildup as the mechanism controlling sloughing.

      We respectfully disagree and note that Kurz et al. (2022) identify shear stress, not pressure buildup, as the primary mechanism controlling sloughing. Besides the title, key sentences include “opening was driven by a physical process and specifically by the shear forces associated with flow through the biofilm”, “The opening of the PFPs is driven by flow-induced shear stress, which increases as a PFP becomes narrower due to microbial growth, causing biofilm compression and rupture.” While pressure differences are measured as indicators of system state and do contribute to normal compression stresses, their mechanistic explanation emphasizes that narrowing PFPs experience increased shear rates that eventually exceed the biofilm's yield stress, triggering viscoplastic deformation and detachment. The pressure buildup is a hydraulic consequence of narrowing rather than the direct cause of sloughing. In contrast, our work demonstrates that in confined geometries, pressure differences generate tangential stresses at the biofilm-solid interface that directly drive detachment.

      (4) The flow control strategy represented in Fig. 1 is not explained and should be detailed in the Methods section.

      The Methods section reads as follows.

      Inoculation and flow experiments: BHI suspensions were adjusted to an optical density of OD640nm = 0.2 (10⁸ CFU/mL) and inoculated inside the microchannels from the outlet, up to approximately ¾ of the channel length in order to keep a clean inlet. The system was left at room temperature (25°C) for 3h under static conditions. Flow experiments were then performed at 0.02, 0.2, 2, 20 and 200 μL/min constant flow rates for 72h in the microchannels at room temperature. For the experiments at 0.2, 2, 20 and 200 μL/min, the fluidic system was based on a sterile culture medium reservoir pressurized by a pressure controller (Fluigent FlowEZ) and connected with a flow rate controller (Fluigent Flow unit). The flow rate was maintained constant by using a controller with a feedback loop adjusting the pressure in the liquid reservoir. The reservoir was connected to the chip using Tygon tubing (Saint Gobain Life Sciences Tygon™ ND 100-80) of 0.52 mm internal diameter and 1.52 mm external diameter, along with PEEK tubing (Cytiva Akta pure) with 0.25 mm inner diameter adapters for the flow rate controller. The waste container was also pressurized by another independent pressure controller to reduce air bubble formation in the inlet part. For the experiments at 0.02 μL/min, we used a Harvard PHD 2000 syringe pump for the flow.

      (5) Including images of the actual biofilms formed in a portion of the channel would aid in understanding the analysis presented in Fig. 2.

      Images are introduced later on (e.g., Figure 5). The supplementary material also includes videos.

      (6) The boundary conditions used to calculate the stress in the developed model should be discussed. The authors should specify why biofilm porosity is neglected.

      We have added a detailed discussion in the supplementary (Section I.2).

      (7) In the first section of the Results, the authors hypothesize that heterogeneity in biofilm development could be due to oxygen limitation. However, given the high oxygen permeability of PDMS, this hypothesis is later denied by their data. It would be prudent to avoid this hypothesis initially to streamline the presentation. Additionally, the authors should specify how oxygen levels at the inlet and outlet are measured.

      We appreciate this comment and agree that streamlining would simplify the presentation. However, after careful consideration, we have chosen to retain the oxygen limitation hypothesis for the following reasons: (1) oxygen limitation is a frequently invoked mechanism in biofilm systems and deserves explicit consideration, (2) it is not immediately obvious that oxygen remains non-limiting in larger microchannels where transverse gradients could develop, and (3) systematically eliminating this plausible alternative hypothesis strengthens our mechanistic conclusion that BHI drives the observed heterogeneity. Regarding oxygen measurements: we did not directly measure dissolved oxygen concentrations. Our approach is only indirect.

      (8) What is the standard deviation of the doubling time measured at different flows (page 9)?

      We have indicated the standard deviation in the text. Note that the graph shows the SEM.

      (9) What is the "zone of interest" in the channel mentioned on page 9?

      We have added the following sentence to clarify: To further understand this effect, let us consider the mass balance of biofilm in the zone of interest -- the zone where biofilm grows in between the two UVC irradiation zones -- in the channel.

      (10) Minor and major detachment events should be classified based on a defined threshold or criteria, and their frequency should be measured.

      We appreciate the reviewer's concern about quantitative rigor. However, we respectfully disagree that imposing arbitrary thresholds to classify 'minor' vs. 'major' events would improve our analysis. Detachment events in our system span a continuum of magnitudes, and any threshold would be artificial and potentially misleading. Our quantitative characterization of detachment dynamics is provided through the statistical analysis of interevent times, which we show follow a gamma distribution. This stochastic framework captures the full spectrum of detachment behavior without requiring arbitrary binning. The terms 'minor' and 'major' in our manuscript are used qualitatively to illustrate the range of observed phenomena, not as formal classifications.

      (11) Have the authors identified a reason for the peaks in the volume fraction in the Δpsl mutants at the highest flow rate?

      The biofilm thickness following these sloughing events is below our detection limit, consistent with a residual layer of cells. However, these cells grow, leading to a time window where the volume fraction is measurable, before a new detachment event occurs. Our understanding is that the Δpsl mutant forms a weaker matrix with a much lower threshold for sloughing.

      (12) The fit of the probability density function for the relative density function does not match the data well. The authors should comment on this.

      We have added quantile-quantile (Q-Q) plots for the Gamma distribution fits of inter-event times to the Supplementary Materials (Supplementary Figure S20). These plots demonstrate that the sample quantiles track the theoretical Gamma quantiles across all flow rates (0.2, 2, and 20 μL/min), indicating that the Gamma distribution provides a reasonable approximation of the overall distributional behavior. For detachment amplitudes, we selected the lognormal distribution based on the observed high skewness and kurtosis in the data, which are characteristic signatures of lognormal processes. Formal goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) yielded mixed results across datasets, passing for some while failing for others. This variability reflects inherent noise from measurements, discrete temporal sampling, automated detection thresholds, and intrinsic biological variability. Importantly, our goal is to capture essential distributional characteristics for input into the stochastic model, not to achieve perfect statistical fit across all individual datasets. The Q-Q plots confirm that these distributions provide reasonable approximations, and the qualitative agreement between model predictions and experimental observations validates this modeling approach. We have revised the Methods section to clarify this rationale.

      (13) Additionally, the simulated fraction appears very flat, with limited detachments compared to experiments. Why?

      The model captures the essential dynamics of growth-detachment cycles, including the characteristic timescales and volume fraction ranges. Some event-to-event variability in the experimental data likely reflects biological stochasticity not captured by our current approach, for example variations in local biofilm mechanical properties or matrix composition that affect the precise stress at which sloughing occurs. While incorporating such biological variability as a stochastic parameter would improve detailed agreement, it would require extensive additional characterization beyond the scope of this study. The current model successfully reproduces the key qualitative and semi-quantitative features of the system.

      (14) The methods section should include a more detailed explanation of how the model was validated against experimental data.

      Model validation was performed by comparing predicted biofilm volume fraction time series and sloughing event statistics against experimental observations across multiple flow rates. The model reproduces the characteristic growth-sloughing cycles, timescales, and steady-state volume fractions without additional parameter fitting beyond the experimentally measured distributions.
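
      To make the structure of such a growth-sloughing simulation concrete, the following is a minimal sketch under stated assumptions, not the model of the manuscript. It assumes logistic growth toward a maximum volume fraction between events, Gamma-distributed inter-event times, and lognormally distributed relative losses at each event. All parameter names and values are hypothetical, and time is taken to be in hours.

```python
import random


def simulate_sloughing(t_end, dt, growth_rate, phi_max, phi0,
                       gamma_shape, gamma_scale, ln_mu, ln_sigma, seed=0):
    """Simulate a volume-fraction time series with random sloughing events.

    Assumptions (illustrative only): logistic growth between events,
    Gamma inter-event times, lognormal relative jump amplitudes.
    Requires dt * growth_rate <= 1 so the Euler step cannot overshoot phi_max.
    """
    rng = random.Random(seed)
    t, phi = 0.0, phi0
    next_event = rng.gammavariate(gamma_shape, gamma_scale)
    times, fractions, n_events = [t], [phi], 0
    while t < t_end:
        # Explicit Euler step of the assumed logistic growth law.
        phi += dt * growth_rate * phi * (1.0 - phi / phi_max)
        t += dt
        if t >= next_event:
            # Relative loss drawn from a lognormal, capped at total loss.
            loss = min(1.0, rng.lognormvariate(ln_mu, ln_sigma))
            # Keep a small residual layer so regrowth remains possible.
            phi = max(phi * (1.0 - loss), 1e-4)
            n_events += 1
            next_event = t + rng.gammavariate(gamma_shape, gamma_scale)
        times.append(t)
        fractions.append(phi)
    return times, fractions, n_events
```

      Comparing the distribution of simulated volume fractions and event counts with the measured ones, rather than comparing individual trajectories, is the natural way to assess a stochastic description of this kind.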

      (15) It would be useful to include information on the reproducibility of the experiments and any variations observed between replicates.

      Experiments were performed in N=3 biological replicates. Individual time series for all replicates are shown in Supplementary Figures, demonstrating consistent behavior across replicates.

      (16) A discussion of the limitations of the study, particularly regarding the assumptions made in the modeling and their potential impact on the results, would strengthen the paper.

      We have added a discussion on why we chose to neglect the porosity of the biofilm, and strengthened parts on the uniform biofilm layer assumption.

      Reviewer #2 (Recommendations For The Authors):

      Page 2: "A vast" —> "The vast"

      Changed.

      The text and line widths on many of the figures are far too small. I printed it out at normal size, but had to look at a PDF and magnify to actually see what the graphs are showing. Fig. 9c is particularly illegible.

      Changed.

      Fig. 1 caption "photonic" —> "optical"?

      Changed

      Can you spell out the actual mathematical definition of 𝜙 on page 5 when it is introduced? Currently it just says the "cross section volume fraction of the biofilm", but that seems potentially ambiguous. It is valid to say that this is "fraction of the cross section occupied by the biofilm"?

      Changed

      Bottom of page 5: can you state the physical interpretation of the assumption that M is bounded between 0 and 1. i.e. that growth is larger than detachment?

      There is a comment on that in the paper. It reads “In assuming that M ∈ ]0, 1] and eliminating cases where M > 1, we have not considered situations of systematic detachment 𝜙equ = 0 for any value of the concentration, since this is not a situation that we encountered experimentally.” This comes just after presenting the expression on the only non-trivial steady-state, as it becomes easier to explain the consequences of the initial choice at this point.

      Currently the choice of detachment initially used in the model is a bit confusing. You say that you are going to assume a (1-𝜙)^(-1) model for simplicity (bottom of page 5), but then later you find that the (1-𝜙)^(3/4) model is more accurate (page 16). Since the latter has already been confirmed in numerous other studies, why not start with that one from the beginning?

      We thank the reviewer for this important question, which highlights an area where our presentation could be clearer. We did not find that the (1-φ)^(-3/4) model is "more accurate." Rather, we deliberately chose the (1-φ)^(-1) scaling because it captures pressure-induced detachment, which we hypothesized would dominate in confined flows where biofilms clog a large portion of the channel. The (1-φ)^(-3/4) scaling, widely used in previous studies, describes shear stress at the biofilm/fluid interface and was developed primarily for reactor systems where pressure effects are negligible. Our analysis on page 16 validates this choice by demonstrating that pressure stress indeed exceeds shear stress when the volume fraction is large, which corresponds to late Stage I and all of Stage II, precisely where our model is applied. The excellent quantitative agreement between predicted and measured φmax values across flow rates (Fig. 7f, Table 1) further supports the (1-φ)^(-1) scaling. We recognize that our initial presentation may have suggested the (1-φ)^(-1) choice was merely for "simplicity." We have revised this section to emphasize that this scaling was chosen specifically to capture pressure-driven detachment in confined geometries, with the physical justification provided by the stress analysis that follows. We have also clarified our ideas on page 16 to express clearly that (1-φ)^(-3/4) is never used. We could alternatively use a multi-modal detachment function combining both scalings, but the data do not require this additional complexity.
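
      The difference between the two scalings can be made concrete with a short sketch. It compares only the φ-dependent factors (1-φ)^(-1) and (1-φ)^(-3/4) quoted above. The function names are illustrative, and the dimensional prefactors that set the absolute pressure and shear stresses are deliberately left out, so the ratio alone does not say which stress dominates in absolute terms.

```python
def pressure_scaling(phi):
    """phi-dependence (1 - phi)^(-1) associated with pressure-driven detachment."""
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must satisfy 0 <= phi < 1")
    return (1.0 - phi) ** -1.0


def shear_scaling(phi):
    """phi-dependence (1 - phi)^(-3/4) associated with interfacial shear stress."""
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must satisfy 0 <= phi < 1")
    return (1.0 - phi) ** -0.75


def scaling_ratio(phi):
    """Ratio of the two factors, equal to (1 - phi)^(-1/4)."""
    return pressure_scaling(phi) / shear_scaling(phi)
```

      The ratio equals 1 in an empty channel and grows without bound as φ approaches 1, which is consistent with the statement that the distinction matters most when the biofilm clogs a large portion of the channel.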

      In general, the models you derived in this study could be better contrasted with that from previous works. e.g. can you compare your Eqn (4) with the steady-state solutions obtained by other previous studies? Is this consistent with previous works or different? (aside from framing the biofilm thickness in terms of 𝜙)

      We are currently working on a paper dedicated to modeling biofilm development in confined flows, which will do a better job at comparing approaches.

      Top of page 6 - you assume K* = 0.1 - Does this assume that cells grow at half the rate in 0.1X BHI as they do in 1X BHI? Has this been confirmed experimentally or is this just a guess?

      This was estimated rather than measured directly. Model predictions were much more sensitive to the Damköhler number than to the value of K.

      "radial" is used widely in this paper, but you are using a square geometry. Is "transverse" a better choice?

      Yes it clearly is. It’s been changed.

      Fig 3. Are panels (a) and (b) showing different bioreps of the same condition? If so, please spell that out in the caption.

      There was an error in the caption of panel (a). This has been corrected. The correspondence is between (a) and (c), and these are exactly the same, not bioreps.

      In multiple places it noted that the change in hydraulic resistance is correlated with the "change in biofilm colonization." Why not demonstrate this directly using a cross correlation analysis? How is the latter connected to the 𝜙 parameter? (e.g. is this d(𝜙)/dt?)

      We thank the reviewer for this suggestion. To clarify: φ(t) represents the volume fraction of biofilm in the channel. We measure this in two independent ways: (1) φ(t) from hydraulic resistance (black line in Fig. 3), i.e., calculated from pressure measurements using φ = 1 - √(R₀/R(t)), assuming uniform layer growth (see Methods section "Data analysis for the calculation of hydraulic resistance and volume fraction"), and (2) φ(t) from fluorescence (green squares in Fig. 3), i.e., estimated from integrated GFP intensity or image segmentation of the glass/liquid interface. The reviewer is correct that we should quantify this relationship directly. We have now added a correlation analysis between these two independent measurements of φ (new Supplementary Figure S21). The analysis shows a strong positive correlation, with r-values ranging from 0.68 to 0.77 across all flow rates. This validates two key aspects of our approach: (1) the uniform layer assumption used to convert R(t) to φ(t) is reasonable, and (2) the pressure-based measurements accurately capture the dynamics visible in fluorescence imaging, including both growth phases and sloughing events. The strong agreement is particularly notable given that these measurements probe different aspects of the biofilm: hydraulic resistance is sensitive to the three-dimensional obstruction of flow, while fluorescence captures primarily the biofilm attached to the glass surface within our focal plane. Their correlation supports the model assumptions. We have revised the manuscript to clarify this relationship and present the correlation analysis.
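
      The conversion and the correlation described above can be sketched as follows. The formula φ = 1 - √(R₀/R(t)) is the one quoted in this response; the function names, the made-up numbers, and the hand-written Pearson coefficient are illustrative assumptions, not the analysis code behind Supplementary Figure S21.

```python
import math


def volume_fraction_from_resistance(r0, r_t):
    """Uniform-layer estimate phi = 1 - sqrt(R0 / R(t)).

    r0 is the hydraulic resistance of the clean channel and r_t the
    resistance at time t, in the same units.
    """
    if r0 <= 0 or r_t <= 0:
        raise ValueError("resistances must be positive")
    return 1.0 - math.sqrt(r0 / r_t)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either series is constant.
    return cov / math.sqrt(var_x * var_y)


# Made-up example values, for illustration only.
resistance = [1.0, 1.5, 2.5, 4.0, 2.0, 3.0]  # R(t) in units of R0
phi_pressure = [volume_fraction_from_resistance(1.0, r) for r in resistance]
phi_fluorescence = [0.02, 0.20, 0.35, 0.52, 0.25, 0.45]
r_value = pearson_r(phi_pressure, phi_fluorescence)
```

      With this formula a fourfold increase in hydraulic resistance corresponds to φ = 0.5, which gives a quick sanity check on any pressure-derived time series.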

      Top of page 9 - a doubling time of 110 mins is reported in liquid culture - is this in shaken or static conditions? Can you provide some data on how this was calculated? (e.g. on a plate reader?) Do you think your measurements in the microfluidics could be affected by attachment/detachment of cells, rather than being solely driven by division? It is curious that your apparent growth rate varies by a factor of two across the different flow rates and there is not a monotonic dependency. Both attachment and detachment would depend on the flow rate (with some non-trivial dependencies), e.g. https://www.pnas.org/doi/10.1073/pnas.2307718120 https://doi.org/10.1016/j.bpj.2010.11.078

      Given that your doubling time in the microfluidics is solely based on changes in cell number (rather than directly tracking cell divisions) it seems possible your results here are measuring the combined effect of growth, attachment and detachment, rather than just growth.

      We agree with those comments regarding the doubling time measurement. We have added a description of how we performed the doubling time measurement in the Methods section.

      Page 9 - you discuss the role of EPS here, but the effect of EPS is not demonstrated here and this is muddled with a discussion about the non-linearity of the putative dependency. Maybe this would be on a firmer footing if you save the discussion of EPS for the section on the Psl and Pel mutants?

      Changed.

      Middle of page 9: Please define what "smooth detachment" means and contrast it with catastrophic sloughing. Also, please define what you mean by "flow, seeding, and erosion" detachment are and how these three things differ from one another.

      We have clearly defined each term in the revised version.

      The results from wavelet scalograms seem to be underutilised and not well described. Can you clearly say in the caption what time series this analysis has been calculated on, e.g. hydraulic resistance? Other than simply pointing out the "blue stripes", what can be gained from this analysis that could not be obtained with another method? It would be great if the basic features of this plot could be more fully discussed (e.g. is the curved envelope at the bottom caused by edge effects?)

      We have improved the text, captions and method section following the reviewer’s comment.

      Fig. 5 a and b - please list the time at which each of these images were taken. Do these have the same dt between the two sets of images?

      Yes, the dt is the same (30 minutes). This is now indicated in the caption.

      Fig. 6: you have significant 2D variation in the biofilm width along the length of the channel. The relative contribution of pressure and shear based detachment will be different at different positions along the length. However, this variation is ignored in your model. Can you please comment on this in your manuscript and how it might affect the interpretation of your results? e.g. would the longitudinally averaged description yield the same result as one that takes the geometry into account (on average)?

      Our model indeed assumes longitudinally averaged properties. A more detailed spatially resolved model would be valuable for capturing heterogeneities and will be explored in future work.

      Bottom of page 11: you say standard deviations are in the range of 10⁻³. How does this jibe with the error bars on the middle flow rate in Fig. 7e?

      This extremely low standard deviation only applies to the maximum value of 𝜙 and is a completely different measurement from the box-and-whisker plots presented in Fig. 7e.

      Fig. 7: You are calculating the "Fraction" here. Is this "𝜙"? If so, can you put that on the y-axis instead? You calculate the volume fraction two different ways e.g. with hydraulic resistance and with imaging. Is only one of these shown in (e)? Is the same powerlaw dependence shown in (f) conserved when the other measurement of the "fraction" is used? Can you include both in Fig. 7e?

      We have modified the axis and indicated 𝜙.

      (e) is calculated only from hydraulic resistance. This is the most precise measurement to evaluate 𝜙 quantitatively.

      Related to the previous comment: Some of the estimates of 𝜙max in Table 1 are obtained by fitting the model to integrated fluorescence data (Fig. 2b), while others are estimated from measurements of the hydraulic resistance. The former yields non-unique sets of parameters. Can the biofilm fraction instead actually be estimated directly from fluorescent imaging by segmenting biofilm and directly calculating how much of the cross section is occupied by cells on average across the length? This seems like a more direct measure of this quantity. Given there are multiple ways of estimating the same parameter, it would be better consistency checking to make sure that different methods actually yield the same result.

      We have now added in Fig S21 a direct comparison of these two measurement methods. These are strongly correlated. Microscopy is more direct but only provides 2D pictures. Hydraulic resistance provides a 3D measurement, but relies on a model of biofilm distribution. Both are imperfect, but correlate well. In particular, we see that the 2D measurement does capture sloughing.

      You cite a large number of supplemental figures (e.g. Fig. S21 on page 12), but the figures in your SI only go up to 11.

      We have revised references to supplementary figures.

      Bottom of page 11: Your data from liquid culture suggests that your psl mutant grows at half the rate of WT cells. Is that consistent with your microfluidic data (e.g. Fig. 8)? If not, might this be a sign that your growth rate analyses from the microfluidics might be affected by attachment/detachment? (see comment above) Psl cells should detach much more easily.

      The approach taken to measure doubling times in the microfluidic system does not rely on the macroscopic measurements presented in figure 8, but rather on the approach presented in fig 4. These measurements require specific imaging (different magnification and time stepping) and we did not perform such experiments for the mutants.

      In analyses of sloughing, you fit the times between the jumps and the relative amplitude. Are these two random variables correlated with one another? Might that influence your results? Your methods say that "jumps were identified through the selection of local maxima" of the derivative. Do you mean to say "minima" here? Did you keep all local maxima/minima or did you have a threshold?

      We treat these as two random variables that are not correlated with one another. This is an assumption, and it would be interesting to analyze whether they are in fact correlated. To perform this analysis, we believe that we would first need to acquire more data and more replicates to improve the statistical analysis.

      Yes, it was minima (in the code we make everything positive, hence the confusion).

      Yes, there is a threshold on the value of the jump itself. This value is extremely low and essentially filters out noise.
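The jump detection described in these answers (local minima of the derivative, sign-flipped so that drops are positive, with a small threshold on the jump size) might look like the sketch below. It is an illustration under assumptions, not the authors' code: the function name, the definition of relative amplitude as the drop divided by the value just before it, and the example series are all hypothetical.

```python
def detect_jumps(times, phi, threshold):
    """Return (jump_times, relative_amplitudes, waiting_times).

    A jump is a local minimum of the discrete derivative of phi. The
    derivative is sign-flipped so drops become positive peaks, and drops
    in phi that do not exceed `threshold` are discarded as noise.
    """
    # Sign-flipped finite-difference derivative: positive where phi drops.
    drop_rate = [
        -(phi[i + 1] - phi[i]) / (times[i + 1] - times[i])
        for i in range(len(phi) - 1)
    ]
    jump_times, amplitudes = [], []
    for i in range(1, len(drop_rate) - 1):
        is_peak = drop_rate[i] > drop_rate[i - 1] and drop_rate[i] >= drop_rate[i + 1]
        drop = phi[i] - phi[i + 1]  # size of the jump itself
        if is_peak and drop > threshold:
            jump_times.append(times[i + 1])
            # Assumed definition: drop relative to the value just before it.
            amplitudes.append(drop / phi[i])
    waiting_times = [b - a for a, b in zip(jump_times, jump_times[1:])]
    return jump_times, amplitudes, waiting_times

# Hypothetical volume-fraction series with two sloughing events.
times = [0, 1, 2, 3, 4, 5, 6, 7, 8]
phi = [0.10, 0.20, 0.30, 0.15, 0.25, 0.35, 0.45, 0.20, 0.30]
jump_times, amplitudes, waiting_times = detect_jumps(times, phi, threshold=0.01)
```

The paired lists `amplitudes[1:]` and `waiting_times` are what one would feed into a correlation test to check the independence assumption raised by the reviewer.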

      Fig. 9 - can you make it clearer in the caption what timeseries you are analysing here? I understand from the methods that this is the "volume fraction." The data/fits are difficult to see in Fig. 9 b and impossible to see in Fig. 9c because the green bars get in the way of the other two data sets. Can this visualisation be improved? It is not clear to me how good of a job the Gamma and log-normal fits are actually doing.

      We have clarified that histograms are calculated from all experiments/replicates.

      We have slightly modified the graph to make it clearer. This comparison is intrinsically hard, partly because it compares discrete data with continuous PDFs.

      The results from the stochastic sloughing model are described as 'strikingly similar to experimental data', but this seems to be based on a qualitative analysis of the lines in Fig. 7 d, e, and f. The experimental data is not plotted in the same graph, nor is the experimental data that we should be comparing this to cited in the text/caption.

      We have added a note in the caption to indicate which figure it can be compared to.

    1. Region-based Editing Samples
      1. Scan every object in the image with Vision Foundation Models (VFMs) such as Grounding DINO and SAM.
      2. Identify the target from the instruction and the VFM-processed image.
      3. Cut out the outline automatically (create the mask).
      4. Soft mask: rather than using the masked region alone, add a blurred layer so that small features blend into the background.
      5. Image blending technique: at odd steps the image is kept exactly as it was processed, but at even steps only the changes inside the masked region are kept and the background regions stay unchanged.
    2. Free-form Data Generation

      The real image serves as the anchor and is processed by the trained SDXL U-Net denoiser, with the target and source captions combined through Prompt-to-Prompt in cross-attention, to produce two images.
      1. They take the real photograph and pass it through the encoder (VAE) to compress it into a latent.
      2. Instead of using pure noise as in ordinary T2I, they inject a moderate amount of noise into this real image.
      3. The result is a noisy matrix that still faintly retains the shapes, structure, and color regions of the real photograph.
      4. The noisy matrix and the real caption are fed into SDXL and denoised by the U-Net.
      5. The real caption and the target caption form a Prompt-to-Prompt pair, which is processed through cross-attention together with the anchored noisy matrix.
      6. The first pass then produces a source image that carries the characteristics of the real image, preserving the realism of the original and avoiding implicit bias in the source image.
      7. The second pass proceeds in the same way to generate the target image, but uses the source image as the anchor in place of the real image.

    3. Implicit biases in images
      1. Limitation: implicit biases in generated images carry over to the initial image and then degrade the target image.
      2. Root cause of implicit biases: earlier work (such as InstructPix2Pix) depends entirely on pre-trained Text-to-Image (T2I) models. As a result, the distribution and characteristics of the original training data (prior data) are "imprinted" on the generated images. Instead of objectively reflecting and closely following the target caption, these samples are pulled toward the characteristic style of the T2I model (for example: images that are always artificially sharp, with cinematic lighting and a graphic look...).
      3. Weakness of the T2I (Text-to-Image) approach: to create editing data pairs, older methods use a T2I pipeline instead of I2I (Image-to-Image). This means that the denoising process that creates the image starts entirely from pure noise.

      4. Generating an image from scratch leaves the system without a real image to act as an "anchor" (real image anchor). Implicit biases are therefore fully expressed in the generated image, because the input is only noise guided by the instruction -> prior data combined with T2I triggers the implicit biases in the image during generation.

    4. Instructions and Caption Generation
      1. Humans write instructions from real caption-image pairs.
      2. The instructions are expanded with an LLM.
      3. The diverse expanded instructions become in-context samples.
      4. For each real caption-image pair, the caption is used as the source caption.
      5. The source caption and the in-context samples are given to the LLM, which expands them into an editing instruction and a target caption.
    5. ⟨Is, It, Te, Im, Ts, Tt⟩
      1. Is: source image
      2. It: target image
      3. Te: instruction
      4. Im: masked-region image
      5. Ts: source caption
      6. Tt: target caption
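Two steps in the notes above lend themselves to a small numerical sketch: injecting a limited amount of noise into a real-image latent so that it keeps acting as an anchor, and keeping edits only inside a soft mask at even steps. The code below is a toy illustration on flat lists of numbers. The mixing formula sqrt(a)·x0 + sqrt(1-a)·eps is the standard diffusion forward step and is an assumption here, since the notes only say that a moderate amount of noise is injected. All function names and the step indexing are hypothetical.

```python
import math
import random

def add_partial_noise(latent, alpha_bar, rng):
    """Mix a clean latent with Gaussian noise: sqrt(a)*x0 + sqrt(1-a)*eps.

    alpha_bar = 1.0 returns the latent unchanged; alpha_bar = 0.0 is pure
    noise (the T2I starting point the notes contrast against).
    """
    keep, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [keep * x + noise * rng.gauss(0.0, 1.0) for x in latent]

def blend_with_soft_mask(edited, original, soft_mask):
    """Keep edits where the mask is near 1 and the original where it is near 0."""
    return [m * e + (1.0 - m) * o for e, o, m in zip(edited, original, soft_mask)]

def combine_at_step(step, edited, original, soft_mask):
    """Even steps: restrict the edit to the masked region. Odd steps: keep it as is."""
    if step % 2 == 0:
        return blend_with_soft_mask(edited, original, soft_mask)
    return list(edited)

# Toy values standing in for a latent and a blurred (soft) mask.
rng = random.Random(0)
real_latent = [1.0, -0.5, 0.25, 0.0]
anchored = add_partial_noise(real_latent, alpha_bar=0.6, rng=rng)

edited = [9.0, 9.0, 9.0, 9.0]
original = [1.0, 1.0, 1.0, 1.0]
soft_mask = [0.0, 0.5, 1.0, 1.0]
even_result = combine_at_step(2, edited, original, soft_mask)
odd_result = combine_at_step(1, edited, original, soft_mask)
```

The intermediate mask value 0.5 is what makes the mask "soft": the edit and the background are averaged at the boundary instead of switching abruptly.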
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-03242R

      Corresponding author(s): Shinya Kuroda

      1. General Statements

      We thank the reviewers for their critical review of the manuscript and their valuable comments. We have carefully considered the reviewers' comments and have revised our manuscript accordingly.

      The reviewers' comments in this letter are in Bold and Italics.

      2. Point-by-point description of the revisions

      Response to Reviewer #1's Comments

      Evidence, reproducibility and clarity:

      Major comments

      1. This study leaves out lipid metabolism as a major energy metabolism pathway relevant to AD. The authors themselves cite the significance of acylcarnitines and CPT1A in AD (pg. 3, lines 32-33, pg. 4, lines 1-2). Lipid metabolism and homeostasis are known to be disrupted in AD¹. Fatty acid oxidation is a known energy source in the prefrontal cortex² and will also generate acetyl-CoA, which this study reveals is a significantly decreased metabolite in AD. Furthermore, sphingomyelin emerges as one of the major decreased DEMs as well. Thus, lipid metabolism should be highlighted in Figure 3 and discussed throughout the manuscript; otherwise its omission should be clearly stated and justified.

      We appreciate the reviewer's insightful comment regarding the critical role of lipid metabolism in AD. We recognize that lipid metabolism is a metabolic pathway deeply involved in AD pathology (Baloni et al., 2022, 2020; Varma et al., 2021). Accordingly, we have revised the Limitations section to more strongly emphasize its role as a vital energy source (pg. 13, lines 15-17). Regarding the visualization of lipid metabolism, we extracted lipid-related pathways from the trans-omic network but found that the regulatory relationships among DEPs and DEMs were excessively complex and interconnected. Interpreting this regulatory network therefore seemed more challenging than for the other energy production pathways presented in our manuscript. We have concluded that the pathway analysis in our trans-omic network may not be suitable for deeply elucidating the lipid dysregulation in AD. We have added a statement acknowledging this as a limitation of our current methodology in the revised manuscript (pg. 13, lines 13-22).

      The covariates used for differential analysis should be discussed and justified. Notably, age is used as a covariate for transcriptomic analysis but not proteomic and metabolomic analysis, with no justification. Additionally, given the known importance of lipid metabolism in AD and the putative role of APOE in lipid homeostasis³, APOE genetic status should be considered as a covariate, or its omission should be justified.

      We appreciate the reviewer's comment regarding the covariates included in the differential analyses of our study. The reason we did not include other variables, such as age at death and RIN, is that these data were not available for each sample. Thus, we referred to the original research articles from which the proteomic and metabolomic datasets used in our study were derived. Regarding the metabolomic dataset, in the original article (Batra et al., 2023), only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age. In addition, no metabolites were significantly associated with sex, BMI, or years of education. Regarding the proteomic dataset, in the original article (Johnson et al., 2020), age at death, PMI, and sex were included as covariates in the analyses, though these variables were not found to strongly influence the data (Extended Data Fig. 2 in Johnson et al., 2020).

      The authors make a conclusion statement that suggests intervention: "Collectively, our data suggests that preserving or improving the ability to produce ATP and early intervention in the process of nitrogen metabolism are candidates for the prevention and treatment of dementia" (pg. 12, lines 12-14). This claim is not well-supported by the evidence provided in the study. There are a few limitations: (a) This was an observational, not interventional study; (b) The study did not establish whether the metabolic disruptions are causes or effects in AD; and (c) ATP or other bioenergetic indicators were not directly measured. Therefore, any statements about potential interventions should be removed or qualified as highly speculative.

      We agree with the reviewer that the statement regarding potential interventions was not sufficiently supported by our analyses. Accordingly, we have removed the sentence regarding prevention and treatment from the revised manuscript (we have deleted the final paragraph of the previous manuscript).

      In conjunction with the last point, the main conclusion of the study is that energy production is down in AD. The data presented in Figure 3 are consistent with this conclusion, but it is far from definitive due to limitations stated above in comments 3a and 3b. The authors should offer additional support for this conclusion: experimental follow-up, flux modeling, analysis of alternative datasets with ATP measurement, causal inference.

      We sincerely thank the reviewer for this valuable and constructive suggestion. Regarding flux modeling, we agree that metabolic flux analysis could provide important mechanistic insight. Indeed, previous studies have applied flux modeling in the context of lipid metabolism in Alzheimer's disease (Baloni et al., 2022). We also attempted to perform flux modeling focusing on energy metabolism. However, we found it difficult to obtain biologically meaningful and robust results and therefore decided not to include these analyses in the current manuscript.

      With respect to ATP measurements, we fully agree that direct evidence of altered ATP levels would further strengthen our conclusion. However, to the best of our knowledge, there are currently no publicly available large-scale datasets that directly measure ATP levels in human postmortem brain tissues. This limitation makes it challenging to incorporate validation in the present study.

      Regarding experimental follow-up, we agree that functional validation is essential to confirm the mechanistic implications of our findings. We are actively considering follow-up experimental studies. However, we consider the present work to be a multi-omic integrative analysis aimed at identifying key molecular alterations and generating biologically important hypotheses. We have revised the Limitation section to more clearly position this manuscript as an observational systems-level analysis (pg. 13, lines 20-22).

      The validation analysis did not sufficiently show the generalizability of this study's results. The authors demonstrated a correlation of 0.53 to the MSBB transcriptomics data and 0.60 to the AMP-AD DiverseCohorts proteomics data. Beyond these correlation coefficients, no meaningful comparison between the datasets is offered. How concordant are the differentially expressed features (or pathways) between the datasets? How robust would the trans-omic network be if incorporating the alternate datasets? Is the main conclusion (energy metabolism is down in AD) supported by the validation datasets? We think this analysis should be expanded and described in the main text. Although the results for external metabolomics datasets are reported in Fig S2C, correlation coefficients with the external data are not reported. The authors state, "Note that each study used different definitions for AD and CT groups, had variations in measurement methods and brain regions analyzed." We appreciate these limitations. However, the external data should be re-analyzed using the same definitions of AD and CT, if possible. The limitations and results (which DEMs are shared between datasets) should be discussed in the main text.

      We thank the reviewer for this important comment regarding the generalizability of our findings. In the revised manuscript, we have expanded the validation analyses and summarized the results in Figure S2. First, at the transcriptomic level, Figure S2B and S2C show the overlap between up- and downregulated genes in AD identified in our ROSMAP-derived analyses and those reported in a previously published large-scale meta-analysis of 2,114 postmortem samples across seven brain regions (Wan et al., 2020). A substantial proportion of DEGs were shared, supporting cross-cohort and cross-region robustness to some extent. At the proteomic level, Figure S2E shows a comparison between the ROSMAP and the AMP-AD DiverseCohorts datasets. We highlighted the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3 and calculated a separate correlation coefficient for this subset (Pearson coefficient = 0.86, p-value = 1.5e-7), further supporting our main conclusion. In addition, to assess the concordance between the two datasets in a threshold-independent manner, we additionally performed Rank-Rank Hypergeometric Overlap (RRHO) analysis (Figure S2E). RRHO analysis (Cahill et al., 2018; Plaisier et al., 2010) enables the comparison of ranked protein lists without relying on arbitrary differential expression cutoffs and has been used for cross-dataset comparison in several previous studies (Fröhlich et al., 2024; Maitra et al., 2023). The RRHO heatmaps demonstrated significant enrichment in the concordant quadrants, confirming systematic agreement between datasets beyond simple correlation coefficients. For metabolomics, Figure S2G shows RRHO analyses comparing the ROSMAP metabolomic data with other datasets measured by the same UPLC-MS/MS platform (Batra et al., 2024; Novotny et al., 2023), demonstrating significant concordance in ranked metabolite changes in AD.

      The glycolysis analysis and discussion need more development. Glycolysis and gluconeogenesis share many of the same enzymes, but they are not the same pathway and should not be discussed as such. To make a claim about the overall influence of enzyme and metabolite levels on glycolysis, the authors should focus on the energetically committing steps of glycolysis (hexokinase, phosphofructokinase, pyruvate kinase) in Figure 3A, and include the full/current version of the figure in the supplement. Gluconeogenesis-specific enzymes (pyruvate carboxylase, PEPCK) are not mentioned at all - are they among the DEPs/DEGs?

      We appreciate the reviewer's comment regarding the distinction between the glycolysis and gluconeogenesis pathways. Among the gluconeogenesis-specific enzyme proteins, G6PC1, FBP1, PC, and PCK2 were measured in our dataset, but none of them were identified as DEPs. In addition, gluconeogenesis is a process that occurs primarily in the liver and kidney rather than the brain. Given this biological context and the lack of significant changes in relevant enzymes, we have revised the terminology throughout the manuscript, replacing "glycolysis/gluconeogenesis pathway" with "glycolysis pathway" in the revised version.

      Given that there wasn't good concordance between the DEGs and DEPs, did including the mRNA and transcription factor layers in the network really add anything useful? It seems like the main conclusions of the manuscript were driven by the protein and metabolite layers only. How many of the DE metabolic enzymes were coregulated at the transcript and protein level? It would be useful to include the 5-layer trans-omic network in the supplement to display these results. Given your network, at what level does it appear that energy metabolism is regulated?

      It is true that our primary conclusion regarding the regulation of energy metabolism is driven by the changes in protein and metabolite abundance. However, we consider the low concordance between mRNA and protein expression itself to be an important feature of AD pathology, as also reported in previous studies (Johnson et al., 2022; Tasaki et al., 2022). Although we did not perform a further analysis of this discordance, we believe that including the TF and mRNA layers in the metabolic trans-omic network strengthens the system-wide view of metabolic dysregulation in AD.

      Regarding the mRNA changes corresponding to the DEP enzymes, please refer to Figure S7A.

      Comment further on the results from Figure 2D. What can be learned from identifying metabolites with the greatest degree centrality? What pathways other than energy metabolism are highlighted by the trans-omic network?

      We assume that some energetic indicators, including AMP and acetyl-CoA, and nitrogen metabolism-related metabolites, Glu, 2-oxoglutarate, and urea, can be potential key regulators of dysregulated metabolism in AD.

      (Suggestion) We suggest the authors leverage their trans-omic network in additional ways beyond giving a snapshot of a few energy metabolism pathways. The analysis of top DEMs could go further. What pathways are impacted beyond energy metabolism? Among the metabolic reactions allosterically regulated by top DEMs, what metabolic pathways are enriched?

      We identified the enriched metabolic pathways that were allosterically regulated by DEMs in AD using Fisher's exact test. Alanine, aspartate, and glutamate metabolism pathways were significantly enriched in 2-oxoglutarate, glutarate, alanine, and glutamate-regulating metabolic reactions. Arginine and proline metabolism pathway was enriched in N-methyl-L-arginine and putrescine-regulating metabolic reactions. Arginine biosynthesis pathway was enriched in arginine-regulating metabolic reactions. Glycerophospholipid metabolism pathway was enriched in CDP-ethanolamine-regulating metabolic reactions. Glycine, serine, and threonine metabolism pathway was enriched in serine-regulating metabolic reactions. Purine metabolism pathway was enriched in AMP-regulating metabolic reactions. Pyrimidine metabolism pathway was enriched in deoxyuridine and thymidine-regulating metabolic reactions. Sphingolipid metabolism pathway was enriched in sphingosine-regulating metabolic reactions. However, this analysis did not yield sufficiently valuable insights into the regulatory relationships among biomolecules in AD. Thus, we did not include these results in the revised manuscript.
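The enrichment test mentioned in this response can be illustrated with a one-sided Fisher's exact test written as a hypergeometric tail. This is a generic sketch, not the authors' pipeline: the response names Fisher's exact test but not the sidedness or the background set, so both are assumptions here, and the counts in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(n_background, n_pathway, n_selected, n_overlap):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `n_overlap` pathway members among
    `n_selected` reactions drawn without replacement from `n_background`
    reactions, of which `n_pathway` belong to the pathway.
    """
    total = comb(n_background, n_selected)
    upper = min(n_pathway, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 reactions in the background, 5 in the pathway,
# 4 regulated by a given metabolite, 3 of which fall in the pathway.
p = enrichment_p_value(n_background=20, n_pathway=5, n_selected=4, n_overlap=3)
```

With these counts the tail is (C(5,3)·C(15,1) + C(5,4)·C(15,0)) / C(20,4) = 155/4845, roughly 0.032. Restricting `n_background` to the reactions actually present in the network is the same design choice as restricting enrichment backgrounds to measured proteins and metabolites.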

      (Suggestion) Figure 3 shows that most differential signal in AD points to lower energy production due to the combination of differentially expressed metabolites and enzymes, but we are not given much context about the strength of these among all the differential signals. We would suggest including volcano plots where the features of interest, i.e. DE enzymes and metabolites, are colored differently (or a similar figure).

      We thank the reviewer for this constructive suggestion. To provide better context regarding the importance of the differential signals, we have added volcano plots for mRNAs, proteins, and metabolites in Figure S4A, B, and C.

      (Suggestion) The PPI network could be better leveraged to understand metabolic changes in AD. If nodes are grouped into subnetworks (e.g. by Louvain / Leiden clustering) and tested for pathway enrichment, could you find functional subnetworks of coordinately up- and down- regulated metabolic enzymes? This could yield some pathways of interest beyond the energy metabolism pathways already highlighted.

      We appreciate the reviewer's suggestion to utilize the PPI network for subnetwork analysis. However, it is important to note that the proteomic dataset analyzed in this study is derived from the original work of Johnson et al. (2020). In that paper, the authors already performed a Weighted Gene Co-expression Network Analysis (WGCNA) across several datasets to identify co-expressed modules and functional pathways.

      Given this, we assumed that applying additional clustering methods to the same dataset would be unlikely to yield significant biological insights beyond the established findings.

      Minor comments

      12. "All genes" and "all metabolites" should not be the background for the proteomic and metabolic pathway enrichment analysis by Metascape and MetaboAnalyst. The background should be limited to the proteins and metabolites that were measured.

      We fully agree with the reviewer that using "all genes" or "all metabolites" as a background is not suitable for enrichment analyses. As suggested, we have revised the enrichment analyses using the measured proteins and metabolites as a background in both Metascape and MetaboAnalyst (Fig. S4D).

      Highlight the metabolic enzymes in Fig S2B. Calculate a separate correlation coefficient for the enzymes extracted in the energy metabolism analysis from Fig 3.

      We appreciate the reviewer's suggestion to refine the correlation analysis. As requested, we have revised Fig. S2D to explicitly highlight the subset of enzymes involved in the energy metabolism analysis shown in Fig. 3. We calculated a separate correlation coefficient for the subset (Pearson coefficient = 0.86, p-value = 1.5e-7).

      Use a multiple hypothesis adjusted p-value or q-value in Figure S3.

      We agree with the reviewer regarding the necessity of correcting for multiple comparisons. Accordingly, we have revised Fig. S4D using q-values.
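For readers unfamiliar with multiple-testing adjustment, one widely used procedure is the Benjamini-Hochberg step-up correction, sketched below. The response does not state which procedure was used to obtain the q-values, so treating it as Benjamini-Hochberg is an assumption made only for illustration; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw)
```

Each raw p-value is scaled by m/rank, and the running minimum keeps the adjusted values from decreasing as the raw p-values increase.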

      Describe the methods used to calculate the logFC values from the validation dataset.

      We have revised the Methods to include a detailed description of the procedure used to calculate the log2FC values for the validation datasets (pg. 21, lines 13-15).

      It is difficult to read Figure 3. We would recommend really emphasizing to the reader to refer to Fig S7B as a "key" to this figure. The description of the red/blue arrows and nodes in the methods section (pg. 24, lines 21-36, pg 25, lines 1-4) was also helpful, but very lengthy. We recommend putting an abridged version of this description into the Fig S7 figure legend.

      We appreciate the feedback regarding the readability of Fig. 3. As recommended, we have revised the manuscript to explicitly direct readers to Fig. S8B as an essential "key" for interpreting the network visualization (pg. 8, line 28). Furthermore, we have added an abridged description of the network elements to the legend of Fig. S8B.

      The S7 figure legend should refer to panels A and B, not E and F.

      We apologize for this oversight. We have corrected the legend of Fig. S8.

      (Suggestion) Are any of the differentially expressed metabolites allosteric regulators of the DE transcription factors? This could be interesting to discuss.

      We appreciate the reviewer's insightful suggestion about the potential allosteric regulation of the DETFs by DEMs. We conducted an extensive literature search to identify any reports related to this perspective. However, to the best of our knowledge, no such direct interactions have been reported to date.

      Significance:

      The study's strength lies in leveraging three omics modalities across large patient cohorts (n ~ 150-240) to identify coherent signals between transcriptomics, proteomics, and metabolomics in postmortem DLPFC tissue. It was encouraging to see that the main result, showing downregulation for TCA, oxidative phosphorylation, and ketone body metabolism, emerged from consistent signals across both proteomics and metabolomics. This result was consistent with previous findings in other models cited by the authors⁴,⁵ and other studies⁶,⁷ demonstrating deficiency in energy-producing pathways in AD. Another strength of the study is the application of thoughtful methodology to connect differentially expressed proteins and metabolites via an intermediate data layer of metabolic reactions. The authors leverage the KEGG and BRENDA databases and apply sound logic to estimate the effects of enzyme level and metabolite level on pathway activity, with metabolites serving as substrate, product, or allosteric regulator for reactions. This trans-omic network methodology was developed in previous studies cited by the authors⁸,⁹. However, as written, this study is limited in its contribution of new knowledge to the AD research field. The main conclusion (energy production is down in AD, due to regulatory disruption of energy metabolism) is not strongly supported (see comments 1, 3, and 4 for elaboration). The evidence could be improved by orthogonal approaches: further experimentation, further integration of external datasets, causal modeling, or flux modeling. Alternatively, even in the absence of new experimental and computational approaches, the story could be made more complete by further leveraging the trans-omic network to provide insights into (a) the regulation of energy metabolism; and (b) the impacts of key disrupted metabolites (see comments 7-9). The study is also limited in demonstrating the power of these methodologies to provide integrative insights. As mentioned above, the integration of enzyme levels and metabolite levels is clearly useful (Figure 3). In contrast, the utility of the mRNA and transcription factor layers was not evident. The study did not appear to improve or expand upon trans-omic network methodology described in the previous works. Finally, the various analyses (analyzing the trans-omic network for nodes with the highest degree centrality, the PPI analysis, and viewing the energy metabolism pathways in the network) provided disparate results that were only tenuously connected in the discussion section.


      Response to Reviewer #2's Comments

      Evidence, reproducibility and clarity: Summary

      This manuscript integrates public transcriptomic, proteomic, and metabolomic datasets from ROSMAP DLPFC samples to construct a multi-layer metabolic trans-omic network in Alzheimer's disease. By linking transcription factors, enzyme mRNAs, proteins, metabolic reactions, and metabolites, the authors report coordinated downregulation of the TCA cycle, oxidative phosphorylation, and ketone body metabolism, along with mixed regulatory signals in glycolysis/gluconeogenesis. They interpret these patterns as indicative of broad energetic dysfunction and alterations in amino-acid/nitrogen metabolism in AD. While the framework is conceptually appealing, much of the analysis remains descriptive, and several biological interpretations extend beyond what the data can robustly support. The reliance on bulk tissue without accounting for cell-type composition, limited covariate adjustment, and the absence of validation or sensitivity analyses reduce confidence in the mechanistic conclusions. Overall, the study provides a preliminary systems-level overview, but additional rigor is needed before the proposed trans-omic regulatory insights can be considered convincing.

      Major Comments

      1. Interpretation requires more cautious phrasing, and validation is essential. The manuscript frequently asserts that specific pathways are "inhibited" or that energetic deficits are "compensated," but these conclusions extend beyond what the descriptive, bulk-level data can support. Because no metabolic flux, causality, or direct functional measurements are included, the results should be framed as putative regulatory shifts, not confirmed impairments. Critically, key claims about pathway inhibition would require flux modeling, perturbation analyses, or experimental validation to be convincing. Without such validation, the mechanistic interpretations remain speculative.

      We thank the reviewer for this crucial comment. We fully agree that, given the descriptive and bulk-level nature of our analysis, mechanistic interpretations must be made with caution. In the absence of direct metabolic flux measurements or experimental validation, our findings should be interpreted as putative regulatory shifts rather than confirmed functional impairments. Accordingly, we have revised the manuscript to temper mechanistic claims. We have replaced definitive statements with more speculative phrasing (e.g., "Our analysis revealed a putative coordinated downregulation ..." instead of "Our analysis revealed a coordinated downregulation ..." in Abstract section; "we demonstrate the systems-level view of the potential dysregulated energy production ..." instead of "we demonstrate the systems-level view of the dysregulated energy production ..." in pg. 10, lines 25-26).

      Although the authors acknowledge this in the limitations, bulk-level differences may primarily reflect altered proportions of neurons, astrocytes, microglia, and oligodendrocytes rather than true within-cell-type regulation. Incorporating a cell-type deconvolution or performing a sensitivity analysis would substantially improve interpretability. This issue also impacts the trans-omic network: if the molecules included originate from different cell types, the inferred regulatory relationships may not reflect true intracellular processes.

      We appreciate the reviewer's point that bulk-level differences can reflect altered proportions of different brain cell types, which in turn affects the inferred trans-omic network. To assess changes in the cell type proportions of the samples used in our study, we additionally used public single-cell transcriptomic datasets obtained from DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, we confirmed that the proportion of inhibitory neurons in the AD group was smaller than in the CT group, and that the proportion of oligodendrocytes in the AD group was larger than in the CT group. In the proteomic data, we did not observe any statistically significant changes in cell type proportions between the two groups. In the metabolomic data, we found that the proportion of inhibitory neurons in the AD group was smaller than in the CT group (pg. 6, lines 8-11).
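The response does not name the statistical test used to compare cell type proportions between groups. As an illustration only, the comparison for a single cell type could be sketched with a two-sided permutation test on the difference in group means. The function name `permutation_p_value` and the proportion values below are hypothetical, not taken from the manuscript or from ROSMAP.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical inhibitory-neuron proportions per subject (illustrative only).
ct_props = [0.12, 0.13, 0.11, 0.14, 0.12, 0.13]
ad_props = [0.08, 0.09, 0.07, 0.08, 0.09, 0.07]
p_inhibitory = permutation_p_value(ct_props, ad_props)
print(f"permutation p-value: {p_inhibitory:.4f}")
```

In practice one p-value would be computed per cell type and per omic dataset, and the resulting set would then need multiple-testing correction.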

      Differential analysis covariates. For the differential expression analyses, only gender and PMI were included as covariates. Additional variables, such as age at death, RIN, neuropathological measures, and comorbidities, can strongly influence molecular profiles and should be considered to ensure that the observed differences reflect AD-related biology rather than confounding pathological or technical factors.

      We appreciate the reviewer's comment regarding the included covariates in differential analyses of our study. The reason we did not include other variables, including age at death and RIN, is that these data for each sample were not available. Thus, we referred to original research articles from which proteomic or metabolomic datasets used in our study were derived. Regarding the metabolomic dataset, in the original article (Batra et al., 2023), only two metabolites, 1-methyl-5-imidazoleacetate and N6-carboxymethyllysine, were significantly associated with age. In addition, no metabolites were significantly associated with sex, BMI, or education. Regarding the proteomic dataset, in the original article, age at death, PMI, and sex were included as covariates in the analyses, though these variables were not found to strongly influence the data (Extended Data Fig.2 in (Johnson et al., 2020)).

      Network stability and sample non-overlap. Proteomic, transcriptomic, and metabolomic data come from partially overlapping individuals. The authors should test whether the reconstructed network is robust to: different significance thresholds, restricting analyses to overlapping samples and alternative definitions of AD vs control.

      We appreciate the reviewer's comment on the stability of the trans-omic network. In our study, the number of individuals for whom all omic modalities were measured was relatively small (n=25 in CT and n=35 in AD). This limited overlap reduces statistical power and can affect the downstream network construction. We have acknowledged this limitation in the revised manuscript and clarified that the reconstructed networks should be interpreted with caution regarding reproducibility and generalizability (pg. 13, lines 13-23).
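The robustness checks requested by the reviewer can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: it finds the subjects shared by all omic layers and measures how much a set of significant features changes between two significance thresholds using the Jaccard index. All identifiers, q-values, and thresholds are hypothetical.

```python
def shared_subjects(*cohorts):
    """Return the subject IDs present in every omic layer."""
    return set.intersection(*(set(c) for c in cohorts))


def significant_features(q_values, threshold):
    """Return the features whose adjusted p-value is below the threshold."""
    return {name for name, q in q_values.items() if q < threshold}


def jaccard(a, b):
    """Jaccard index of two sets; 1.0 means identical membership."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical subject IDs per omic layer.
transcriptome_ids = ["S1", "S2", "S3", "S4"]
proteome_ids = ["S2", "S3", "S4", "S5"]
metabolome_ids = ["S3", "S4", "S6"]
overlap = shared_subjects(transcriptome_ids, proteome_ids, metabolome_ids)

# Hypothetical adjusted p-values for four enzymes.
enzyme_q = {"ENZ1": 0.001, "ENZ2": 0.03, "ENZ3": 0.08, "ENZ4": 0.2}
strict = significant_features(enzyme_q, 0.05)
lenient = significant_features(enzyme_q, 0.10)
stability = jaccard(strict, lenient)
print(sorted(overlap), round(stability, 3))
```

The same Jaccard comparison could be applied to network edges rather than nodes, or to networks rebuilt from the overlapping subjects only.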

      Minor Comments

      1. Some TF enrichment and regulatory inferences lack explicit mention of multiple-testing correction.

      We apologize for the lack of clarity in our original description. Multiple-testing correction has been applied to the TF inference, and we have revised the Methods section to explicitly describe the correction method used and the threshold applied (pg. 23, lines 23-24).
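The response does not state which correction method was used. As a sketch under that caveat, the Benjamini-Hochberg procedure, one common choice for enrichment analyses, can be written as follows. The transcription factor names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical enrichment p-values for four transcription factors.
tf_p_values = {"TF_A": 0.001, "TF_B": 0.02, "TF_C": 0.03, "TF_D": 0.5}
q_values = dict(zip(tf_p_values, benjamini_hochberg(list(tf_p_values.values()))))
significant_tfs = [tf for tf, q in q_values.items() if q < 0.05]
print(q_values, significant_tfs)
```

Reporting both the method and the threshold, as the reviewer asks, lets readers reproduce which transcription factors pass the cut-off.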

      The limitations section is strong but should explicitly discuss the influence of postmortem interval on metabolite levels.

      We appreciate the reviewer's comment about the effect of postmortem interval on changes in metabolite levels. Accordingly, we have added the description of this perspective in our revised manuscript (pg. 13, lines 1-5).

      Reviewer #2 (Significance (Required)):

      Significance

      The study extends a trans-omic integration framework, originally applied to metabolic disease, into the context of Alzheimer's pathology. Although the biological findings largely confirm known alterations in mitochondrial and energy metabolism, the network-based approach offers a structured way to view cross-layer regulatory changes. Its main advance is conceptual rather than biological, providing a unified framework rather than uncovering fundamentally new mechanisms. This work will primarily interest researchers in neurodegeneration and systems biology, as well as computational groups developing multi-omics integration methods.

      Response to Reviewer #3's Comments


      Evidence, reproducibility and clarity

      This study leverages existing transcriptomic, metabolomic, and proteomic datasets from prefrontal cortex (PFC) to assess metabolic dysregulation in Alzheimer's disease (AD). They found a downregulation of multiple metabolic pathways, including TCA cycle, oxidative phosphorylation, and ketone metabolism, that may explain bioenergetic alterations in AD. The study used matching ROSMAP omics datasets from the DLPFC that have allowed more robust data integration. However, the datasets are all generated using bulk tissue, which makes data interpretation difficult. For example, the AD changes they observed may be due to shifts in cell type proportion with disease (e.g. cell death, neuron inflammation). Did the authors account for any potential shifts in cell type proportion in their analysis?

      If the assumption is that the changes in AD are cell intrinsic, which cell types are likely to be impacted? Can the authors integrate any existing single-cell analysis to infer which cell types may be driving the signals they detect, and whether this accounts for some of the antagonistic regulatory effects that were detected?

      We thank the reviewer for their insightful comments. We agree that the use of bulk tissue datasets cannot account for cell-type heterogeneity. As noted in our Limitations section (pg. 12, lines 24-27), we recognize that previous studies have found that the Braak stage is correlated positively with microglia and astrocyte proportions and negatively with oligodendrocyte proportion (Hannon et al., 2024; Shireby et al., 2022). Regarding the integration of single-cell analysis, we have referenced recent snRNA-seq findings (Mathys et al., 2024) in our Limitations section (pg. 12, lines 28-32) to deconvolve our bulk signatures.

      Furthermore, in our revised manuscript, we additionally used public single-cell transcriptomic datasets obtained from DLPFC tissue of 465 subjects in the ROSMAP cohort (Green et al., 2024). For each omic dataset used in our analyses, we matched the same subjects and calculated the proportions of the following cell types: astrocytes, excitatory neurons, inhibitory neurons, microglia, oligodendrocytes, and OPCs. We then statistically compared the cell type proportions between control subjects and patients with AD (Fig. S3). In the transcriptomic data, we confirmed that the proportion of inhibitory neurons in the AD group was smaller than in the CT group, and that the proportion of oligodendrocytes in the AD group was larger than in the CT group. In the proteomic data, we did not observe any statistically significant changes in cell type proportions between the two groups. In the metabolomic data, we found that the proportion of inhibitory neurons in the AD group was smaller than in the CT group (pg. 6, lines 8-11).

      Significance

      The manuscript provides multimodal insight into metabolic dysregulation in AD in the PFC. Given that metabolic dysfunction is likely to play a major role in disease pathogenesis, this is a study of importance. However, the findings lack granularity at the cell type level, which limits the impact of the study.

      References

      1. Baloni, P., Arnold, M., Buitrago, L., Nho, K., Moreno, H., Huynh, K., Brauner, B., Louie, G., Kueider-Paisley, A., Suhre, K., Saykin, A. J., Ekroos, K., Meikle, P. J., Hood, L., Price, N. D., Alzheimer's Disease Metabolomics Consortium, Doraiswamy, P. M., Funk, C. C., Hernández, A. I., ... Kaddurah-Daouk, R. (2022). Multi-Omic analyses characterize the ceramide/sphingomyelin pathway as a therapeutic target in Alzheimer's disease. Communications Biology, 5(1), 1074.
      2. Baloni, P., Funk, C. C., Yan, J., Yurkovich, J. T., Kueider-Paisley, A., Nho, K., Heinken, A., Jia, W., Mahmoudiandehkordi, S., Louie, G., Saykin, A. J., Arnold, M., Kastenmüller, G., Griffiths, W. J., Thiele, I., Alzheimer's Disease Metabolomics Consortium, Kaddurah-Daouk, R., & Price, N. D. (2020). Metabolic Network Analysis Reveals Altered Bile Acid Synthesis and Metabolism in Alzheimer's Disease. Cell Reports. Medicine, 1(8), 100138.
      3. Batra, R., Arnold, M., Wörheide, M. A., Allen, M., Wang, X., Blach, C., Levey, A. I., Seyfried, N. T., Ertekin-Taner, N., Bennett, D. A., Kastenmüller, G., Kaddurah-Daouk, R. F., Krumsiek, J., & Alzheimer's Disease Metabolomics Consortium (ADMC). (2023). The landscape of metabolic brain alterations in Alzheimer's disease. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 19(3), 980-998.
      4. Batra, R., Krumsiek, J., Wang, X., Allen, M., Blach, C., Kastenmüller, G., Arnold, M., Ertekin-Taner, N., Kaddurah-Daouk, R., & Alzheimer's Disease Metabolomics Consortium (ADMC). (2024). Comparative brain metabolomics reveals shared and distinct metabolic alterations in Alzheimer's disease and progressive supranuclear palsy. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 20(12), 8294-8307.
      5. Cahill, K. M., Huo, Z., Tseng, G. C., Logan, R. W., & Seney, M. L. (2018). Improved identification of concordant and discordant gene expression signatures using an updated rank-rank hypergeometric overlap approach. Scientific Reports, 8(1), 9588.
      6. Fröhlich, A. S., Gerstner, N., Gagliardi, M., Ködel, M., Yusupov, N., Matosin, N., Czamara, D., Sauer, S., Roeh, S., Murek, V., Chatzinakos, C., Daskalakis, N. P., Knauer-Arloth, J., Ziller, M. J., & Binder, E. B. (2024). Single-nucleus transcriptomic profiling of human orbitofrontal cortex reveals convergent effects of aging and psychiatric disease. Nature Neuroscience, 27(10), 2021-2032.
      7. Green, G. S., Fujita, M., Yang, H.-S., Taga, M., Cain, A., McCabe, C., Comandante-Lou, N., White, C. C., Schmidtner, A. K., Zeng, L., Sigalov, A., Wang, Y., Regev, A., Klein, H.-U., Menon, V., Bennett, D. A., Habib, N., & De Jager, P. L. (2024). Cellular communities reveal trajectories of brain ageing and Alzheimer's disease. Nature, 633(8030), 634-645.
      8. Hannon, E., Dempster, E. L., Davies, J. P., Chioza, B., Blake, G. E. T., Burrage, J., Policicchio, S., Franklin, A., Walker, E. M., Bamford, R. A., Schalkwyk, L. C., & Mill, J. (2024). Quantifying the proportion of different cell types in the human cortex using DNA methylation profiles. BMC Biology, 22(1), 17.
      9. Johnson, E. C. B., Carter, E. K., Dammer, E. B., Duong, D. M., Gerasimov, E. S., Liu, Y., Liu, J., Betarbet, R., Ping, L., Yin, L., Serrano, G. E., Beach, T. G., Peng, J., De Jager, P. L., Haroutunian, V., Zhang, B., Gaiteri, C., Bennett, D. A., Gearing, M., ... Seyfried, N. T. (2022). Large-scale deep multi-layer analysis of Alzheimer's disease brain reveals strong proteomic disease-related changes not observed at the RNA level. Nature Neuroscience, 25(2), 213-225.
      10. Johnson, E. C. B., Dammer, E. B., Duong, D. M., Ping, L., Zhou, M., Yin, L., Higginbotham, L. A., Guajardo, A., White, B., Troncoso, J. C., Thambisetty, M., Montine, T. J., Lee, E. B., Trojanowski, J. Q., Beach, T. G., Reiman, E. M., Haroutunian, V., Wang, M., Schadt, E., ... Seyfried, N. T. (2020). Large-scale proteomic analysis of Alzheimer's disease brain and cerebrospinal fluid reveals early changes in energy metabolism associated with microglia and astrocyte activation. Nature Medicine, 26(5), 769-780.
      11. Maitra, M., Mitsuhashi, H., Rahimian, R., Chawla, A., Yang, J., Fiori, L. M., Davoli, M. A., Perlman, K., Aouabed, Z., Mash, D. C., Suderman, M., Mechawar, N., Turecki, G., & Nagy, C. (2023). Cell type specific transcriptomic differences in depression show similar patterns between males and females but implicate distinct cell types and genes. Nature Communications, 14(1), 2912.
      12. Mathys, H., Boix, C. A., Akay, L. A., Xia, Z., Davila-Velderrain, J., Ng, A. P., Jiang, X., Abdelhady, G., Galani, K., Mantero, J., Band, N., James, B. T., Babu, S., Galiana-Melendez, F., Louderback, K., Prokopenko, D., Tanzi, R. E., Bennett, D. A., Tsai, L.-H., & Kellis, M. (2024). Single-cell multiregion dissection of Alzheimer's disease. Nature, 632(8026), 858-868.
      13. Novotny, B. C., Fernandez, M. V., Wang, C., Budde, J. P., Bergmann, K., Eteleeb, A. M., Bradley, J., Webster, C., Ebl, C., Norton, J., Gentsch, J., Dube, U., Wang, F., Morris, J. C., Bateman, R. J., Perrin, R. J., McDade, E., Xiong, C., Chhatwal, J., ... Harari, O. (2023). Metabolomic and lipidomic signatures in autosomal dominant and late-onset Alzheimer's disease brains. Alzheimer's & Dementia: The Journal of the Alzheimer's Association, 19(5), 1785-1799.
      14. Plaisier, S. B., Taschereau, R., Wong, J. A., & Graeber, T. G. (2010). Rank-rank hypergeometric overlap: identification of statistically significant overlap between gene-expression signatures. Nucleic Acids Research, 38(17), e169.
      15. Shireby, G., Dempster, E. L., Policicchio, S., Smith, R. G., Pishva, E., Chioza, B., Davies, J. P., Burrage, J., Lunnon, K., Seiler Vellame, D., Love, S., Thomas, A., Brookes, K., Morgan, K., Francis, P., Hannon, E., & Mill, J. (2022). DNA methylation signatures of Alzheimer's disease neuropathology in the cortex are primarily driven by variation in non-neuronal cell-types. Nature Communications, 13(1), 5620.
      16. Tasaki, S., Xu, J., Avey, D. R., Johnson, L., Petyuk, V. A., Dawe, R. J., Bennett, D. A., Wang, Y., & Gaiteri, C. (2022). Inferring protein expression changes from mRNA in Alzheimer's dementia using deep neural networks. Nature Communications, 13(1), 655.
      17. Varma, V. R., Wang, Y., An, Y., Varma, S., Bilgel, M., Doshi, J., Legido-Quigley, C., Delgado, J. C., Oommen, A. M., Roberts, J. A., Wong, D. F., Davatzikos, C., Resnick, S. M., Troncoso, J. C., Pletnikova, O., O'Brien, R., Hak, E., Baak, B. N., Pfeiffer, R., ... Thambisetty, M. (2021). Bile acid synthesis, modulation, and dementia: A metabolomic, transcriptomic, and pharmacoepidemiologic study. PLoS Medicine, 18(5), e1003615.
      18. Wan, Y.-W., Al-Ouran, R., Mangleburg, C. G., Perumal, T. M., Lee, T. V., Allison, K., Swarup, V., Funk, C. C., Gaiteri, C., Allen, M., Wang, M., Neuner, S. M., Kaczorowski, C. C., Philip, V. M., Howell, G. R., Martini-Stoica, H., Zheng, H., Mei, H., Zhong, X., ... Logsdon, B. A. (2020). Meta-Analysis of the Alzheimer's Disease Human Brain Transcriptome and Functional Dissection in Mouse Models. Cell Reports, 32(2), 107908.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      This manuscript integrates public transcriptomic, proteomic, and metabolomic datasets from ROSMAP DLPFC samples to construct a multi-layer metabolic trans-omic network in Alzheimer's disease. By linking transcription factors, enzyme mRNAs, proteins, metabolic reactions, and metabolites, the authors report coordinated downregulation of the TCA cycle, oxidative phosphorylation, and ketone body metabolism, along with mixed regulatory signals in glycolysis/gluconeogenesis. They interpret these patterns as indicative of broad energetic dysfunction and alterations in amino-acid/nitrogen metabolism in AD. While the framework is conceptually appealing, much of the analysis remains descriptive, and several biological interpretations extend beyond what the data can robustly support. The reliance on bulk tissue without accounting for cell-type composition, limited covariate adjustment, and the absence of validation or sensitivity analyses reduce confidence in the mechanistic conclusions. Overall, the study provides a preliminary systems-level overview, but additional rigor is needed before the proposed trans-omic regulatory insights can be considered convincing.

      Major Comments

      1. Interpretation requires more cautious phrasing, and validation is essential. The manuscript frequently asserts that specific pathways are "inhibited" or that energetic deficits are "compensated," but these conclusions extend beyond what the descriptive, bulk-level data can support. Because no metabolic flux, causality, or direct functional measurements are included, the results should be framed as putative regulatory shifts, not confirmed impairments. Critically, key claims about pathway inhibition would require flux modeling, perturbation analyses, or experimental validation to be convincing. Without such validation, the mechanistic interpretations remain speculative.
      2. Although the authors acknowledge this in the limitations, bulk-level differences may primarily reflect altered proportions of neurons, astrocytes, microglia, and oligodendrocytes rather than true within-cell-type regulation. Incorporating a cell-type deconvolution or performing a sensitivity analysis would substantially improve interpretability. This issue also impacts the trans-omic network: if the molecules included originate from different cell types, the inferred regulatory relationships may not reflect true intracellular processes.
      3. Differential analysis covariates. For the differential expression analyses, only gender and PMI were included as covariates. Additional variables, such as age at death, RIN, neuropathological measures, and comorbidities, can strongly influence molecular profiles and should be considered to ensure that the observed differences reflect AD-related biology rather than confounding pathological or technical factors.
      4. Network stability and sample non-overlap. Proteomic, transcriptomic, and metabolomic data come from partially overlapping individuals. The authors should test whether the reconstructed network is robust to: different significance thresholds, restricting analyses to overlapping samples and alternative definitions of AD vs control.

      Minor Comments

      1. Some TF enrichment and regulatory inferences lack explicit mention of multiple-testing correction.
      2. The limitations section is strong but should explicitly discuss the influence of postmortem interval on metabolite levels.

      Significance

      The study extends a trans-omic integration framework, originally applied to metabolic disease, into the context of Alzheimer's pathology. Although the biological findings largely confirm known alterations in mitochondrial and energy metabolism, the network-based approach offers a structured way to view cross-layer regulatory changes. Its main advance is conceptual rather than biological, providing a unified framework rather than uncovering fundamentally new mechanisms. This work will primarily interest researchers in neurodegeneration and systems biology, as well as computational groups developing multi-omics integration methods.

      Reviewer expertise

      My background is in Alzheimer's disease, multi-omics integration, and computational systems biology. I am not a specialist in enzymology or allosteric regulation and therefore cannot fully evaluate the biochemical specificity of those annotations.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Major comments

      1. This study leaves out lipid metabolism as a major energy metabolism pathway relevant to AD. The authors themselves cite the significance of acylcarnitines and CPT1A in AD (pg. 3, lines 32-33, pg. 4, lines 1-2). Lipid metabolism and homeostasis are known to be disrupted in AD [1]. Fatty acid oxidation is a known energy source in the prefrontal cortex [2] and also generates acetyl-CoA, which this study reveals to be a significantly decreased metabolite in AD. Furthermore, sphingomyelin emerges as one of the major decreased DEMs. Thus, lipid metabolism should be highlighted in Figure 3 and discussed throughout the manuscript; otherwise its omission should be clearly stated and justified.
      2. The covariates used for differential analysis should be discussed and justified. Notably, age is used as a covariate for transcriptomic analysis but not proteomic and metabolomic analysis, with no justification. Additionally, given the known importance of lipid metabolism in AD and the putative role of APOE in lipid homeostasis [3], APOE genetic status should be considered as a covariate, or its omission should be justified.
      3. The authors make a conclusion statement that suggests intervention: "Collectively, our data suggests that preserving or improving the ability to produce ATP and early intervention in the process of nitrogen metabolism are candidates for the prevention and treatment of dementia" (pg. 12, lines 12-14). This claim is not well-supported by the evidence provided in the study. There are a few limitations: (a) This was an observational, not interventional study; (b) The study did not establish whether the metabolic disruptions are causes or effects in AD; and (c) ATP or other bioenergetic indicators were not directly measured. Therefore, any statements about potential interventions should be removed or qualified as highly speculative.
      4. In conjunction with the last point, the main conclusion of the study is that energy production is down in AD. The data presented in Figure 3 are consistent with this conclusion, but it is far from definitive due to limitations stated above in comments 3a and 3b. The authors should offer additional support for this conclusion: experimental follow-up, flux modeling, analysis of alternative datasets with ATP measurement, or causal inference.
      5. The validation analysis did not sufficiently show the generalizability of this study's results. The authors demonstrated a correlation of 0.53 to the MSBB transcriptomics data and 0.60 to the AMP-AD DiverseCohorts proteomics data. Beyond these correlation coefficients, no meaningful comparison between the datasets is offered. How concordant are the differentially expressed features (or pathways) between the datasets? How robust would the trans-omic network be if incorporating the alternate datasets? Is the main conclusion (energy metabolism is down in AD) supported by the validation datasets? We think this analysis should be expanded and described in the main text.

      Although the results for external metabolomics datasets are reported in Fig S2C, correlation coefficients with the external data are not reported. The authors state, "Note that each study used different definitions for AD and CT groups, had variations in measurement methods and brain regions analyzed." We appreciate these limitations. However, the external data should be re-analyzed using the same definitions of AD and CT, if possible. The limitations and results (which DEMs are shared between datasets) should be discussed in the main text.
      6. The glycolysis analysis and discussion needs more development. Glycolysis and gluconeogenesis share many of the same enzymes, but they are not the same pathway and should not be discussed as such. To make a claim about the overall influence of enzyme and metabolite levels on glycolysis, the authors should focus on the energetically committing steps of glycolysis (hexokinase, phosphofructokinase, pyruvate kinase) in Figure 3A, and include the full/current version of the figure in the supplement. Gluconeogenesis-specific enzymes (pyruvate carboxylase, PEPCK) are not mentioned at all - are they among the DEPs/DEGs?
      7. Given that there wasn't good concordance between the DEGs and DEPs, did including the mRNA and transcription factor layers in the network really add anything useful? It seems like the main conclusions of the manuscript were driven by the protein and metabolite layers only. How many of the DE metabolic enzymes were coregulated at the transcript and protein level? It would be useful to include the 5-layer trans-omic network in the supplement to display these results. Given your network, at what level does it appear that energy metabolism is regulated?
      8. Comment further on the results from Figure 2D. What can be learned from identifying metabolites with the greatest degree centrality? What pathways other than energy metabolism are highlighted by the trans-omic network?
      9. (Suggestion) We suggest the authors leverage their trans-omic network in additional ways beyond giving a snapshot of a few energy metabolism pathways. The analysis of top DEMs could go further. What pathways are impacted beyond energy metabolism? Among the metabolic reactions allosterically regulated by top DEMs, what metabolic pathways are enriched?
      10. (Suggestion) Figure 3 shows that most differential signal in AD points to lower energy production due to the combination of differentially expressed metabolites and enzymes, but we are not given much context about the strength of these among all the differential signals. We would suggest including volcano plots where the features of interest, i.e. DE enzymes and metabolites, are colored differently (or a similar figure).
      11. (Suggestion) The PPI network could be better leveraged to understand metabolic changes in AD. If nodes are grouped into subnetworks (e.g. by Louvain / Leiden clustering) and tested for pathway enrichment, could you find functional subnetworks of coordinately up- and down- regulated metabolic enzymes? This could yield some pathways of interest beyond the energy metabolism pathways already highlighted.
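To make the request in comment 5 concrete, concordance between a discovery and a validation dataset can be summarized by more than a single correlation coefficient. The sketch below is an illustration with hypothetical feature names and log fold changes: it reports the Pearson correlation over shared features together with the fraction of shared features that change in the same direction.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def sign_concordance(discovery, validation):
    """Fraction of shared features whose logFC has the same sign in both."""
    shared = sorted(set(discovery) & set(validation))
    same = sum(1 for f in shared if discovery[f] * validation[f] > 0)
    return same / len(shared), shared


# Hypothetical log fold changes (AD vs CT) in two cohorts.
discovery_logfc = {"ENZ1": -0.4, "ENZ2": -0.2, "ENZ3": 0.3, "ENZ4": 0.1}
validation_logfc = {"ENZ1": -0.3, "ENZ2": 0.1, "ENZ3": 0.2, "ENZ5": 0.5}

fraction_same, shared_features = sign_concordance(discovery_logfc, validation_logfc)
r = pearson(
    [discovery_logfc[f] for f in shared_features],
    [validation_logfc[f] for f in shared_features],
)
print(shared_features, round(fraction_same, 3), round(r, 3))
```

Restricting the same summary to the energy metabolism enzymes would address whether the main conclusion holds in the validation cohorts.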

      Minor comments

      1. "All genes" and "all metabolites" should not be the background for the proteomic and metabolic pathway enrichment analysis by Metascape and MetaboAnalyst. The background should be limited to the proteins and metabolites that were measured.
      2. Highlight the metabolic enzymes in Fig S2B. Calculate a separate correlation coefficient for the enzymes extracted in the energy metabolism analysis from Fig 3.
      3. Use a multiple hypothesis adjusted p-value or q-value in Figure S3.
      4. Describe the methods used to calculate the logFC values from the validation dataset.
      5. It is difficult to read Figure 3. We recommend emphasizing that the reader should refer to Fig S7B as a "key" to this figure. The description of the red/blue arrows and nodes in the methods section (pg. 24, lines 21-36, pg. 25, lines 1-4) was also helpful, but very lengthy. We recommend putting an abridged version of this description into the Fig S7 figure legend.
      6. The S7 figure legend should refer to panels A and B, not E and F.

      7. (Suggestion) Are any of the differentially expressed metabolites allosteric regulators of the DE transcription factors? This could be interesting to discuss.
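Minor comment 1 concerns the background set used for enrichment. A minimal sketch of why this matters, assuming a one-sided hypergeometric (over-representation) test: the same number of pathway hits looks far more significant against a genome-wide background than against the set of proteins actually measured. All counts below are hypothetical and are not taken from the manuscript.

```python
from math import comb


def hypergeom_enrichment_p(background_size, pathway_in_background, n_hits, hits_in_pathway):
    """One-sided p-value P(X >= hits_in_pathway) for over-representation."""
    upper = min(pathway_in_background, n_hits)
    total = comb(background_size, n_hits)
    tail = sum(
        comb(pathway_in_background, k)
        * comb(background_size - pathway_in_background, n_hits - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 differential proteins, 8 of them in a 30-member pathway.
p_measured = hypergeom_enrichment_p(3000, 30, 200, 8)   # background: measured proteins
p_genome = hypergeom_enrichment_p(20000, 30, 200, 8)    # background: all genes
print(f"measured background: {p_measured:.2e}, genome background: {p_genome:.2e}")
```

The genome-wide background inflates significance because it counts proteins that could never have been detected as non-hits.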

      Significance

      The study's strength lies in leveraging three omics modalities across large patient cohorts (n ~ 150-240) to identify coherent signals between transcriptomics, proteomics, and metabolomics in postmortem DLPFC tissue. It was encouraging to see that the main result, showing downregulation of the TCA cycle, oxidative phosphorylation, and ketone body metabolism, emerged from consistent signals across both proteomics and metabolomics. This result was consistent with previous findings in other models cited by the authors [4,5] and with other studies [6,7] demonstrating deficiency in energy-producing pathways in AD. Another strength of the study is the application of thoughtful methodology to connect differentially expressed proteins and metabolites via an intermediate data layer of metabolic reactions. The authors leverage the KEGG and BRENDA databases and apply sound logic to estimate the effects of enzyme level and metabolite level on pathway activity, with metabolites serving as substrate, product, or allosteric regulator for reactions. This trans-omic network methodology was developed in previous studies cited by the authors [8,9]. However, as written, this study is limited in its contribution of new knowledge to the AD research field. The main conclusion (energy production is down in AD, due to regulatory disruption of energy metabolism) is not strongly supported (see comments 1, 3, and 4 for elaboration). The evidence could be improved by orthogonal approaches: further experimentation, further integration of external datasets, causal modeling, or flux modeling. Alternatively, even in the absence of new experimental and computational approaches, the story could be made more complete by further leveraging the trans-omic network to provide insights into (a) the regulation of energy metabolism; and (b) the impacts of key disrupted metabolites (see comments 7-9). The study is also limited in demonstrating the power of these methodologies to provide integrative insights. As mentioned above, the integration of enzyme levels and metabolite levels is clearly useful (Figure 3). In contrast, the utility of the mRNA and transcription factor layers was not evident. The study did not appear to improve or expand upon the trans-omic network methodology described in the previous works. Finally, the various analyses (analyzing the trans-omic network for nodes with the highest degree centrality, the PPI analysis, and viewing the energy metabolism pathways in the network) provided disparate results that were only tenuously connected in the discussion section.

      References

      1. Yin F. Lipid metabolism and Alzheimer's disease: clinical evidence, mechanistic link and therapeutic promise. FEBS J. 2023;290(6):1420-1453. doi:10.1111/febs.16344
      2. Ju Y, Li S, Kong X, Zhao Q. Exploring fatty acid metabolism in Alzheimer's disease: the key role of CPT1A. Sci Rep. 2024;14(1):31483. doi:10.1038/s41598-024-82999-z
      3. Sienski G, Narayan P, Bonner JM, et al. APOE4 disrupts intracellular lipid homeostasis in human iPSC-derived glia. Sci Transl Med. 2021;13(583):eaaz4564. doi:10.1126/scitranslmed.aaz4564
      4. Zhang C, Rissman RA, Feng J. Characterization of ATP Alternations in an Alzheimer's Transgenic Mouse Model. J Alzheimers Dis JAD. 2015;44(2):375-378. doi:10.3233/JAD-141890
      5. Jia D, Wang F, Yu H. Systemic alterations of tricarboxylic acid cycle enzymes in Alzheimer's disease. Front Neurosci. 2023;17:1206688. doi:10.3389/fnins.2023.1206688
      6. Sang C, Philbert SA, Hartland D, et al. Coenzyme A-Dependent Tricarboxylic Acid Cycle Enzymes Are Decreased in Alzheimer's Disease Consistent With Cerebral Pantothenate Deficiency. Front Aging Neurosci. 2022;14. doi:10.3389/fnagi.2022.893159
      7. Patel V, Mill J, Okonkwo OC, Salamat S, Li L, Raife T. Global Energy Metabolism Deficit in Alzheimer Disease Brain. J Prev Alzheimers Dis. 2024;11(1):171-178. doi:10.14283/jpad.2023.91
      8. Egami R, Kokaji T, Hatano A, et al. Trans-omic analysis reveals obesity-associated dysregulation of inter-organ metabolic cycles between the liver and skeletal muscle. iScience. 2021;24(3):102217. doi:10.1016/j.isci.2021.102217
      9. Bai Y, Morita K, Kokaji T, et al. Trans-omic analysis reveals opposite metabolic dysregulation between feeding and fasting in liver associated with obesity. iScience. 2024;27(3):109121. doi:10.1016/j.isci.2024.109121
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Matsen et al. describe an approach for training an antibody language model that explicitly tries to remove effects of "neutral mutation" from the language model training task, e.g. learning the codon table, which they claim results in biased functional predictions. They do so by modeling empirical sequence-derived likelihoods through a combination of a "mutation" model and a "selection" model; the mutation model is a non-neural Thrifty model previously developed by the authors, and the selection model is a small Transformer that is trained via gradient descent. The sequence likelihoods themselves are obtained from analyzing parent-child relationships in natural SHM datasets. The authors validate their method on several standard benchmark datasets and demonstrate its favorable computational cost.

      They discuss how deep learning models explicitly designed to capture selection and not mutation, trained on parent-child pairs, could potentially apply to other domains such as viral evolution or protein evolution at large.

      Strengths:

      Overall, we think the idea behind this manuscript is really clever and shows promising empirical results. Two aspects of the study are conceptually interesting: the first is factorizing the training likelihood objective to learn properties that are not explained by simple neutral mutation rules, and the second is training not on self-supervised sequence statistics but on the differences between sequences along an antibody evolutionary trajectory. If this approach generalizes to other domains of life, it could offer a new paradigm for training sequence-to-fitness models that is less biased by phylogeny or other aspects of the underlying mutation process.

      Thank you for your kind words.

      Weaknesses:

      Some claims made in the paper are weakly or indirectly supported by the data. In particular, the claim that learning the codon table contributes to biased functional effect predictions may be true, but requires more justification.

      Thank you for this comment, which made us realize that we had not adequately explained the key insight of Figure S3. We have expanded the caption of Figure S3 to clarify:

      “DASM selection factors match the pattern seen in experimental measurements, while masked language models show artifacts from the codon table.

      The experimental data (left two panels) show a slight decrease in median scores for amino acids requiring multiple nucleotide mutations (“multiple”) versus single mutations (“single”).

      DASM captures this pattern, showing similar distributions for both categories.

      In contrast, AbLang and ESM assign radically lower scores to multinucleotide amino acid substitutions, consistent with the masked language modeling objective learning codon-level mutation probabilities as described in the main text (Figure 1a).”

      This figure directly supports our claim: the experimental fitness data show similar distributions for single-mutation vs multiple-mutation amino acids, yet AbLang2 and ESM assign dramatically different scores to these groups, while DASM does not.
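      To make the "single" versus "multiple" categories in the caption concrete, the following is a minimal sketch (an illustration, not the authors' code) that uses the standard genetic code to count the fewest nucleotide changes needed to turn a parent codon into any codon for a target amino acid. The function names `min_nt_changes` and `classify` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_nt_changes(parent_codon, target_aa):
    """Fewest nucleotide changes from parent_codon to any codon encoding target_aa."""
    return min(
        sum(a != b for a, b in zip(parent_codon, codon))
        for codon, aa in CODON_TABLE.items()
        if aa == target_aa
    )


def classify(parent_codon, target_aa):
    """Label a substitution as 'synonymous', 'single', or 'multiple'."""
    n = min_nt_changes(parent_codon, target_aa)
    if n == 0:
        return "synonymous"
    return "single" if n == 1 else "multiple"
```

      For example, starting from the codon ATG (methionine), leucine is reachable with one nucleotide change while tryptophan needs two, so a model that has absorbed the codon table would be expected to score the latter lower regardless of function.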

      Additionally, the paper could benefit from additional benchmarking and comparison to enhanced versions of existing methods, such as AbLang plus a multi-hit correction.

      It's an interesting idea to consider enhancing existing models. However, this approach faces some challenges. Most fundamentally, it is difficult to recast AbLang and other such models in an evolutionary framework: the masked language objective is simply not an evolutionary one. We have written a whole paper working to do this (https://doi.org/10.1371/journal.pcbi.1013758) and the results were middling despite our best efforts. Specifically regarding multihit, the effects of multihit are minor compared to the codon table effects, and the latter require the structure of a codon-based evolutionary model.

      Further descriptions of model components and validation metrics could help make the manuscript more readable.

      We have clarified several aspects of the model in the revision: we now describe the Thrifty neutral model in the introduction, clarify the transformer architecture and wiggle activation function in the Methods, and explain the joint branch-length optimization procedure.

      In the introduction we now describe Thrifty:

      “This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F 5-mer model.”

      In the Methods we clarify the architecture:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.

      This function asymptotes to zero for highly deleterious mutations and grows sub-linearly for beneficial ones.”
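      The two ingredients named in this Methods excerpt can be sketched in plain Python. The positional encoding follows the standard sinusoidal formula; the activation is only a hypothetical stand-in (called `wiggle_like` here) chosen to have the two stated properties, and is an assumption rather than the authors' actual "wiggle" function. The exponent `beta` is likewise an illustrative parameter.

```python
import math


def sinusoidal_encoding(position, d_model):
    """Standard sinusoidal positional encoding for one sequence position."""
    encoding = []
    for k in range(d_model):
        # Dimensions 2i and 2i+1 share the same frequency.
        angle = position / (10000 ** ((k - k % 2) / d_model))
        encoding.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return encoding


def wiggle_like(x, beta=0.3):
    """Hypothetical bounded activation (NOT the published function).

    Tends to zero as x -> -infinity and grows sub-linearly for x >= 0;
    the two branches meet continuously at x = 0 with value 1.
    """
    if x < 0:
        return math.exp(x)
    return (1.0 + x) ** beta
```

      Under these assumptions, a strongly negative network output maps to a selection factor near zero, an output of zero maps to a neutral factor of one, and large positive outputs are damped so that no single substitution receives an extreme factor.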

      And the joint optimization:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”

      Reviewer #2 (Public review):

      Summary:

      Endowing protein language models with the ability to predict the function of antibodies would open a world of translational possibilities. However, antibody language models have yet to achieve breakthrough success, which large language models have achieved for the understanding and generation of natural language. This paper elegantly demonstrates how training objectives imported from natural language applications lead antibody language models astray on function prediction tasks. Training models to predict masked amino acids teaches models to exploit biases of nucleotide-level mutational processes, rather than protein biophysics. Taking the underlying biology of antibody diversification and selection seriously allows for disentangling these processes through what the authors call deep amino acid selection models. These models extend previous work by the authors (Matsen MBE 2025) by providing predictions not only for the selection strength at individual sites, but also for individual amino acid substitutions. This represents a practically important advance.

      Strengths:

      The paper is based on a deep conceptual insight, the existence of a multitude of biological processes that affect antibody maturation trajectories. The figures and writing are very clear, which should help make the broader field aware of this important but sometimes overlooked insight. The paper adds to a growing literature proposing biology-informed tweaks for training protein language models, and should thus be of interest to a wide readership interested in the application of machine learning to protein sequence understanding and design.

      Thank you for your kind words.

      Weaknesses:

      Proponents of the state-of-the-art protein language models might counter the claims of the paper by appealing to the ability of fine-tuning to deconvolve selection and mutation-related signatures in their high-dimensional representation spaces. Leaving the exercise of assessing this claim entirely to future work somewhat diminishes the heft of the (otherwise good!) argument.

      This is an interesting idea! However, it seems to us that this approach has some fundamental limitations. Existing models operate on amino acid sequences with no nucleotide representation, so while they can be implicitly biased by the codon table, they have no signal to separate selection from effects related to the codon table and SHM rates.

      We interpret this comment as proposing that we could use fine-tuning on functional data to pull out the selection components (that would only affect the functional data) versus the mutation component. That sounds like an interesting research project. We would be concerned that there are correlations between mutability and selective effects (e.g., CDRs are both more mutable and under different selection), creating identifiability problems unless separate data sources are used as we do here.

      Additionally, the fine-tuning approaches we are aware of are task-specific: they require labeled data from a specific assay (binding to antigen X, expression in system Y) that may or may not relate to the general evolutionary selection signal. Also, such approaches are limited to the specific data used and may not do a good job of guiding the model to a signal that is not present in the training data.

      By structuring the model as we do, we obtain the evolutionary interpretation directly from phylogenetic signal without requiring task-specific supervision.

      In the context of predicting antibody binding affinity, the modeling strategy only allows prediction of mutations that improve affinity on average, but not those which improve binding to specific epitopes.

      We agree, and this is fundamental to any general-purpose model. Predicting binding patterns for a specific target requires information about that target to be specified in the training data. We look forward to developing such task-specific models in the future.

      We have added a paragraph to the Discussion clarifying this limitation:

      “The current generation of DASM model does not use any antigen-labeled training data.

      The signal that it leverages to infer some limited ability to predict binding comes from natural affinity maturation.

      This affinity maturation comes through natural repertoires and so represents a mix of all of the antigens to which the sampled individuals have been exposed.”

      Reviewer #3 (Public review):

      Summary:

      This work proposes DASM, a new transformer-based approach to learning the distribution of antibody sequences which outperforms current foundational models at the task of predicting mutation propensities under selected phenotypes, such as protein expression levels and target binding affinity. The key ingredient is the disentanglement, by construction, of selection-induced mutational effects and biases intrinsic to the somatic hypermutation process (which are embedded in a pre-trained model).

      Strengths:

      The approach is benchmarked on a variety of available datasets and for two different phenotypes (expression and binding affinity). The biologically informed logic for model construction implemented is compelling, and the advantage, in terms of mutational effects prediction, is clearly demonstrated via comparisons to state-of-the-art models.

      Thank you.

      Weaknesses:

      The gain in interpretability is only mentioned but not really elaborated upon or leveraged for gaining insight.

      We are also excited about the ability of these models to provide interpretable predictions. We have dedicated an entire paper to this direction: “A Sitewise Model of Natural Selection on Individual Antibodies via a Transformer-Encoder” in MBE (https://doi.org/10.1093/molbev/msaf186). The interpretations offered by that paper overturn some of the oversimplified dogma about how natural selection works in antibodies (purifying in FWK and diversifying in CDR), giving a more nuanced sitewise perspective. The paper also highlights the importance of specific structural features of the antibodies.

      This eLife paper, on the other hand, is focused on comparison to antibody language models and benchmarking zero-shot prediction on functional tasks.

      We have better highlighted this new paper in our revision with:

      “We have dedicated a companion paper to leveraging this interpretability to provide new perspectives on the operating rules of affinity maturation (Matsen et al., MBE 2025): that work provides a nuanced sitewise perspective on natural selection in antibodies that challenges classical oversimplified views of selection patterns.”

      The following aspects could have been better documented: the hyperparametric search to establish the optimal model; the predictive performance of baseline approaches, to fully showcase the gain yielded by DASM.

      We appreciate the concern and the desire to reveal all the factors that lead to a strong performance result. For this particular paper, we feel that this is less of a concern because we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. We now state that, other than model size, the hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      Regarding baseline approaches, our previous paper includes comparisons to simpler models for the evolutionary objective. Here we focus on comparison to antibody language models for functional prediction. Comparing between state-of-the-art models is the standard practice for papers in this field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      We recommend modest amounts of revision, discussed below:

      Major comments:

      (1) In the first section of the results, there is extensive discussion on shortcomings of existing antibody language models like AbLang2 that seems to associate all of the performance gap with the inability to separate non-synonymous mutations separated by 1 or 2+ substitutions.

      In reality, some of the lower likelihoods in the 2+ substitution case could actually reflect real fitness deficits (while others could indeed be rarer occurrences in the training data). The authors should either moderate these claims or do an analysis that leverages antibody deep mutational scanning data to show that, conditioned on the fitness of the antibody (probably expression) being the same (either all high or all low), AbLang2 still artefactually considers rarer-training/less-codon-accessible variants to be less fit.

      As described above, we believe that this is addressed by Figure S3, but if not please correct us.

      (2) Some in the machine learning for antibody community might view the set of benchmarked datasets to be incomplete and somewhat arbitrarily selected, though we do think this is a good start, and the results are promising. A dataset commonly used in this field that is missing from this paper is from Shehata et al. (https://pubmed.ncbi.nlm.nih.gov/31553901/). A binding affinity experiment that is also commonly used in the field is from Phillips et al. (https://elifesciences.org/articles/71393) - this dataset measures combinatorial changes of framework regions on binding, which may be especially relevant here.

      We're glad to have the opportunity to clarify this, thanks.

      We based our evaluations on the April 2024 version of the FLAb benchmarking project (https://doi.org/10.1101/2024.01.13.575504) which preceded our work and thus was not subject to selection bias by us. We took the largest data sets in that repository. After this we became aware of the rich data sets offered by the Whitehead lab that provided binding measurements for many variants for a number of antigens, and added that to the evaluation set.

      We have clarified this in the manuscript:

      “We based our evaluations on the April 2024 version of the FLAb benchmarking project, which preceded our work and thus was not subject to selection bias by us.

      We also benchmarked high-throughput binding data (more recent than FLAb) from the Whitehead lab that provided affinity measurements across many variants and antigens.”

      The Shehata dataset is interesting but doesn't fit so much in the DASM mold: it is a survey of biophysical properties across many independent antibodies rather than a deep investigation of point mutants of a smaller collection of focal antibodies.

      FLAb has grown to include the Phillips dataset. We are working full-tilt on the next version of DASM and will be including many other datasets in our paper on DASM2. Thanks for the tip!

      (3) Similar to the above comment, we were also extremely curious as to why the authors did not test data from DeWitt et al. (https://pubmed.ncbi.nlm.nih.gov/40661619/). Instead, the authors only make a cryptic reference to this study on lines 201-6, but we could not even find a figure describing the results discussed on these lines. It would be great to actually include this data.

      We agree; however, our model is for human rather than mouse. We would like to train a mouse model in the future but have not yet lined up the appropriate data.

      (4) The authors should comment on potential data leakage if the SHM trajectories used in training have a similar sequence or antigen similarity to the benchmark expression/binding datasets.

      This is a good question that we should clarify. Our model is trained only on evolutionary trajectories and not functional data. Evaluation is then done on functional data without fine-tuning. These evaluation data are categorically different from the training data, and thus data leakage is not a problem. Recall that our model is zero-shot: it only considers evolutionary trajectories and not functional data as such. In a similar way, other self-supervised models such as MLMs do not exclude seeing an antibody in the training data when they are doing functional prediction.

      We have clarified this in the manuscript with

      “Because the DASM is trained exclusively on evolutionary trajectories rather than functional measurements, evaluation on expression and binding benchmarks is strictly zero-shot with no risk of data leakage.”

      Relatedly, what happens if this approach is applied to completely de novo antibodies?

      We direct this reviewer to the Shanehsazzadeh dataset that involves antibodies that were suggested by an AI algorithm rather than observed in nature.

      If the reviewer is referring to completely synthetic antibody molecules, such as those generated by inverse folding, we have not attempted this.

      (5) It makes sense that you included the multihit correction as a response to your earlier instantiation (without this correction) underestimating the probabilities of multiple mutations in a codon associated with a single amino acid substitution (lines 476-477).

      However, this could potentially make for a somewhat unfair comparison to existing methods: if, say, we took AbLang (or another comparator) and also applied a multi-hit correction (even in some naive way at inference time), how would that compare to DASM? If this comparison favors DASM, it would show that models need more than just such a correction on top of existing methods to do good sequence scoring--which would only amplify the impact of the results.

      Thank you for this suggestion. We believe that we have addressed it in the response to the public reviews, but please let us know if not.

      Minor comments:

      (1) It would be worth explicitly defining/summarizing the mutation model used in the study, e.g. giving an overview of Thrifty in the introduction or where it first appears.

      Thanks, we have done this:

      “Our approach separates mutation and selection processes by encoding functional effects in a Deep Amino acid Selection Model (DASM) while explicitly modeling mutation using a separate fixed model trained on neutrally evolving data.

      This fixed model uses convolutions on 3-mer embeddings to deliver wide context sensitivity without needing a large number of parameters: the variant we use has around the same number of parameters as the classic S5F (Yaari et al., 2013) 5-mer model.”
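      The separation described here can be illustrated for a single site. The sketch below assumes, as a simplification that may differ from the manuscript's exact likelihood, that the probability of each amino-acid substitution is the neutral probability multiplied by a selection factor, with the remaining probability mass assigned to keeping the parent amino acid. The function name and the example numbers are hypothetical.

```python
def child_aa_distribution(parent_aa, neutral_probs, selection_factors):
    """Combine neutral substitution probabilities with selection factors at one site.

    neutral_probs: amino acid -> probability of mutating to it under the
        fixed neutral model (assumed to come from a separate mutation model).
    selection_factors: amino acid -> non-negative multiplier (assumed to be
        the output of the selection model); missing entries default to 1.0.
    """
    probs = {}
    for aa, p_neutral in neutral_probs.items():
        if aa == parent_aa:
            continue
        probs[aa] = p_neutral * selection_factors.get(aa, 1.0)
    total = sum(probs.values())
    if total > 1.0:
        raise ValueError("substitution probabilities exceed 1; factors too large")
    # Whatever mass is left corresponds to no amino-acid change at this site.
    probs[parent_aa] = 1.0 - total
    return probs
```

      With a parent threonine, neutral probabilities of 0.02 for alanine and 0.001 for serine, and selection factors of 0.5 and 2.0, this toy calculation gives 0.01 and 0.002 respectively: the mutation model sets the baseline and the selection model rescales it.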

      (2) Paragraph starting on line 58: it sounds like you're suggesting that masked deep learning models will learn certain features of genomes in a certain order. We suggest that you weaken the language, giving examples of various things the model could learn, not implying that such models will necessarily learn the most useful features after the less useful ones.

      We have fixed this by removing the "First... Second... Third... Finally" ordering:

      “It could memorize the germline genes and learn about the probabilities of V(D)J recombination.

      It could learn the codon table, as according to this table some amino-acid mutations are much more likely than others. It could learn rates of somatic hypermutation...

      It could also learn about the impact of amino acid mutations on antibody function through natural selection in the course of affinity maturation, which is the desired signal.

      However, this desired signal is confounded by the preceding factors.”

      (3) Line 72: You make a strong claim that existing models conflate mutation and selection without knowing for sure that they didn't successfully learn these components separately (it seems this would require a lot of mechanistic interpretability). The language could be softened here.

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (4) Line 79: Say a bit more about the separate fixed mutation model here. Why shouldn't we worry about this choice (especially the word "fixed") biasing your results? Does the empirical performance of your method suggest this doesn't really matter?

      We have added to the description of the fixed mutation model, as described above.

      As described in the public response, training SHM models on out-of-frame sequences is an established methodology for characterizing mutation in the absence of selection. In principle one could jointly train a model of SHM and selection, but one could have identifiability problems as there is a correlation between more mutable sites (e.g. in the CDRs) and those under relaxed selection. Using out-of-frame sequences gives a clean and independent description of the SHM process.

      (5) Line 81: on what benchmarks does it outperform? State briefly.

      Great suggestion. Done:

      “The DASM, trained on substantially less data, outperforms AbLang2 and general protein language models including ESM2 and ProGen2-small. This outperformance holds on the largest benchmark datasets of the FLAb collection and on recent high-throughput binding assays.”

      (6) Paragraph starting on line 90: The topic sentence reads a bit vague to us. Do you mean that you want to learn the extent to which models are regurgitating nucleotide similarity of AAs in determining the scores associated with AAs at masked sites?

      Thank you. We have updated to

      "We first sought to understand the extent to which processes such as neutral mutation rate and the codon table influence antibody language model prediction at masked sites."

      (7) Paragraph starting on line 108: feels speculative and maybe better for the discussion...

      We appreciate this comment, but we have decided to keep the content where it is. Although this would make sense as a Discussion item we feel like it fits well here right next to the evidence, and the structure of our Discussion doesn't really have a place for it.

      (8) Paragraph starting on line 116: don't say "sequences from [12]" or "method of [15]." Explain what these are before giving the citation.

      Whoops! Thanks. We have fixed these.

      (9) Line 134: Consider giving a brief definition of perplexity?

      Thanks. We added our favorite definition:

      “Perplexity (as defined in the Methods) is the standard way of evaluating the plausibility of a sequence according to a model: it is the across-site geometric mean of the inverse probability of the observed amino acid.”
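      The definition quoted above translates directly into a few lines of code. This is a minimal sketch with a hypothetical function name; it takes the per-site probabilities that a model assigns to the observed amino acids and computes the geometric mean of their inverses via the mean log-probability.

```python
import math


def perplexity(site_probs):
    """Geometric mean over sites of 1 / P(observed amino acid)."""
    if not site_probs:
        raise ValueError("need at least one site probability")
    mean_log_prob = sum(math.log(p) for p in site_probs) / len(site_probs)
    return math.exp(-mean_log_prob)
```

      A model that assigns probability 0.5 to the observed amino acid at every site has perplexity 2, and a uniform guess over 20 amino acids has perplexity 20, so lower values indicate that the model finds the sequence more plausible.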

      (10) Line 154: A citation here could be useful to support the claim that these models are learning phylogeny.

      We have replaced with the more clearly established "codon table":

      “We implemented a model to learn amino-acid preferences of antibodies without being influenced by germline genes, the codon table, or SHM biases.”

      (11) Lines 161-162: Given that phylogenetic inference methods can be tough to scale, we're curious how you managed to get 2 million PCPs from the data? Did you construct a bunch of different phylogenies (in parallel)?

      Indeed! We now clarify in the methods section that these trees were run in parallel across clonal families:

      “As in our previous work, tree inference and ancestral sequence reconstruction were performed per clonal family with the K80 substitution model...

      Because these clonal families are independent these phylogenetic inferences were run in parallel.”

      (12) Line 173-174: Can you say more about the joint optimization of the branch lengths? Are you conditioning on a phylogenetic tree topology only, and leaving the branch lengths unknown? Do you account for the fact that these branch lengths in the same phylogenetic tree aren't independent?

      Thanks for pointing out the need to clarify these points. We have done so in the methods section and provided a pointer to the methods section in the main text.

      In the main text we now say:

      “We trained DASMs of several sizes (~1M, ~4M, ~7M) using joint optimization of branch length t and parameters of the DASM (see Methods for details).”

      And in the Methods:

      “This joint optimization is performed cyclically, in which a complete cycle consists of neural network optimization followed by branch length optimization for every parent-child pair.

      The parent sequence and the child sequence are pre-estimated, fixed, and used as training data.

      The branch lengths are independent and so are optimized in parallel.”
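      The branch-length half of this cycle can be sketched with a toy model. The example below is an illustration under stated assumptions, not the authors' implementation: each site is assumed to mutate with probability 1 - exp(-t * rate), where the positive per-site rates are held fixed (standing in for the combined output of the mutation and selection models), and each parent-child pair gets its own branch length t found by ternary search on the concave log-likelihood. Because the pairs are independent, they are mapped over a thread pool. All function names are hypothetical, and the neural-network step of the cycle is omitted.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def log_likelihood(t, rates, mutated):
    """Toy log-likelihood of one parent-child pair at branch length t."""
    ll = 0.0
    for rate, is_mutated in zip(rates, mutated):
        if is_mutated:
            ll += math.log1p(-math.exp(-t * rate))  # log(1 - e^{-t * rate})
        else:
            ll += -t * rate  # log(e^{-t * rate})
    return ll


def optimize_branch_length(pair, lo=1e-6, hi=5.0, iters=100):
    """Ternary search for the branch length maximizing the toy likelihood."""
    rates, mutated = pair
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, rates, mutated) < log_likelihood(m2, rates, mutated):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


def optimize_all(pairs):
    """Optimize every pair independently, in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(optimize_branch_length, pairs))
```

      In this toy setting with equal rates of 1.0 and k of n sites mutated, the optimum is t = -log(1 - k/n), which gives a quick sanity check. In a full cycle, one would alternate this step with gradient updates of the network while the branch lengths are held fixed.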

      (13) Line 358: Yes, in a trivial sense, separating mutation and selection means that we know exactly how each of those two components has been learned. We would be curious if you could say anything about mechanistic interpretability within the deep learning selection model. If not, could this be a future research direction?

      We believe that we have addressed this in the response to public reviews, but please let us know if not.

      (14) Lines 384-386--indeed. Do you have any proposals for how a phylogeny could be constructed at this scale?

      As above this is not one big phylogeny but many, which invites parallelization.

      Reviewer #2 (Recommendations for the authors):

      (1) I agree that a full study of fine-tuning strategies for all possible alternative models is beyond the scope of the paper. However, a little bit of fine-tuning would go a long way to demonstrate how easy (or hard) it is to extract the relevant signal from a general protein language model embedding.

      As described in our response to the public reviews, we appreciate this point but have decided to focus on the core novelty of the paper and leave fine-tuning experiments to future work.

      (2) The authors might want to add some discussion about what signals their models capture with regard to binding affinity (averages), and how this limitation might be addressed in future work.

      As described in our response to the public reviews, we have added a paragraph to the Discussion clarifying this limitation.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I think more references have to be provided re: Antibody "foundation" language models, e.g. adding AntiBERTy and the two versions of AntiBERTa.

      We have added citations to those two models, although we weren't sure what the second version of AntiBERTa was. There are very many antibody language models. If we could use number ranges we would cite a dozen or more, but we hesitate to add many of them in the eLife format, which has parenthetical citations. If there are others that you consider essential don't hesitate to suggest them.

      (2) A key point of the approach is the disentanglement of “mutation” and “selection”, as mentioned in the introduction. However, the explanation of what the authors mean by mutation and selection comes only later. I would anticipate it in the introduction for clarity.

      This is a great point. The revised intro has this in the second sentence:

      “Natural antibodies are generated through V(D)J recombination, and refined by somatic hypermutation and affinity-based selection in germinal centers.”

      and the "While the masked..." paragraph now more clearly calls out selection.

      (3) Line 133: expression of what? Could the authors also explain mechanistically why expression should be impacted by a mutation? In what conditions do these data sample expression?

      We have clarified that it is expression in a phage display library:

      “To do so, we used the largest dataset of the FLAb collection of benchmarks, which measures the effect of single mutations on expression in a phage display library.”

      (4) Line 142: Clarify that 0.49 and 0.3 are correlation coefficients. Also, what type of correlation coefficient is this?

      Thanks for the catch! They are Pearson correlations as we now describe.

      (5) Line 173: The hyperparametric search should have been more documented (with a description of how it was carried out and plots).

      As described in our response to the public reviews, we are optimizing according to an evolutionary objective function and then evaluating according to a functional one. Other than model size, hyperparameters stayed the same as in our previous paper (Matsen et al., MBE 2025).

      (6) Line 358: The authors say that 'DASMs provide direct interpretability'. However, this is not really inspected. A valuable addition would be to show how such interpretability is made possible, how it can recapitulate existing biological knowledge or provide hints for antibody engineering.

      As described above, this is addressed in detail in our previous paper.

      (7) Line 398: 'Inferred insertions or deletions were reversed, so that all sequences align to the naive sequence without gaps.' Could the authors comment on whether this is a limitation of the approach, why it wasn't dealt with and whether it could be the direction of future work?

      Funny you should mention this! We have been planning out such an extension in detail recently. We have added a sentence in the discussion:

      “We also have plans to extend the DASM framework to estimate the effect of natural selection on insertion and deletion events.”

      (8) Line 430-431: Could the authors clarify 'shared' over what? Also, I believe these two lines really describe the DASM architecture. This should be spelt out more clearly and tied to the description provided in lines 173-175. A diagram of the architecture would be a valuable addition to provide a full picture of the model (this could be added to the general diagram of the modelling approach of Figure S8).

      We have clarified in the text that this is indeed a description of the DASM architecture -- thanks for the catch:

      “We parameterize the DASM f using the standard transformer-encoder architecture: an amino-acid embedding, sinusoidal positional encodings, and PyTorch's TransformerEncoder module.

      The only non-standard component to this architecture is a custom “wiggle” activation function to the output layer that prevents extreme selection factors as previously described.”

      The architecture is very “stock” - just the default torch TransformerEncoder, so I don't think that it merits a diagram. We have expanded our discussion of the simple architecture in the revision. This sits in contrast to the setup for the loss function, which is quite custom and is the subject of Figure 2 and Figure S8.
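
      As an illustration of one of the stock components named above, the following is a minimal sketch of sinusoidal positional encodings, assuming the standard fixed formulation from the original transformer paper. The function name, the number of positions, and the embedding dimension are hypothetical choices for this example; the response does not state which dimensions the DASM uses, and this is not the authors' code.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of fixed positional encodings.

    Assumes the standard formulation: even columns hold sin(pos / 10000^(k/d_model)),
    odd columns hold the matching cosine, where k is the even column index.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for paired sin/cos columns")
    table = []
    for pos in range(num_positions):
        row = []
        for k in range(0, d_model, 2):
            angle = pos / (10000 ** (k / d_model))
            row.append(math.sin(angle))  # even column
            row.append(math.cos(angle))  # odd column
        table.append(row)
    return table


# Hypothetical toy sizes, chosen only to keep the example small.
encoding = sinusoidal_positional_encoding(num_positions=4, d_model=6)
```

      In a transformer encoder these rows are added to the amino-acid embeddings, so that otherwise identical residues at different positions receive different inputs.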

      (9) Another general remark is that, to fully showcase the predictive advantage offered by DASM with all the modelling choices entailed, one could show the performance of simpler models, like the mutation model alone (with no selection factors), or models where selection factors are just learnt independently for each site, or are learnt with a simple linear layer instead of a transformer (these are just ideas of some simpler approach that can set baselines over which DASM improvement can be shown).

      This is a great suggestion. The primary focus of this paper is in comparing to alternate antibody language models in terms of functional prediction.

      These simpler models could be used for comparing the evolutionary objective, which we did in our previous paper (https://doi.org/10.1093/molbev/msaf186). We note that a sitewise model with fixed sites cannot really be appropriately formulated due to sequences being of different lengths.

      Additional changes

      In addition to the reviewer-requested changes, we added a comparison of ESM2 model sizes (650M vs 3B parameters) on the Koenig benchmark. We found that scaling ESM2 from 650M to 3B parameters did not improve performance. Indeed, the larger model showed slightly degraded correlations, particularly for light chain predictions. This is consistent with recent observations that medium-sized protein language models can outperform larger ones on transfer learning tasks (Vieira et al., Sci. Rep. 2025). We added Table S2 documenting these results and cite this finding in the main text to justify our use of the 650M model throughout the analyses. After doing this, we realized for the Shanehsazzadeh evaluation we had accidentally used ESM2-3B instead of ESM2-650M. The corrected ESM2-650M values are slightly lower (0.191 and 0.308 for sequence lengths 119 and 120, respectively, compared to the previous values of 0.248 and 0.337). This correction does not affect our conclusions, as DASM substantially outperforms ESM2 on this benchmark before and after the change.

      We also realized in the course of revision that we had been scoring AbLang2 using the masked-marginals pseudo-perplexity approach for the single-mutant Koenig dataset (Figure 1c), rather than the standard per-sequence pseudo-perplexity used elsewhere in the paper. For masked-marginals, probabilities are computed using only wild-type context, whereas standard pseudo-perplexity uses each variant's own context.

      The masked-marginals approach has a simple interpretation: for single-mutation variants, it is a linear transformation of the log ratio of the variant amino acid probability to the wild-type amino acid probability, both evaluated under wild-type context. This log-odds ratio directly measures how much the model prefers the mutation over the original residue.

      We found that masked-marginals performed better for AbLang2 on this dataset, so we continued using it for Figure 1c. However, for the benchmarking table (Table 1), we switched to per-sequence pseudo-perplexity as for the other comparisons in the paper, following the standard benchmarking protocol defined in FLAb (Chungyoun et al., 2024). We document both approaches in the Methods section:

      “An alternative “masked-marginals” approach scores variants using only wild-type context.

      For a wild-type sequence w, masked-marginals computes . for all amino acids a at each position i once, then uses these wild-type-derived probabilities to compute pseudo-perplexity for any variant x...

      For a single-mutation variant x that differs from wild-type w only at position j, all terms except position j cancel when comparing to wild-type, giving . Thus, the log-probability difference between variant and wild-type amino acids equals, up to an additive constant that depends only on the wild-type sequence, negative n times the log pseudo-perplexity of the variant.

      For Figure 1c on the single-mutant Koenig dataset, we found that this approach gave a higher correlation for AbLang2 and so used it in that figure.

      For benchmarking comparisons (Table 1), we followed standard practice and used per-sequence pseudo-perplexity.”
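
      The single-mutation relationship described in the quoted Methods text can be checked numerically. The sketch below assumes the usual definition of log pseudo-perplexity as the negative mean log-probability over the n positions of a sequence, with every probability taken from a table computed once under wild-type context. The function name, the three-letter alphabet, and the probability values are hypothetical toy choices and are not taken from AbLang2 or from the paper.

```python
import math


def log_pseudo_perplexity(seq, site_probs):
    """Masked-marginals log pseudo-perplexity of seq.

    site_probs[i][aa] is the probability of amino acid aa at position i,
    computed once with wild-type context (toy values in this example).
    """
    n = len(seq)
    return -sum(math.log(site_probs[i][aa]) for i, aa in enumerate(seq)) / n


# Hypothetical wild-type sequence and wild-type-context probability table.
wild_type = "ACD"
site_probs = [
    {"A": 0.70, "C": 0.20, "D": 0.10},
    {"A": 0.10, "C": 0.60, "D": 0.30},
    {"A": 0.25, "C": 0.25, "D": 0.50},
]

# Single-mutation variant: C -> D at position j = 1.
variant = "ADD"
j = 1
n = len(wild_type)

# Log-odds of the variant residue versus the wild-type residue at position j.
log_odds = math.log(site_probs[j][variant[j]]) - math.log(site_probs[j][wild_type[j]])

# Difference in log pseudo-perplexity; all positions other than j cancel.
delta = log_pseudo_perplexity(variant, site_probs) - log_pseudo_perplexity(wild_type, site_probs)

# The log-odds equals -n times the change in log pseudo-perplexity.
assert math.isclose(log_odds, -n * delta)
```

      Because the wild-type term is the same for every variant of a given wild-type sequence, ranking single mutants by masked-marginals pseudo-perplexity is equivalent to ranking them by this log-odds ratio.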

    1. Reviewer #3 (Public review):

      Summary:

      The researchers performed a genetic screen to identify a protein, ZNF-236, which belongs to the zinc finger family, and is required for repression of heat shock inducible genes. The researchers applied a new method to map the binding sites of ZNF-236, and based on the data, suggested that the protein does not repress genes by directly binding to their regulatory regions targeted by HSF1. Insertion of a reporter in multiple genomic regions indicates that repression is not needed in repetitive genomic contexts. Together, this work identifies ZNF-236, a protein that is important to repress heat-shock-responsive genes in the absence of heat shock.

      Strengths:

      A hit from a productive genetic screen was validated, and followed up by a series of well-designed experiments to characterize how the repression occurs. The evidence that the identified protein is required for the repression of heat shock response genes is strong.

      Weaknesses:

      The researchers propose and discuss one model of repression based on protein binding data, which depends on a new technique and data that are not fully characterized.

      Major Comments:

      (1) The phrase "results from a shift in genome organization" in the abstract lacks strong evidence. This interpretation heavily relies on the protein binding technique, using ELT-2 as a positive and an imperfect negative control. If we assume that the binding is a red herring, the interpretation would require some other indirect regulation mechanism. Is it possible that ZNF-236 binds to the RNA of a protein that is required to limit HSF-1 and potentially other transcription factors' activation function? In the extrachromosomal array/rDNA context, perhaps other repressive mechanisms are redundant, and thus active repression by ZNF-236 is not required. This possibility is mentioned in one sentence in the discussion, but most of the other interpretations rely on the ZNF-236 binding data to be correct. Given that there is other evidence for a transcriptional role for ZNF-236, and no negative control (e.g. deletion of the zinc fingers, or a control akin to those done for ChIP-seq, like a null mutant or knockdown), a stronger foundation is needed for the presented model for genome organization.

      (2) Continuing along the same line, the study assumes that ZNF-236 function is transcriptional. Is it possible to tag a protein and look at localization? If it is in the nucleus, it could be additional evidence that this is true.

      (3) I suggest that the authors analyze the genomic data further. A MEME analysis for ZNF-236 can be done to test if the motif occurrences are enriched at the binding sites. Binding site locations in the genome with respect to genes (exon, intron, promoter, enhancer?) can be analyzed and compared to existing data, such as ATAC-seq. The authors also propose that this protein could be similar to CTCF. There are numerous high-quality and high-resolution Hi-C data in C. elegans larvae, and so the authors can readily compare their binding peak locations to the insulation scores to test their hypothesis.

      (4) The researchers suggest that ZNF-236 is important for some genomic context. Based on the transcriptomic data, can they find a clue for what that context may be? Are the ZNF-236 repressed genes enriched for not expressed genes in regions surrounded by highly expressed genes?

    1. chocolate milk on tap
      1. Cookies and cream ice cream
      2. Cadbury's Marble chocolate bar
      3. Strawberry milkshake
      4. Fanmade Animals Of Farthing Wood characters
      5. Juicy gossip at the hairdresser's
      6. My Little Pony having autistic ponies
      7. Remembering the first season of Scream Street
      8. WatchMojo's 10 Childhood Shows That Feel Like A Fever Dream Video
      9. WatchMojo's 10 Annoying Kids' Shows Video
      10. Doing a scrapbook
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Qin and colleagues aim to delineate a neural mechanism by which the internal satiety levels modulate the intake of sugar solution. They identified a three-step neuropeptidergic system that downregulates the sensitivity of sweet-sensing gustatory sensory neurons in sated flies. First, neurons that release a neuropeptide Hugin (which is an insect homolog of vertebrate Neuromedin U (NMU)) are in an active state when the concentration of glucose is high. This activation does not require synaptic inputs, suggesting that Hugin-releasing neurons sense hemolymph glucose levels directly. Next, the Hugin neuropeptides activate Allatostatin A (AstA)-releasing neurons via one of Hugin's receptors, PK2-R1. Finally, the released AstA neuropeptide suppresses sugar response in sugar-sensing Gr5a-expressing gustatory sensory neurons through AstA-R1 receptor. Suppression of sugar response in Gr5a-expressing neurons reduces the fly's sugar intake motivation (measured by proboscis extension reflex). They also found that NMU-expressing neurons in the ventromedial hypothalamus (VMH) of mice (which project to the rostral nucleus of the solitary tract (rNST)) are also activated by high concentrations of glucose, independent of synaptic transmission, and that injection of NMU reduces the glucose-induced activity in the downstream of NMU-expressing neurons in rNST. These data suggest that the function of Hugin neuropeptide in the fly is analogous to the function of NMU in the mouse.

      Generally, their central conclusions are well-supported by multiple independent approaches. The parallel study in mice adds a unique comparative perspective that makes the paper interesting to a wide range of readers. It is easier said than done: the rigor of this study, which effectively combined pharmacological and genetic approaches to provide multiple lines of behavioral and physiological evidence, deserves recognition and praise.

      A perceived weakness is that the behavioral effects of the manipulations of Hugin and AstA systems are modest compared to a dramatic shift of sugar solution-induced PER (the behavioral proxy of sugar sensitivity) induced by hunger, as presented in Figure 1B and E. It is true that the mutation of tyrosine hydroxylase (TH), which synthesizes dopamine, does not completely abolish the hunger-induced PER change, but the remaining effect is small. Moreover, the behavioral effect of the silencing of the Hugin/AstA system (Figure Supplement 13B, C) is difficult to interpret, leaving a possibility that this system may not be necessary for shifting PER in starved flies. These suggest that the Hugin-AstA system accounts for only a minor part of the behavioral adaptation induced by the decreased sugar levels. Their aim to "dissect out a complete neural pathway that directly senses internal energy state and modulates food-related behavioral output in the fly brain" is likely only partially achieved. While this outcome is not a shortcoming of a study per se, the depth of discussion on the mechanism of interactions between the Hugin/AstA system and the other previously characterized molecular circuit mechanisms mediating hunger-induced behavioral modulation is insufficient for readers to appreciate the novelty of this study and future challenges in the field.

      We thank the reviewer for the thoughtful comment. We agree that the behavioral effects of manipulating the Hugin–AstA system alone were considerably weaker than the pronounced PER shifts induced by starvation. We have revised our Discussion to address it by positioning our findings within the broader context of energy regulation.

      More specifically, we discuss that feeding behavior is controlled by two distinct, yet synergistic, types of mechanisms:

      (1) Hunger-driven 'accelerators': as the reviewer notes, pathways involving dopamine and NPF are powerful drivers of sweet sensitivity. These systems are strongly activated by hunger to promote food-seeking and consumption.

      (2) Satiety-driven 'brakes': our study identifies the counterpart to the systems above, namely a satiety-driven 'brake'. The Hugin–AstA pathway acts as a direct sensor of high internal energy (glucose), which is specifically engaged during satiety to actively suppress sweet sensation and prevent overconsumption.

      This framework explains the seeming discrepancy in effect size. The dramatic PER shift seen upon starvation is a combined result of engaging the 'accelerators' (hunger pathways like TH/NPF) while simultaneously releasing the 'brake' (our Hugin–AstA pathway being inactive).

      Our manipulations, which specifically target only the 'brake' system, are therefore expected to have a more modest effect than this combined physiological state. Thus, rather than being a "minor part," the Hugin–AstA pathway is a mechanistically defined, satiety-specific circuit that is essential for the precise "braking" required for energy homeostasis. We will update our Discussion to emphasize how these 'accelerator' and 'brake' circuits must work in concert to ensure precise energy regulation.

      In this context, authors are encouraged to confront a limitation of the study due to the lack of subtype-level circuit characterization, despite their intriguing finding that only a subtype of Hugin- and AstA-releasing neurons are responsive to the elevated level of bath-applied glucose.

      We thank the reviewer for highlighting the critical issue of subtype-level specialization within the Hugin and AstA populations.

      We fully agree that the Hugin system is known for its functional heterogeneity (pleiotropy), with different Hugin neuron subclusters implicated in regulating a variety of behaviors, including feeding, aversion, and locomotion (e.g., Anna N King, Curr Biol, 2017, Andreas PLoS Biol, Sebastian et al., 2016, Nat Comm). Our finding that only a specific subcluster of Hugin neurons is responsive to glucose elevation provides a crucial first step in functionally dissecting this complexity.

      We have added a dedicated paragraph to elaborate on this functional partitioning in the Discussion. We propose that this subtype-level specialization allows the Hugin system to precisely link specific physiological states (like high circulating glucose) to appropriate behavioral outputs (like the suppression of sweet taste), demonstrating an elegant solution to coordinating multiple survival behaviors. Future work using high-resolution tools such as split-GAL4 and single-cell sequencing will be invaluable in fully mapping the specific functional roles corresponding to each Hugin and AstA subcluster.

      Reviewer #2 (Public review):

      Summary:

      The question of how caloric and taste information interact and consolidate remains both active and highly relevant to human health and cognition. The authors of this work sought to understand how nutrient sensing of glucose modulates sweet sensation. They found that glucose intake activates hugin signaling to AstA neurons to suppress feeding, which contributes to our mechanistic understanding of nutrient sensation. They did this by leveraging the genetic tools of Drosophila to carry out nuanced experimental manipulations and confirmed the conservation of their main mechanism in a mammalian model. This work builds on previous studies examining sugar taste and caloric sensing, enhancing the resolution of our understanding.

      Strengths:

      Fully discovering neural circuits that connect body state with perception remains central to understanding homeostasis and behavior. This study expands our understanding of sugar sensing, providing mechanistic evidence for a hugin/AstA circuit that is responsive to sugar intake and suppresses feeding. In addition to effectively leveraging the genetic tools of Drosophila, this study further extends their findings into a mammalian model with the discovery that NMU neural signaling is also responsive to sugar intake.

      Weaknesses:

      The effect of Glut1 knockdown on PER in hugin neurons is modest, and does not show a clear difference between fed and starved flies as might be expected if this mechanism acts as a sensor of internal energy state. This could suggest that glucose intake through Glut1 may only be part of the mechanism.

      We thank the reviewer for this insightful comment and agree that the modest behavioral effect of Glut1 knockdown is a critical finding that warrants further clarification. This observation strongly supports the idea that internal energy state is monitored by a sophisticated and robust network, not a single, fragile component. We believe the effect size is modest for two main reasons, which we have addressed in the revised Discussion.

      Firstly, the effect size is likely attenuated by technical and molecular redundancy. Specifically, the RNAi-mediated knockdown of Glut1 may be incomplete, leaving residual transporter function. Furthermore, Glut1 is likely only one part of the Hugin neuron's intrinsic sensing mechanism; other components, such as alternative glucose transporters or downstream K<sub>ATP</sub> channel signaling, may provide molecular redundancy, meaning that the full energy-sensing function is not easily abolished by a single manipulation.

      Secondly, and more importantly, the final feeding decision is an integrated output of competing circuits. While hunger-sensing pathways like the dopamine and NPF circuits act as powerful "accelerators" to drive sweet consumption, the Hugin–AstA pathway serves as a satiety-specific "brake." The modest effect of partially inhibiting just one component of this 'brake' system is the hallmark of a precisely regulated, multi-layered homeostatic system. We have clarified in the Discussion that the Hugin pathway represents one essential inhibitory circuit within this cooperative network that works together with the hunger-promoting systems to ensure precise control over energy intake.

      Reviewer #3 (Public review):

      Summary:

      This study identifies a novel energy-sensing circuit in Drosophila and mice that directly regulates sweet taste perception. In flies, hugin+ neurons function as a glucose sensor, activated through Glut1 transport and ATP-sensitive potassium channels. Once activated, hugin neurons release hugin peptide, which stimulates downstream Allatostatin A (AstA)+ neurons via PK2-R1 receptors. AstA+ neurons then inhibit sweet-sensing Gr5a+ gustatory neurons through AstA peptide and its receptor AstA-R1, reducing sweet sensitivity after feeding. Disrupting this pathway enhances sweet taste and increases food intake, while activating the pathway suppresses feeding.

      The mammalian homolog of neuromedin U (NMU) was shown to play an analogous role in mice. NMU knockout mice displayed heightened sweet preference, while NMU administration suppressed it. In addition, VMH NMU+ neurons directly sense glucose and project to rNST Calb2+ neurons, dampening sweet taste responses. The authors suggested a conserved hugin/NMU-AstA pathway that couples energy state to taste perception.

      Strengths:

      Interesting findings that extend from insects to mammals. Very comprehensive.

      Weaknesses:

      Coupling energy status to taste sensitivity is not a new story. Many pathways appear to be involved, and therefore, it raises a question as to how this hugin-AstA pathway is unique.

      The reviewer is correct that several energy-sensing pathways are known. However, we now clarify that these previously established mechanisms, such as the dopaminergic and NPF pathways, primarily function as hunger-driven "accelerators." They are activated by low-energy states to promote sweet sensitivity and drive consumption.

      The crucial, missing piece of the puzzle—which our study provides—is the satiety-specific "brake" mechanism. We identify the Hugin–AstA circuit as one of the “brakes”: a dedicated, central sensor that responds directly to high circulating glucose (satiety) to suppress sweet sensation and prevent overconsumption.

      Thus, our work is unique because it defines the essential counterpart to the hunger pathways. In the revised Discussion, we have explained how these 'accelerator' (hunger) and 'brake' (satiety) systems work in concert to allow for the precise, bidirectional regulation of energy intake. Furthermore, by demonstrating that this Hugin/NMU 'brake' circuit is evolutionarily conserved in mice, our findings reveal a fundamental energy-sensing strategy and suggest that this pathway could represent a promising new therapeutic target for managing conditions of excessive food intake.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Considering the comments from all three reviewers, new experiments are not necessary, but the authors are welcome to provide new pieces of evidence that would strengthen their conclusions. To assist the authors with their revisions, the comments have been categorized from the highest to lowest priority based on the concerns raised by reviewers 1, 2, and 3.

      High priority:

      (1) Acknowledgement of partial phenotypes by the genetic manipulations, especially relative to other neuromodulators that are involved in the adjustment of sugar sensitivity after starvation (1, 2).

      Please see our responses to the Public Review 1 for details.

      (2) Detailed discussion on the novelty of the present work, also in light of previous studies both in flies and mammals (known Drosophila modulators, as well as NMU-rNST circuit on sugar sensation) (1, 2, 3).

      Please see our responses to the Public Review 3 for details.

      (3) Medium priority:

      • Discussions on the subtype-specific function of hugin neurons (1).

      Please see our responses to the Public Review 1 for details.

      • Discussions on the pleiotropic effect of changes in the level of circulating sugar (including release of other sugar types) (2, 3).

      We agree that circulating sugars represent a complex, systemic signal with broad, pleiotropic effects, and we have expanded our Discussion to address this.

      We will discuss the functional distinction between key hemolymph sugars, such as trehalose (the main circulating sugar, critical for stress/flight) and glucose (the primary, rapidly mobilized energy currency). While various sugars collectively influence metabolic status, our study’s unique focus is on the direct neural link between internal energy and sweet taste modulation. We clarify that our work precisely identifies glucose as the direct, key ligand for the Hugin satiety circuit, thus providing a concrete, mechanistically defined link from systemic energy complexity to the specific regulation of sweet sensation.

      • Illustration or clear explanations of sugar application methods in mouse experiments (ex. Figure 5F vs Figure 5M), as well as discussion on the concentration of sugar solutions used (3).

      We have added the relevant details in the figure legends and explain the rationale for using this concentration of sugar in the results.

      • Less saturated image for Figure 5K (3).

      We have adjusted Figure 5K to reduce image saturation for clarity.

      • Discussions on the modest effect of NMU on rNST neurons (Figure 5M) (3).

      In the revised results, we have discussed that the modest suppression of rNST activity likely reflects partial peptide diffusion and the heterogeneous composition of sweet-responsive rNST neurons.

      (4) Low priority:

      • Systematic quantification of multiple types of sugars after starvation (3).

      We agree that circulating sugars represent a complex metabolic milieu, and a fully systematic biochemical quantification of individual hemolymph sugars after starvation would be informative. While such analyses are beyond the scope of the present study, we have addressed this point at the functional level by systematically pre-feeding flies with different types of dietary sugars prior to PER assays.

      We find that multiple sugars are capable of suppressing PER, indicating that satiety-related behavioral inhibition is not unique to a single carbohydrate source. Notably, sucrose produces the strongest suppression, consistent with its rapid metabolic conversion and effectiveness in elevating internal glucose levels. These results support the notion that diverse dietary sugars converge on a common satiety-signaling mechanism, while our mechanistic analyses specifically identify glucose as the key ligand engaging the Hugin satiety circuit.

      We now clarify this distinction in the revised Discussion.

      • Testing Gr64f neurons or mutants (3).

      Our results indicate that energy sensing in the CNS suppresses sweet-sensing neuron activity (e.g., via hyperpolarization) rather than directly blocking sugar binding to receptors. Thus, sweet perception, not sugar detection, is inhibited. As evidence, in Figure Supplement 4 we measured the PER to fructose and trehalose. Although Gr5a and Gr64a differ in their sensitivity to these sugars, the CNS energy state consistently suppresses sweet perception for both. As Reviewer 3 noted, Gr5a and Gr64f are co-expressed in sweet neurons; while they respond to different sugars, their labeling of the neurons is largely equivalent.

      • Testing sugar preference (glucose vs. other sugars) (3)

      Since our primary goal was to identify a direct satiety-sensing and sensory-modulating circuit—the "brake" mechanism—PER served as the most suitable and mechanistically specific readout. While manipulation of the Hugin–AstA circuit influences internal state, and therefore likely alters long-term sugar preference, investigating the integration of this pathway with reward and post-ingestive signaling is a critical question that lies beyond the scope of the current study.

      • Cell type-specific knockout of NMU (3).

      Achieving a cell type-specific knockout of NMU using the Cre approach is not feasible in the short term. While previous studies have reported the role of NMU in the VMH region in regulating feeding, our contribution lies in revealing how these neurons sense energy. We also show that these neurons project to the vicinity of Calb2 neurons and that the neuropeptide can suppress Calb2 neuronal activity. This essentially demonstrates that the hugin–Gr5a pathway in Drosophila is conserved in mice. We believe that a detailed dissection of the precise circuitry in mice is more appropriate to address in a subsequent study.

      • Explanation of NMU detection in Figure 5K (3): this is GFP expressed by the Cre-dependent virus.

      We have revised the Figure 5K legend to clarify that NMU<sup>+</sup> neurons are labeled by GFP expression from a Cre-dependent AAV2/1-DIO-GFP, which undergoes anterograde trans-synaptic transfer. We further explain that GFP expression in rNST neurons requires local AAV-Cre injection, enabling identification of postsynaptic Calb2<sup>+</sup> target neurons.

      • Neuronal manipulation of NMU neurons by optogenetics or DREADD.

      Please see our responses to the question “Cell type-specific knockout of NMU.”

      Reviewer #1 (Recommendations for the authors):

      A major concern about the study is that the effect of genetic manipulations on Hugin/AstA system appears to account for only a small part of the dramatic shift of PER probability toward smaller concentrations of sucrose solutions among starved flies. In Figure 1B and E, PER probability is significantly higher among starved flies in response to 10-200mM of sucrose solutions than fed flies. Compared to this, RNAi knockdown of glucose transporter in hugin neurons (Figure 2C), PK2-R1 pan-neuronally (Figure 3C) or in AstA-releasing neurons (Figure 3G), AstA-R1 in Gr5a neurons (Figure 4E), systemic mutation of PK-R2 (Figure Supplement 10) and AstA-R1 (Figure Supplement 12) all produce relatively minor behavioral changes. Consistent with previous works, the mutation of TH causes a robust decrease of PER across the entire range of sucrose concentration tested (Figure Supplement 1).

      These discrepancies can be caused by many technical limitations that cannot be readily addressed. For instance, the large effect of TH can be confounded by the pleiotropic behavioral effect of the lack of dopamine. RNAi can suffer from incomplete elimination of targeted genes. However, the relatively small behavioral effect size of these manipulations cannot be entirely ignored in light of previous publications, which point to the importance of other neuromodulators such as dopamine, serotonin, Akh, and NPF, on sugar sensitivity (Marella et al., 2012; Inagaki et al., 2014; Yao et al., 2022), as well as other potentially parallel glucose-sensing systems, including Gr43a-expressing cells (Miyamoto et al., 2012) and sNPF-expressing CN neurons (Oh et al., 2019). While the neuropeptides initially tested (Figure 1) are not poor choices, it is a missed opportunity that so many other neuromodulators were excluded from the initial search.

      We appreciate the reviewer’s detailed analysis and agree that the magnitude of behavioral effects produced by manipulating the hugin–AstA pathway is smaller than the dramatic shift in PER observed under starvation conditions. This comparison is important and highlights a central conceptual point of our study.

      Starvation represents a compound physiological state that simultaneously engages multiple hunger-promoting neuromodulatory systems—most prominently dopaminergic and NPF pathways—while also releasing satiety-associated inhibitory signals. As shown previously and confirmed here (Figure supplementary 1), manipulation of dopamine synthesis produces a broad and robust reduction in PER across sucrose concentrations, consistent with its role as a powerful hunger-driven modulator.

      By contrast, our genetic manipulations specifically target a satiety-associated inhibitory circuit—the hugin–AstA pathway—that is selectively engaged by high internal glucose levels. Manipulating this pathway alone therefore isolates a single “brake” component of feeding regulation, rather than recapitulating the full physiological state of starvation, which combines both accelerator activation and brake release. Accordingly, the more modest behavioral effects we observe are an expected consequence of dissecting one defined regulatory module from a larger, cooperative network.

      We agree that multiple neuromodulators, including dopamine, serotonin, Akh, NPF, and others, as well as parallel glucose-sensing systems such as Gr43a-expressing cells and sNPF-expressing CN neurons, contribute to the regulation of sugar sensitivity. Rather than aiming to exhaustively screen all neuromodulators, our study was designed to identify and mechanistically define a central, glucose-responsive satiety sensor that directly links internal energy state to sweet taste modulation. In the revised discussion, we now explicitly position the hugin–AstA circuit as one essential, satiety-specific component within this broader regulatory landscape and discuss how it functionally complements previously characterized hunger-driven pathways.

      I am also confused by the results of Shibirets1-mediated silencing of Hugin and AstA neurons (Figure Supplement 13B, C). It is unclear to me why a feeding assay was used instead of PER, like the activation experiments. Feeding (ingestion) and PER are qualitatively different types of behavior, which cannot be directly compared. Moreover, the definition of "fold change" is not provided either in the figure legend or in the Materials and Methods section, making it difficult to understand what the figure means.

      We thank the reviewer for pointing out this important issue regarding the interpretation of the Shibire^ts1-mediated silencing experiments. We agree that proboscis extension reflex (PER) and feeding/ingestion assays reflect qualitatively different behavioral processes and should not be directly compared.

      In the original submission, feeding assays were used to assess the effect of neuronal silencing, which led to ambiguity when comparing these results with PER-based activation experiments. To directly address this concern and ensure consistency across behavioral readouts, we have now performed additional PER experiments under the same Shibire^ts1-mediated silencing conditions.

      These new data demonstrate that acute silencing of hugin neurons significantly enhances PER responses to sucrose (Figure supplementary 13B), indicating increased sweet sensitivity. This result is fully consistent with our activation experiments and supports the conclusion that the hugin–AstA pathway suppresses sweet taste perception under satiety conditions.

      In addition, we have revised the figure legend to explicitly define the “fold change” metric used in the behavioral analysis, clarifying how the values were calculated and normalized. Together, these changes resolve the ambiguity raised by the reviewer and strengthen the behavioral consistency of our conclusions.

      Of note, Marella et al. (2012) reported that silencing of Hugin-releasing neurons did not affect PER. It is therefore possible that the Hugin system is sufficient, but not necessary, for modulating PER under food deprivation.

      We agree that their observation—that silencing Hugin-releasing neurons does not alter PER in starved flies—is consistent with a state-dependent role of the Hugin system in feeding regulation.

      In starved animals, dopaminergic TH<sup>+</sup> neurons are strongly activated and promote high PER responsiveness, while circulating glucose levels are low, placing Hugin neurons in a relatively inactive state. Under such conditions, further silencing of Hugin neurons would be expected to produce minimal additional effects on PER, which likely explains the results reported by Marella et al.

      Importantly, our data show that preventing the starvation-associated reduction in Hugin neuronal activity—by thermogenetic activation of Hugin<sup>+</sup> neurons (Hugin–TrpA1; Figure 1D)—significantly suppresses the hunger-induced enhancement of PER. These results indicate that dynamic downregulation of Hugin neuronal activity is a critical component of the normal behavioral shift in sweet sensitivity in response to food deprivation. Thus, while Hugin neurons may not be required to further modulate PER once animals are already in a strongly starved state, their regulated activity change is essential for mediating state-dependent modulation of sweet taste behavior. We have added discussion in the revised manuscript.

      While no new experiments are requested, it is important for authors to acknowledge the limited effect size of Hugin/AstA manipulation. In the current manuscript, the authors briefly mention the previous works (lines 460-462, 472-474), which is insufficient. Discussions must include how the Hugin/AstA system may "complement these established mechanisms (line 460)" (described in the references listed above), under what situations this novel Hugin/AstA system can be relevant for controlling PER, and why the fly is equipped with seemingly redundant systems for sensing internal glucose levels and controlling feeding behavior. Without these discussions, it is difficult to recognize the novelty of the presented work. The data appears largely to be a minor and incremental progress on an already mature field.

      In the revised manuscript, we have substantially expanded the Discussion to explicitly acknowledge this limited effect size and to clarify the functional role of the Hugin–AstA pathway within the broader energy-regulatory network. We now emphasize that this circuit represents a satiety-specific inhibitory branch that complements, rather than replaces, previously described hunger-promoting systems such as dopaminergic, NPF, and AKH circuits.

      Importantly, we discuss the specific physiological conditions under which the Hugin–AstA system is most relevant—namely, post-feeding and high-glucose states. Unlike hunger circuits that amplify sweet sensitivity during starvation, the Hugin–AstA pathway directly senses circulating glucose and rapidly suppresses sweet taste perception when energy is sufficient, thereby acting as a brake to prevent overconsumption.

      We further address the apparent redundancy among internal sugar-sensing systems. Rather than being redundant, these pathways form a coordinated and layered network with distinct sugar specificities, temporal dynamics, and functional roles. For example, Gr43a<sup>+</sup> neurons primarily detect fructose, whereas hemolymph glucose represents the principal energetic currency in Drosophila. The use of multiple internal sugar sensors allows flies to fine-tune feeding decisions across different nutritional contexts and timescales.

      Finally, we expand the Discussion to highlight that although the Hugin–AstA circuit constitutes only one branch of the energy-sensing network, its disruption leads to excessive energy intake (Figure supplementary 13C-E, G) and increased fat accumulation (Figure supplementary 13F), underscoring its physiological relevance. We also discuss how this pathway likely interacts with other neuromodulatory systems, including TH<sup>+</sup> dopaminergic and NPF<sup>+</sup> neurons, to collectively orchestrate adaptive feeding behavior and energy homeostasis.

      Together, these additions clarify that our work does not simply add another neuromodulator to an already mature field, but instead identifies a distinct glucose-sensing, satiety-linked mechanism that fills a conceptual gap between internal energy state detection and sensory modulation.

      Another perceived weakness is the lack of subtype-level dissection among Hugin- and AstA-releasing neurons. I am not making the unjustified request to narrow down the behaviorally relevant neurons to one (or one type); such a request rests on a widespread but unreasonable and dangerous assumption that every behavior must be controlled by one neuron. However, the authors present very interesting data showing that only a subset of Hugin- and AstA-releasing neurons responds to higher levels of sucrose (Figure 1H, Figure Supplement 7A, B), which leads to the hypothesis that a specific subtype within each peptidergic neuronal group is responsible for the starvation-induced behavioral change. The authors only briefly touch upon this (lines 217-218), but this is an important hypothesis that requires further discussion.

      We thank the reviewer for highlighting the importance of neuronal heterogeneity within the Hugin- and AstA-releasing populations. We fully agree that the observation that only a subset of Hugin<sup>+</sup> and AstA<sup>+</sup> neurons responds to elevated sucrose levels (Figure 1H; Figure Supplement 7A, B) strongly suggests functional specialization within these peptidergic groups.

      In the revised Discussion, we now explicitly propose that distinct subtypes of Hugin and AstA neurons differentially contribute to energy sensing and feeding modulation. We suggest that glucose-responsive subpopulations may be specifically engaged in satiety signaling, whereas other neurons within the same genetic classes may participate in additional physiological or behavioral processes. This heterogeneity provides a plausible explanation for the partial behavioral effects observed following population-level manipulations. Although we did not perform subtype-specific perturbations in this study, our findings provide a foundation for identifying these subtypes in future work using split-GAL4 lines and connectomic datasets.

      These issues are more important than the sprawling and unfocused review of various hunger and satiety-controlling systems across species in the Introduction. Lines 53-108 contain only tangential information to the main conclusion of the paper. Both the Introduction and Discussion sections must be completely restructured so that readers understand what is already known about hunger-induced changes in feeding-related behavior, what is a missing gap of knowledge in neural mechanisms controlling behavioral adaptation under starvation, and why Hugin/NMU is an interesting target in this context.

      We thank the reviewer for this important structural critique. We agree that, in the original manuscript, the Introduction placed disproportionate emphasis on a broad survey of hunger- and satiety-regulating systems across species, which may have obscured the central conceptual advance of this study.

      In the revised manuscript, we have substantially restructured both the Introduction and the Discussion to sharpen the narrative focus and clarify the specific knowledge gap addressed by our work.

      First, the Introduction has been streamlined to focus on what is already known about hunger-induced modulation of feeding-related behaviors, particularly sweet taste sensitivity and PER in Drosophila. We now emphasize that prior studies have predominantly characterized hunger-activated, feeding-promoting pathways (e.g., dopaminergic, NPF, AKH systems) that act as accelerators of food-seeking behavior.

      Second, we explicitly define the missing gap in knowledge: while hunger-driven mechanisms are well studied, it remains unclear how satiety states—specifically elevated internal glucose levels—are directly sensed by central neurons and translated into suppression of sensory gain and feeding behavior.

      Third, we reposition Hugin/NMU as an attractive and conceptually distinct target because of its peptidergic nature, evolutionary conservation, and previously reported but mechanistically unresolved links to feeding regulation. This framing motivates our central question: whether Hugin/NMU neurons function as a direct internal energy sensor that actively implements a satiety-specific inhibitory control over taste perception.

      In parallel, the Discussion has been reorganized to avoid an unfocused review of feeding circuits across species and instead to interpret our findings within a clear conceptual framework. We now emphasize that the Hugin–AstA (and NMU) pathway represents a satiety-driven “brake” that complements, rather than duplicates, established hunger-driven “accelerator” circuits. This restructuring clarifies both the novelty of our findings and their relevance within the existing literature.

      Reviewer #2 (Recommendations for the authors):

      When discussing the results of Figure 1, such as lines 203-204, "These results demonstrate that sugar intake inhibits sweet sensation, probably via increasing circulating sugar levels" it may be worth discussing the known impact of sweet sensation experience on future sweet taste responses. With the data shown here, it is difficult to conclusively separate blood glucose levels from the sweet sensation that happens during the re-feeding. The "normal diet minus sucrose" does not blunt the starved PER effect, but that could potentially be impacted by either/both sugar intake or sweet taste.

      We thank the reviewer for this thoughtful and important point. We agree that sweet taste experience itself can influence subsequent sweet sensitivity, and that separating the contribution of sensory experience from nutrient-derived internal energy is non-trivial.

      In the revised manuscript, we have clarified the experimental timing by explicitly stating that PER was assessed 15 minutes after refeeding. At this time point, hemolymph glucose levels have returned to baseline (Figure supplementary 5), supporting the physiological relevance of glucose-dependent activation of Hugin neurons under our experimental conditions.

      We also acknowledge that sweet taste exposure can induce sensory adaptation and modulate future taste responses. To directly address this potential confound, we performed additional control experiments during revision (Figure supplementary 4B) in which starved flies were refed with sorbitol (caloric but not sweet) or arabinose (sweet but non-nutritive). We found that both manipulations partially reduced PER, but neither recapitulated the full suppressive effect of sucrose refeeding.

      These results indicate that sweet taste experience and metabolic energy contribute in parallel to the regulation of sweet sensitivity. Importantly, the incomplete effects of sorbitol or arabinose alone suggest that neither sensory adaptation nor caloric value is sufficient by itself to fully account for the observed PER suppression.

      Accordingly, we have revised the Discussion to clarify that the Hugin–AstA pathway likely operates within a broader, multi-layered regulatory framework, integrating internal metabolic state with sensory experience, rather than acting as a sole determinant of post-feeding sweet sensitivity. This clarification avoids over-attribution of the behavioral effect to circulating glucose alone while preserving the central conclusion that internal energy state is a key modulator of sweet perception.

      Blocking cellular sugar intake or metabolism could be impacting the ability of neurons to function, distinct from any specific intracellular regulatory mechanism that glucose or its derivatives might be involved with. That may be a caveat worth mentioning in the results or discussion.

      We thank the reviewer for raising this important caveat. We agree that blocking cellular sugar uptake or metabolism could, in principle, impair neuronal function in a nonspecific manner, independent of any dedicated intracellular glucose-sensing mechanism.

      In the revised manuscript, we now explicitly acknowledge this possibility and clarify the scope of our interpretation. Several features of our data argue against a generalized loss of neuronal function as the primary explanation. First, the behavioral and physiological effects observed upon manipulation of glucose transport or K<sub>ATP</sub> channel activity are rapid and reversible, consistent with state-dependent modulation rather than chronic metabolic failure. Second, these manipulations selectively affect sweet sensitivity and feeding-related behaviors, without causing gross deficits in proboscis extension or neuronal responsiveness.

      Accordingly, we have revised the Results to emphasize that while intracellular glucose metabolism is required for normal neuronal activity, our findings specifically support a role for glucose-dependent modulation of neuronal excitability in satiety signaling, rather than a nonspecific energetic impairment.

      Minor suggestions:

      (1) Figure 2G: "Pryuvate" -> "Pyruvate."

      We have corrected “Pryuvate” to “Pyruvate.”

      (2) "Fly" methods section: it says that flies were kept on 2% agar for 12 hours for starvation, but in the Figure 1A description, it says 24 hours.

      We have corrected the description in Figure 1A.

      Reviewer #3 (Recommendations for the authors):

      (1) SEZ Hugin+ and AstA+ neurons were activated by glucose (Figures 1G, 1I), yet hemolymph also contains trehalose and fructose. For instance, DH44 neurons respond broadly to all hemolymph sugars (Dus et al., 2015), while Gr43a neurons specifically detect fructose (Miyamoto et al., 2012). The present study does not clarify whether Hugin+ or AstA+ neurons are similarly sugar-specific or more broadly tuned. A systematic analysis is needed to determine whether these circuits are selective for glucose.

      We thank the reviewer for raising this important question regarding sugar specificity. We agree that hemolymph contains multiple sugars, including trehalose and fructose, and that distinct neural systems have been shown to differ in their tuning breadth. To address this issue, we performed additional experiments during revision in which starved wild-type flies were refed with different sugars—including sucrose, fructose, trehalose, and sorbitol—followed by PER measurements. We found that sucrose refeeding produced the strongest suppression of PER, whereas fructose, trehalose, and sorbitol induced weaker effects (Figure supplementary 4A).

      We interpret these results as suggesting a preferential sensitivity of the Hugin/AstA pathway to glucose availability rather than a broad responsiveness to all circulating sugars. One plausible explanation is that fructose, trehalose, and sorbitol require peripheral metabolic conversion before contributing to intracellular glucose levels in neurons, whereas sucrose feeding rapidly restores hemolymph glucose within the 15-minute time window used in our experiments (Figure supplementary 5).

      Importantly, we now clarify in the revised Results and Discussion that our data support a functional preference for glucose under physiological conditions, rather than excluding the possibility that other sugars may influence this circuit indirectly or on longer timescales.

      (2) The authors state that SEZ, but not VNC, Hugin+ neurons regulate AstA activity (lines 318-319). However, comparison of Figure Supplement 8B with the severing sample in Figure Supplement 11B shows a more pronounced reduction of sweet sensation under hug>TrpA1 activation. Although the absolute response in Figure 3F (in vivo) is higher than that in the cut-off preparation (Figure S11), comparison of Figure S11C with Figure 3F indicates that hug+ neurons drive an AstA+ calcium transient more than fourfold greater in the presence of VNC neurons. Thus, the contribution of Hugin+ VNC neurons cannot be dismissed, and the conclusion should be revised accordingly.

      We thank the reviewer for this careful and quantitative comparison. We agree that our original wording overstated the exclusivity of SEZ Hugin<sup>+</sup> neurons in regulating AstA activity.

      Upon closer examination of the data, we now acknowledge that VNC Hugin<sup>+</sup> neurons likely contribute to AstA activation. As the reviewer points out, the AstA<sup>+</sup> calcium response evoked by Hugin activation is substantially larger when VNC neurons are intact (Figure 3F) than in the cut preparation (Figure supplementary 11C), indicating that inputs from the VNC can potentiate AstA neuronal activity.

      Accordingly, we have revised the manuscript to state that SEZ Hugin<sup>+</sup> neurons play a predominant role in driving AstA responses relevant to sweet sensation, while VNC Hugin<sup>+</sup> neurons provide additional modulatory input that enhances the overall magnitude of Hugin signaling. These revisions have been made in the Results to more accurately reflect the contributions of distinct Hugin subpopulations.

      (3) In Figure 4D, you show AstA-R1 co-localized with Gr5a-expressing cells. However, Gr5a-expressing cells also co-express Gr64f in the labellum (Fujii et al., 2015, Current Biology). Are the authors sure that the sweet sensation they described is Gr5a-specific? Testing Gr64f is essential. Moreover, Fujii et al. demonstrated that Gr5a loss-of-function mutation impairs not only sucrose but also maltose, fructose, and trehalose sensation. This raises a question of whether the Hug+ and AstA+ neurons identified in the current study contribute to sensing sugars beyond sucrose. Additional experiments are required to clarify this point.

      Please see our responses to the Reviewing Editor Comments (4).

      (4) While nutritive sugar sensors such as Dh44 neurons have been directly implicated in sugar preference (Dus et al., 2015, Neuron), this study examines the hug+, AstA+, Gr5a neuronal circuit only in the context of PER responses. Why is sugar preference not assessed here, especially given that in mice, the comparison was made using preference tests?

      We thank the reviewer for this insightful question. We agree that sugar preference assays provide important information about feeding decisions and reward-based behavior. In the present study, however, we deliberately focused on the proboscis extension reflex (PER) because it offers a direct, quantitative, and temporally precise readout of sweet sensory sensitivity at the sensory–motor level.

      PER allows us to isolate changes in taste perception itself, largely independent of post-ingestive reinforcement, learning, or motivational state, all of which strongly influence preference-based assays. This distinction is particularly important given our central goal of identifying a circuit that directly links internal energy sensing to modulation of peripheral sweet-sensing neurons.

      By contrast, sugar preference reflects an integrated behavioral outcome combining sensory input, internal state, and post-ingestive reward signals, including those mediated by DH44 neurons and other nutritive sensing pathways. We therefore chose PER as the most mechanistically specific assay to dissect the Hugin–AstA–Gr5a pathway. We now explicitly acknowledge in the revised Discussion that determining how this satiety-linked sensory modulation interacts with reward and post-ingestive circuits to shape long-term sugar preference will be an important direction for future studies.

      Several other concerns:

      (5) The intraperitoneal injection of NMU is interpreted as reflecting a brain-specific NMU effect, but such systemic delivery cannot exclude peripheral actions. In Figure 5D, the use of whole-body KO mice is insufficient; targeted manipulations (e.g., NMU-Cre-driven inactivation) are required to establish circuit-specific behavioral roles.

      Please see our responses to the Reviewing Editor Comments (Low priority).

      (6) In Figure 5F and 5M, neural activity is measured under different conditions: gastric glucose infusion in 5F versus glucose licking in 5M. To establish that NMU VMH neurons and Calb2 rNST neurons belong to the same circuit, this discrepancy in stimulation timing must be resolved to support the conclusions.

      We thank the reviewer for pointing out this important issue regarding stimulation paradigms in Figures 5F and 5M. We agree that the difference between gastric glucose infusion and glucose licking requires explicit clarification.

      In the revised manuscript, we now clearly state that these two paradigms were intentionally designed to probe complementary levels of the same NMU–Calb2 circuit. In Figure 5F, gastric glucose infusion was used to isolate the internal energy-sensing property of VMH NMU<sup>+</sup> neurons, independent of oral sensory input, motor behavior, or reward expectation. This experiment establishes that NMU<sup>+</sup> neurons are directly activated by elevated circulating glucose.

      By contrast, Figure 5M examined how activation of this NMU pathway modulates downstream Calb2<sup>+</sup> rNST neurons under physiologically relevant feeding conditions, in which sweet taste signals are naturally evoked by licking. This design allows us to test the functional consequence of NMU signaling on sweet-responsive rNST neurons during normal sensory processing.

      Although the route and timing of glucose delivery differ, both paradigms converge on a unified circuit model: internal glucose elevation activates VMH NMU<sup>+</sup> neurons, and NMU signaling suppresses sweet-driven activity in Calb2<sup>+</sup> rNST neurons. We have revised the Results and figure legends to explicitly describe this layered experimental logic and to clarify that Figures 5F and 5M together establish distinct but connected nodes of the same circuit.

      (7) Figure 5I-J. The glucose concentration used appears excessively high. In mammals, blood glucose in the sated state is ~7-8 mM. It is unclear whether the observed responses represent physiological effects or artifacts of supraphysiological stimulation. Additional experiments with lower glucose concentrations would strengthen the study.

      We thank the reviewer for raising this important concern regarding the glucose concentration used in Figure 5I–J. We agree that the concentration applied in ex vivo slice experiments exceeds the typical physiological range of circulating glucose.

      This higher concentration was intentionally chosen to ensure reliable neuronal activation in acute brain slices, where glucose diffusion, uptake, and metabolic access are substantially slower than in vivo. Similar approaches have been widely used in studies of glucose-sensitive hypothalamic neurons to overcome these technical limitations (e.g., Kim et al., 2025, Neuron).

      Importantly, the physiological relevance of our findings is supported by in vivo fiber photometry experiments, which demonstrate that VMH NMU<sup>+</sup> neurons are robustly activated following normal sugar ingestion under physiological conditions. Thus, while supraphysiological glucose was used to establish glucose responsiveness ex vivo, our in vivo data confirm that NMU<sup>+</sup> neurons respond to glucose elevations within the normal physiological range.

      (8) Figure 5K. The VMH images are inconsistently oriented compared with Figure 5E, lacking a 3v landmark. The NMU detection method (IHC or FISH) is not specified in the legend. The GFP-Calb2 signal is heavily saturated, making it difficult to distinguish true signals from artifacts. These issues undermine interpretability.

      We thank the reviewer for pointing out these issues. In the revised manuscript, VMH images in Figure 5K have been reoriented to match Figure 5E, and the third ventricle (3v) is now indicated as an anatomical landmark. The figure legend has been revised to clarify that NMU<sup>+</sup> neurons are identified by GFP expression from a Cre-dependent AAV2/1-DIO-GFP injected into NMU-Cre mice, rather than by NMU immunohistochemistry or FISH. In addition, GFP–Calb2 images have been reprocessed to clearly distinguish true signals from background and imaging artifacts.

      (9) Figure 5L-M. Details of the NMU injection method are absent (route, dose, delivery parameters). The number of animals (n) is also not reported. Furthermore, AUC reduction alone is not sufficient evidence of robust inhibition. To convincingly demonstrate causality, NMU-IRES-Cre mice should be combined with DREADD or optogenetic approaches to directly inhibit NMU neurons and test whether rNST Calb2 activity is reduced.

      We thank the reviewer for these helpful comments. We have revised the manuscript to include all missing methodological details. These details are now clearly described in the Methods section and figure legend.

      We fully acknowledge that cell-type–specific manipulations, such as DREADD or optogenetic inhibition of NMU neurons, would provide more definitive causal evidence. However, our main goal in the mouse experiments was to demonstrate that NMU<sup>+</sup> neurons can directly sense glucose and modulate sweet sensitivity, thereby supporting the evolutionary conservation of the Hugin mechanism identified in Drosophila. Detailed dissection of the downstream circuit architecture and behavioral consequences in mammals is indeed an important direction for future research, but it lies beyond the current study’s primary focus on cross-species conservation.

      (10) In Drosophila, hugin neurons respond selectively to nutritive glucose (Fig. 2H), but whether NMU neurons share this property is unknown. Notably, Calb2 neurons in the rNST respond to the artificial sweetener AceK (Hao Jin et al., 2021, Cell), leaving open whether the NMU-rNST circuit is calorie-dependent or calorie-independent.

      We have added a statement in the Discussion acknowledging this limitation and emphasizing that future work will be needed to test whether the NMU–Calb2 circuit is selectively engaged by metabolically active sugars or also by sweet taste signals independent of caloric value.

      Minor comments

      (11) All bar graphs should include individual data points.

      We have added individual data points to all bar graphs.

      (12) In Figures 3E, 4C, and 4D, it appears that a combination of GAL4 and LexA was used, but the information about the fly lines is missing.

      We have now included the complete list of fly lines used for these experiments, including their genotypes and sources.

      (13) The source for PK2-R1 KO, AstA-R1 KO fly lines and NMU-IRES-Cre, Calb2-IRES-Cre mice is missing.

      We have added the complete source information for all genetic lines mentioned.

      (14) Figure 5B-D: This is a sucrose preference test, so why is the y-axis labeled as glucose? Is this an error, or were the values converted to glucose equivalents?

      We thank the reviewer for catching this mistake. The assay shown in Figure 5B–D measured sucrose preference, not glucose preference. The inconsistency resulted from a typographical error in the Methods description. In the revised manuscript, we have corrected this error to clearly state that sucrose was used in the preference test.

      (15) Supplementary Figure 15. The NMU images are of poor quality and should be improved.

      The punctate appearance of NMU signals in Supplementary Figure 15 is not due to poor image quality but rather reflects the physiological distribution of the NMU neuropeptide. As NMU is stored in secretory vesicles within neuronal terminals and somata, its immunostaining typically appears as discrete puncta rather than diffuse cytoplasmic labeling.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript. Readers would also benefit from noting that the mice were male and discussion of the exclusion of females.

      In the revised manuscript, we have included full statistical reporting for all key experiments in the resource data. Regarding animal sex, we confirm that all mouse experiments were conducted using male mice. This choice was made to minimize variability caused by hormonal cycles in females, which can influence feeding behavior and glucose metabolism. We have now explicitly stated this information in the Methods section and included a brief discussion noting that sex-specific differences in NMU–Calb2 circuitry and feeding regulation represent an important question for future investigation.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC- 2025-03341

      Corresponding author(s): Thomas, Leonard

      1. General Statements [optional]

      The reviews are positive, constructive, and balanced. The reviewers highlighted the novelty, scope, technical rigor, and strength of evidence of the study. The reviewers also noted the technological advance in modeling of multi-domain proteins that we report. In summary, there are two major advances reported in this study, both of which have important implications, both within the field of lipid signaling and in the broader field of in silico structural modeling.

      Lipid signaling. We have elucidated the mechanism by which a protein kinase is allosterically activated by a specific lipid second messenger (PIP3) at atomic resolution. To the best of our knowledge, this has not been achieved for any kinase to date. Our findings have implications for (a) the spatial and temporal confinement of Tec signaling in cells by PIP3, (b) the rationalization of disease-causing mutations in XLA, and (c) the development of novel therapeutics that could be of clinical value in the treatment of B-cell malignancies. As such, we believe that this study will be of interest to a wide spectrum of basic scientists in the cell signaling community, as well as translational, and clinical scientists.

      In silico structural modeling. Whilst developed primarily to answer the biological question of PIP3-mediated activation of the Tec kinases (see above), the improvement in AlphaFold modeling that we report has significant implications for all scientists concerned with structural modeling in silico, specifically with respect to the modeling of both multi-domain proteins and protein complexes. Given the widespread adoption of AlphaFold as a hypothesis generator, the audience for which these developments are relevant is actually very large, transcending all fields of the biological sciences.

      2. Description of the planned revisions

      • The major suggestion made by reviewers #2 and #3 was the inclusion of a negative control in the lipid nanodisc assays (Figure 5) to confirm that it is PIP3 that specifically activates MbTEC. This is a constructive and valuable addition to our study, particularly in light of the fact that PI(4,5)P2 is present in cells at 2-4 orders of magnitude greater concentration than PIP3. This experiment will be combined with reviewer #2's suggestion to perform a PIP3 titration in the lipid nanodiscs.


      Reviewer #2

      Although the nanodisc experiments clearly show PIP3-dependent activation, titrating the PIP3 content in nanodiscs (e.g., 0.1%, 0.5%, 1%, 3%, 5% of PIP3) to determine whether MbTEC activation shows a graded response to lipid abundance would strengthen the conclusions. This would support the suggested allosteric mechanism and aid in differentiating between digital and analogue activation behaviour.

      • We thank the reviewer for the nice suggestion, which we will combine with the negative control suggested by the reviewer in the next comment.
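
      One way to frame the reviewer's distinction between graded (analogue) and switch-like (digital) activation is through the steepness of a dose-response curve. The sketch below is an illustration only: the Hill model, the half-maximal PIP3 content `k_half`, and both Hill coefficients are assumptions chosen for the example, not measurements or the planned analysis. Only the PIP3 percentages are taken from the reviewer's suggestion.

```python
def hill_response(pip3_percent: float, k_half: float, n: float) -> float:
    """Fractional activation under a Hill model (assumed functional form)."""
    if pip3_percent <= 0:
        return 0.0
    return pip3_percent**n / (k_half**n + pip3_percent**n)


# PIP3 contents suggested by the reviewer (% PIP3 in the nanodisc).
PIP3_PERCENTS = [0.1, 0.5, 1.0, 3.0, 5.0]

# Hypothetical parameters: n = 1 gives a graded curve, a large n a switch-like one.
graded = [hill_response(p, k_half=1.0, n=1.0) for p in PIP3_PERCENTS]
switch_like = [hill_response(p, k_half=1.0, n=8.0) for p in PIP3_PERCENTS]

for p, g, s in zip(PIP3_PERCENTS, graded, switch_like):
    print(f"{p:>4}% PIP3  graded={g:.3f}  switch-like={s:.3f}")
```

      Under these assumed parameters, the graded curve changes smoothly across the titration, whereas the switch-like curve stays near zero below `k_half` and saturates above it. A titration that samples both sides of the transition would therefore help tell the two behaviours apart.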

      A good negative control for Figure 5C, would be a nanodisc containing another phosphoinositide. Given prior evidence that TEC-family PH domains display selectivity for PIP3, it would nevertheless be informative to test nanodiscs containing other phosphoinositides (e.g., PI(4,5)P2, PI(3,4)P2, and PI3P).

      • See response above.

      Reviewer #3

      Fig 5B/C: The nanodisc experiment lacks some controls. In order to conclude that PIP3 is indeed critical for the observed enhanced autophosphorylation of MbTEC, nanodiscs with e.g. PI3P, PI4P or PI5P should be used that are not expected to bind the MbTEC PH domain with high affinity. Likewise, or alternatively, a mutant PH domain with largely reduced PIP3 binding affinity would support trust in this central result of the paper. (estimated time investment: 1-2 months).

      • We appreciate the reviewer's suggestion, which was also proposed by reviewer #2. These experiments are planned as the number one priority (see response above).

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1

      Major comment: We think the proposal is overall coherent and reasonable and found it interesting. It is not, however, conclusive. Modeling played a key role in supporting this proposal, but the modelling itself was dependent on choices of parameters made by the authors. The reported AlphaFold 3 model depended on a customized MSA strategy: the authors report divergent placement of the PH domain with respect to the kinase domain in their AlphaFold 3 runs. In light of this observation they used a manually curated TEC family MSA with taxonomic reweighting. This helped the model convergence but it introduced arbitrarity in the modeling step.

      • We believe that it is necessary to clarify what exactly our custom AF pipeline does so as to avoid confusion, but also to render our work more impactful for future studies that employ AlphaFold. The divergent placement of the PH domain in AF's standard configuration arises from the inclusion of sequences in the MSA that do not belong to the Tec gene family (Supplementary Figure 4C) but are structurally related at the individual domain level and therefore identified by the profile Hidden Markov Models used by AF to generate deep MSAs. These sequences are unrelated to Tec phylogenetically and therefore have evolved under different selection pressures. What our custom pipeline does is exclude these sequences from the MSA, such that the evolutionary covariance signature exploited by AF to guide inter-residue distance restraints comes only from bona fide Tec sequences. In a second step, we sample the sequences to ensure taxonomic balance (sequence databases are heavily biased in terms of taxonomic representation). This step increases sequence diversity and, with it, the strength of the co-variance signal. Therefore, rather than introducing "arbitrariness" in the modeling, we actually reduce it.
      • Since the advance that we report in modeling multi-domain proteins with AlphaFold is applicable to all multi-domain proteins and protein complexes, we believe that it is valuable to convey the significance of the input MSA in as clear a fashion as possible. To illustrate why AlphaFold fails in its standard configuration, we have therefore performed an in silico analysis of the MSA automatically generated by AF when it is prompted to predict the structure of MbTec. We now include this analysis as a new Supplementary Figure (Supplementary Figure 4C). As can be seen, of the 50,000 sequences in the AF3-generated MSA, only 1,898 contain the complete set of regulatory PH, SH3, SH2 and kinase domains that characterize the Tec kinases. The remaining 48,102 sequences, while containing one or more of the individual domains found in Tec, are phylogenetically unrelated. This means that the co-variance signature that AF relies upon for accurate prediction of inter-domain interactions is contained in
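
      The two-step MSA curation described in this response (exclude sequences that lack the full Tec domain architecture, then balance taxonomic representation) can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the `SeqRecord` layout, the function names, the toy sequences, and the per-taxon cap used for balancing are all assumptions made for the example. A real pipeline would presumably derive domain content from profile HMM hits and might reweight rather than subsample.

```python
from collections import defaultdict
from dataclasses import dataclass
import random

# Domain architecture that characterizes the Tec kinases, as stated in the text.
REQUIRED_DOMAINS = frozenset({"PH", "SH3", "SH2", "kinase"})


@dataclass(frozen=True)
class SeqRecord:
    """Hypothetical MSA entry: identifier, annotated domains, taxonomic group."""
    seq_id: str
    domains: frozenset
    taxon: str


def filter_full_architecture(records):
    """Step 1: keep only sequences that contain every required domain."""
    return [r for r in records if REQUIRED_DOMAINS <= r.domains]


def balance_by_taxon(records, max_per_taxon, seed=0):
    """Step 2: cap the number of sequences per taxon (an assumed balancing rule)."""
    rng = random.Random(seed)
    by_taxon = defaultdict(list)
    for r in records:
        by_taxon[r.taxon].append(r)
    balanced = []
    for taxon in sorted(by_taxon):
        group = by_taxon[taxon]
        if len(group) > max_per_taxon:
            group = rng.sample(group, max_per_taxon)
        balanced.extend(group)
    return balanced


# Toy input (invented for illustration; not the MSA analysed in the response).
FULL = frozenset({"PH", "SH3", "SH2", "kinase"})
toy_msa = [
    SeqRecord("tec_a", FULL, "Mammalia"),
    SeqRecord("tec_b", FULL, "Mammalia"),
    SeqRecord("tec_c", FULL, "Mammalia"),
    SeqRecord("tec_d", FULL, "Choanoflagellata"),
    SeqRecord("src_like", frozenset({"SH3", "SH2", "kinase"}), "Mammalia"),
    SeqRecord("ph_only", frozenset({"PH"}), "Fungi"),
]

full_length = filter_full_architecture(toy_msa)
curated = balance_by_taxon(full_length, max_per_taxon=2)
print(len(toy_msa), len(full_length), len(curated))
```

      For scale, the numbers quoted in the response imply that step 1 alone would retain 1,898 of 50,000 sequences, about 3.8% of the automatically generated MSA.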

        Minor comment: In two places the authors wrote "PIP3 is necessary and sufficient for both MbTEC activation and inactivation." This seems logically impossible. Revision is required.

      • We appreciate the reviewer's confusion here. This conclusion stemmed from the observation that PIP3 engagement is sufficient to promote full activation of MbTec on lipid nanodiscs in vitro (the synergistic effect of the hydrophobic stack mutation is lost in this context due to the presence of the polyproline motif in the PH-SH3 linker). However, in vivo, the SH2 domain is essential for BTK activation (by mediating its recruitment to activated receptors) and therefore it is incorrect to state that PIP3 is necessary and sufficient. It is necessary, but not sufficient - this is, again, analogous to an AND gate in an electronic circuit. We have revised the manuscript accordingly.

      Significance

      It attempted to clarify the role of the PH domain in TEC activation from a mechanistic perspective. If confirmed, it can potentially lead to novel approaches of drug discovery targeting TEC kinases.

      • Whilst we shied away from a discussion of therapeutic potential in our discussion to avoid unnecessary hype, the reviewer raises an important point, especially in light of the recent clinical success of BTK inhibitors in treating B-cell malignancies. As such, we have used the request made by Reviewer #2 to compare MbTec with Akt to highlight the potential for a new therapeutic modality in Tec kinase inhibition. The recent FDA approval of Capivasertib (November, 2023), an allosteric inhibitor of Akt, for the treatment of hormone-receptor (HR) positive, HER2-negative advanced or metastatic breast cancer provides a nice proof-of-concept. This discussion can be found in the response to Reviewer #2. Reviewer #3 also alluded to the "blockbuster drugs" used to treat B-cell malignancies, so we felt it appropriate to at least comment on the potential implications of our findings for the development of novel therapeutics.

      Reviewer #2

      • The inference for Figure 3 that PH domain exerts a strong autoinhibitory influence on kinase activity that cannot be overcome by disruption of the SH3-kinase interaction would benefit from further clarification. It is not immediately clear from the data that PH-domain-mediated inhibition should be seen as dominant rather than synergistic with SH3-kinase linker interactions. Although the autophosphorylation stoichiometry was measured for MbTEC32K L396A and MbTECFL L396A, a more thorough quantitative evaluation of the relative contributions of PH-domain removal versus SH3-linker disruption would be possible if this analysis were extended to MbTEC32K. Discussing whether these inhibitory components might instead work together/cooperatively to limit kinase activity or is it one dominant over the other, the authors are urged to thoroughly explain the reasoning behind the conclusion provided.

      • The reviewer raises an interesting question regarding the relative contributions of the various regulatory domains to autoinhibition. Ultimately, what our data show, both for MbTec autophosphorylation and substrate phosphorylation, is that disruption of the SH3-kinase interface results in kinase activation. The amplitude of the activation, however, is dependent on whether the PH domain is present or not. In the presence of the PH domain, the activation is very modest, whereas when it is removed, the amplitude is an order of magnitude greater. This reflects the fact that SH3 domain displacement without PH domain displacement does not permit acquisition of a conformation compatible with activation loop autophosphorylation. This implies that PIP3-dependent allosteric activation is a prerequisite for complete activation of Tec. PH domain deletion is also not permissive for complete activation, which requires SH3 domain displacement on top to drive autophosphorylation, an observation consistent with previous experimental data on Src. As the reviewer indicates, these are synergistic with one another - Tec is a coincidence detector of multiple signals, all of which are required for full activation. Our conclusion that the inhibitory influence of the PH domain cannot be overcome by displacement of the SH3 and SH2 domain, however, is important, since it strongly implies that PIP3 is necessary for Tec activation (i.e. that Tec is an AND gate and not an OR gate). We have revised our description of these results to better reflect the relative contributions of the various regulatory domains:

      "These observations indicate that the PH and SH3 domains exert synergistic inhibitory effects on the kinase domain and that disengagement of both domains by ligand binding is required for complete activation of MbTec. This is the equivalent of an AND gate in an electronic circuit, as opposed to an OR gate."
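
      The AND-gate analogy can be spelled out as a small truth table. The sketch below is purely illustrative: the function names and the two boolean inputs (PH domain disengaged, SH3 domain disengaged) are assumptions that compress the graded activities described in this response into a binary outcome.

```python
from itertools import product


def fully_active(ph_released: bool, sh3_released: bool) -> bool:
    """AND gate: full activation requires both inhibitory domains to disengage."""
    return ph_released and sh3_released


def or_gate(ph_released: bool, sh3_released: bool) -> bool:
    """OR gate, shown only for contrast: either signal alone would suffice."""
    return ph_released or sh3_released


# Enumerate all four input combinations for both gate types.
truth_table = [
    (ph, sh3, fully_active(ph, sh3), or_gate(ph, sh3))
    for ph, sh3 in product([False, True], repeat=2)
]

for ph, sh3, and_out, or_out in truth_table:
    print(f"PH released={ph!s:5} SH3 released={sh3!s:5} AND={and_out!s:5} OR={or_out}")
```

      Only the row in which both domains are released yields full activation under the AND model, whereas an OR model would predict activation from either signal alone.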

      It would also be valuable if the authors in the discussion section can draw a contrast with PIP3-dependent activation mechanism of AKT. This would be helpful in highlighting the uniqueness of PIP3 dependent TEC activation.

      • We thank the reviewer for highlighting the value of comparing MbTec to Akt, for which the activation mechanism has been intensively studied, both in our lab and in many others. There are, indeed, some interesting similarities, which we now comment on in the following paragraph, which has been incorporated into our discussion section: "It is worth noting that the regulation of MbTec by PIP3 is analogous, although not entirely homologous, to the regulation of the Ser/Thr kinases Akt and PDK1. Like Tec, Akt and PDK1 contain PIP3-sensing PH domains which mediate autoinhibition of their respective kinase domains (PMIDs: 28157504 and 35387990). Although the autoinhibitory interfaces of Tec and Akt are structurally different, both interfaces impair activation loop phosphorylation and substrate binding, as well as PIP3 binding (PMIDs: 28157504, 29632185, 3438531). The specific autoinhibitory conformation of Akt has been exploited in the development of allosteric inhibitors, which exhibit significantly improved on-target specificity and have recently been approved for the treatment of cancer (PMID: 38592948). As such, our findings open a new potential therapeutic modality for the development of selective Tec kinase inhibitors. Given the recent success of ATP-competitive BTK inhibitors in treating B-cell malignancies (PMIDs: 26639149, 36511784), there is enormous therapeutic potential."

      Minor Comments

      Y579 and R581 comes without a significant context. Can the authors elaborate on these residues a bit.

      • We have tried to better introduce the rationale behind mutation of these residues by rephrasing this part of the results. The changes from the previous version are underlined:

      "Consistent with the loss of an energetically favorable interface, deletion of the PH domain resulted in a 6 °C reduction in thermal stability (Figure 2F, Supplementary Figure 6C). We next tested the specificity of the predicted PH-kinase interaction by mutating Y579 and R581, which are conserved residues in the interface (Figure 2G). Mutation of Y579 and R581 to alanine reduced thermal stability by 3 °C, while their mutation to aspartate and glutamate respectively resulted in the same thermal stability as MbTEC32K lacking its PH domain (Figure 2F, Supplementary Figure 6D). These observations indicate that substitution of Y579 and R581 with alanine weakens the autoinhibitory conformation by reducing van der Waals contacts, but substitution with charged residues that introduce unfavorable interactions is sufficient to completely disrupt the interface. Consistently, MbTEC32K bound to the PH domain with an affinity of 4.0 mM, but binding of MbTEC32K Y579D R581E was barely detected (Figure 2H)."

      Figure 2H - In the legend make wt as WT so that it matches the figure panel

      • Fixed.
      • Supplementary Figure 1J - Adjust the orientation of intensity on y axis

      • Fixed (now Supplementary Figure 2J).

      • Supplementary Figure 1H - In the figure it should be Y579 and R581

      • Fixed (now Supplementary Figure 2H).

      • Can the authors add that 5C is the representative autoradiographs for each construct from panel 5B. Make it clear.

      • Fixed.

      • Write the units for intensity on the y axis for the entire supplementary figure 1
      • Supplementary Figure 2J and 2K - Make the 6 subscript in the legend for Gly 6.

      • Fixed (now Supplementary Figure 3J-K).

      • Can the authors include RRID wherever applicable in the methods section.

      • We have added in the RRID reference for the cell line employed in this study.

      • Include a space between i and was in the sentence " Each sequence iwas assigned a raw weight .

      • Fixed.

      • I think MSA is coming twice in the line above structure inference in the methods section. MSAs is repeating after balanced MSA. Kindly look into it.

      • Fixed.

        The work has been done using the TEC kinase from the choanoflagellate M.brevicollis, presumably for practical reasons of expression and purification. PIP3 signalling, to my knowledge, has not formally been demonstrated in choanoflagellates. This remains a concern in respect of the relevance of these findings to true metazoans which is the setting in which Class I PI3kinase generated PIP3 signalling is seen.

      • We appreciate the reviewer's concerns regarding the relevance of our findings to PIP3 signaling in metazoans. Whilst the production and sensing of PIP3 has not formally been demonstrated in a choanoflagellate, we believe that sufficient circumstantial evidence exists that should allay these concerns. Specifically:

      • Evolutionary evidence exists for the presence of the PI3K machinery in the last eukaryotic ancestor (LECA) (PMID: 26482564), approximately 1.2-1.8 billion years ago. Choanoflagellates are, by comparison, quite young (600-650 My).
      • Choanoflagellates have an extensive tyrosine kinase signaling network, including RTKs (PMID: 18621719)
      • PI3K/PIP3/PTEN signaling has been robustly demonstrated in organisms that predate choanoflagellates by hundreds of millions of years, including Amoebozoa e.g. D. discoideum and E. histolytica (PMIDs: 9778249, 11352940, 12062103, 12062104, 12802064).
      • Monosiga brevicollis encodes:
        • class I PI3K p110 and p85 homologs (Manning et al, PNAS 2008)
        • a PTEN homolog
        • note that class I PI3Kδ is responsible for the plasma membrane PIP3 signal in metazoan immune cells, meaning that a homolog of this enzyme is present in choanoflagellates
      • Choanoflagellates encode homologs of metazoan proteins that are known to respond specifically to PIP3, including:
        • MbTec
        • PDK1 (NCBI Reference Sequence: XP_004995400.1)
        • Akt (NCBI Reference Sequence: XP_001743446.1)
      • A recent kinase inhibitor screen in the choanoflagellate S. rosetta revealed the activity of known PI3K inhibitors (regulation of growth, phosphotyrosine signaling etc) (PMID: 40226336)
      • Conclusion: choanoflagellates inherited an ancient lipid-signaling toolkit.
      • Nevertheless, we believe that the reviewer raises a point that is important to clarify for the uninitiated reader. We therefore propose the following additional paragraph to our discussion section that deals explicitly with these concerns:

      "Although PIP3 signaling has not been explicitly demonstrated in a choanoflagellate, the machineries for its production predate choanoflagellates by at least 500 My (PMID: 26482564). PI3K-mediated production, PH domain-mediated sensing, and PTEN-mediated degradation of PIP3 have all been robustly demonstrated to control chemotaxis in the slime mold Dictyostelium discoideum (PMIDs: 9778249, 11352940, 11389841, 12062103, 12062104, 12802064). While the Tec kinases emerged more recently (PMID: 30183386), PI3K, PTEN, PDK1, and Akt are all found in choanoflagellates, suggesting that choanoflagellates inherited an ancient lipid signaling toolkit and that the Tec kinases were a novel evolutionary addition to the toolbox."

      Reviewer #3

      Points to be addressed:

      Fig 1B: For the sequence alignment, a few more residues before/after the four critical selected residues should be shown. This allows the reader to evaluate how conserved these residues really are. (estimated time investment: ~1 day max.)

      • Figure 1B is not actually a conventional sequence alignment, since it shows four residues that are structurally related, but not found in a contiguous sequence. However, we have added a new Supplementary Figure panel (Supplementary Figure 1A) to show the sequence motifs for each residue.

        Fig. 2 I/J/K: It is more customary to show HDX-MS results mapped on a structural cartoon representation (and not surface representation). The current representation makes it impossible to see which functional areas of the different domains show increased/decreased HDX. In addition, mapping HDX changes on a linear sequence/sec structure plot (as also commonly used to represent HDX-MS data) should be shown in SI. (estimated time investment:

      Reviewer #1

      This is important because the whole thesis of this manuscript rest on the model's suggestion that the kinase domain sequesters the PIP3 binding site of the PH domain. The authors found that in cells full-length MbTEC transiently associated with the membrane but the isolated PH domain enjoyed more prolonged membrane association. The authors interpreted this difference in membrane association in terms of different sequestration of the PIP3-binding PH domain by the kinase domain, but the PH-kinase interaction is based on a model and it needs further validation.

      • Model validation, particularly in the era of AlphaFold, is critical, as the reviewer correctly notes. However, we dispute the reviewer's assertion that the PH-kinase interface derived from our model needs further validation. The following is a summary of all the orthogonal ways in which we validated the model. In terms of publishing standards, we believe we have exceeded what is widely accepted as robust evidence for a specific interface.
      • The predicted aligned error (PAE) plot (Figure 1H) exhibits prediction errors in the PH-kinase interface which are (a) extremely low and (b) comparable with those in the SH3-kinase, SH2-kinase, and SH3-SH2 interfaces, all of which are superimposable with experimental structures.
      • Comparison of the model with experimental small-angle X-ray scattering (SAXS) in solution revealed a near-perfect fit (Figure 2A). This demonstrates that the global conformation of the model is an accurate reflection of the conformation of MbTEC in solution.
      • Mutation of the interface on the kinase side leads to a loss of thermal stability equivalent to deletion of the PH domain (Figure 2F-G) and a failure to bind the PH domain in trans (Figure 2H).
      • Changes in HDX-MS of the interface-mutated protein (Figure 2I-L) are comparable to those in the PH domain-deleted construct (Supplementary Figure 6E-J).
      • Reciprocal mutation of the interface on the PH domain leads to a reduction in binding affinity for the SH3-SH2-kinase (32K) protein (Figure 4C).

      While autophosphorylation is dramatically enhanced by PIP3 containing nanodiscs, the interpretation can be complicated, as the manuscript itself acknowledged that membrane based experiments cannot readily deconvolute local concentration effects from allosteric effects, because concentrating proteins on a membrane can promote dimerization dependent autophosphorylation.

      • It is precisely for these reasons that we conducted the experiments detailed in Figure 3, since they do not convolute allosteric activation with local concentration on a membrane. These experiments underpin our conclusions that MbTec is specifically activated by dissociation of its PH domain from the kinase domain and not just by local concentration on a PIP3-containing membrane. Whilst the experiments in Figure 3 do not say anything about the specificity of the PH-kinase interface (which we addressed with other experiments), they unambiguously confirm the inhibitory effect of the PH domain that other studies have reported previously.

      Reviewer #2

      To elaborate on the point of sufficiency, can the authors utilise FRB-FKBP system to synthesize PIP3 ectopically and see if it leads to the recruitment of FL and PH in addition to PDGF stimulation. It will also be valuable if the authors can use PI3K inhibitors post PDGF stimulation to validate this point further. A colocalization with PIP3 biosensor post PDGF stimulation will also be a great control.

      • The reviewer's suggestion to use the FKBP-FRB system to synthesize PIP3 ectopically is elegant but, in our opinion, not necessary. The specific recruitment of Tec kinases to the plasma membrane in response to growth factor-stimulated production of PIP3 is well established (e.g. Varnai et al, JBC 1999). As such, a PIP3 biosensor is not necessary, since the Tec kinases are well established PIP3 sensors in cells.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Overall Summary

      The authors have tried to put forward a comprehensive structural and biochemical analysis of an ancestral TEC kinase from Monosiga brevicollis (MbTEC). The authors have used a wide array of state-of-the-art approaches such as protein biochemistry, mutagenesis, thermal stability assays, SAXS, AlphaFold 3 modeling with curated MSAs, HDX-MS, kinase assays, lipid nanodiscs, mass spectrometry, and cell-based imaging to propose a detailed mechanism for autoinhibition of MbTEC and PIP3-dependent allosteric activation.

      Overall Comment

      The overall study presented is quite scientifically challenging and conceptually ambitious. The authors are to be commended for a manuscript of impressive technical rigor and experimental thoroughness, and for the care with which it is written and presented.

      Major Comments

      1. The inference for Figure 3 that PH domain exerts a strong autoinhibitory influence on kinase activity that cannot be overcome by disruption of the SH3-kinase interaction would benefit from further clarification. It is not immediately clear from the data that PH-domain-mediated inhibition should be seen as dominant rather than synergistic with SH3-kinase linker interactions. Although the autophosphorylation stoichiometry was measured for MbTEC32K L396A and MbTECFL L396A, a more thorough quantitative evaluation of the relative contributions of PH-domain removal versus SH3-linker disruption would be possible if this analysis were extended to MbTEC32K. Discussing whether these inhibitory components might instead work together/cooperatively to limit kinase activity or is it one dominant over the other, the authors are urged to thoroughly explain the reasoning behind the conclusion provided.
      2. To elaborate on the point of sufficiency, can the authors utilise FRB-FKBP system to synthesize PIP3 ectopically and see if it leads to the recruitment of FL and PH in addition to PDGF stimulation. It will also be valuable if the authors can use PI3K inhibitors post PDGF stimulation to validate this point further. A colocalization with PIP3 biosensor post PDGF stimulation will also be a great control.
      3. Although the nanodisc experiments clearly show PIP3-dependent activation, titrating the PIP3 content in nanodiscs (e.g., 0.1%, 0.5%, 1%, 3%, 5% of PIP3) to determine whether MbTEC activation shows a graded response to lipid abundance would strengthen the conclusions. This would support the suggested allosteric mechanism and aid in differentiating between digital and analogue activation behaviour.
      4. A good negative control for Figure 5C, would be a nanodisc containing another phosphoinositide. Given prior evidence that TEC-family PH domains display selectivity for PIP3, it would nevertheless be informative to test nanodiscs containing other phosphoinositides (e.g., PI(4,5)P2, PI(3,4)P2, and PI3P)
      5. It would also be valuable if the authors in the discussion section can draw a contrast with PIP3-dependent activation mechanism of AKT. This would be helpful in highlighting the uniqueness of PIP3 dependent TEC activation.

      Minor Comments

      • Y579 and R581 comes without a significant context. Can the authors elaborate on these residues a bit.
      • Figure 2H - In the legend make wt as WT so that it matches the figure panel
      • Supplementary Figure 1J - Adjust the orientation of intensity on y axis
      • Supplementary Figure 1H - In the figure it should be Y579 and R581
      • Can the authors add that 5C is the representative autoradiographs for each construct from panel 5B. Make it clear.
      • Write the units for intensity on the y axis for the entire supplementary figure 1
      • Supplementary Figure 2J and 2K - Make the 6 subscript in the legend for Gly 6.
      • Can the authors include RRID wherever applicable in the methods section.
      • Include a space between i and was in the sentence " Each sequence iwas assigned a raw weight .
      • I think MSA is coming twice in the line above structure inference in the methods section. MSAs is repeating after balanced MSA. Kindly look into it.

      Significance

      General assessment

      This is a study on the TEC family of kinases, which have an important role in immune cells. Alterations in their function are linked to both primary immunodeficiency and hematological malignancies. Understanding their mechanism of activation is therefore of fundamental importance for understanding protein kinase regulation as well as for developing potential therapies for blood cell disorders.

      The work has been done using the TEC kinase from the choanoflagellate M.brevicollis, presumably for practical reasons of expression and purification. PIP3 signalling, to my knowledge, has not formally been demonstrated in choanoflagellates. This remains a concern in respect of the relevance of these findings to true metazoans which is the setting in which Class I PI3kinase generated PIP3 signalling is seen.

      Advance

      This study advances details of the molecular mechanism by which PIP3 interacts with and regulates TEC kinase function. This is a study in basic structural biology.

      Audience

      This study will be of interest to structural biologists and those with an interest in understanding phosphoinositide regulated protein function.

      My expertise

      Biochemistry and cell biology, phosphoinositide signalling

    1. Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions". There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models".

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.

      Comments for the revision:

      I want to thank the authors for their efforts in revising the manuscript. Most of the concerns raised in the initial review have been adequately addressed.

      However, one important issue remains. I previously asked the authors to benchmark their method against stronger baselines. The authors declined, arguing that these alternatives are "not directly applicable to the types of analyses." I am not persuaded by this rationale. In my view, these baseline methods target essentially the same underlying problem, and at least some, if not all, should be included in a comparative evaluation (or the manuscript should provide a clearer, more technically grounded explanation of why such comparisons are not feasible or not meaningful).

    2. Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set, and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the phenotype of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4 or more amino acids. The novel network architecture achieves such a level of interpretability without noticeable performance penalty. The study of 10 different protein families shows that there is variation among protein families in the importance of these interactions, and that higher order interactions are particularly important to predict the phenotypes of distant proteins.

      Weaknesses:

      The Github repository provides a README file to run a standard pipeline, but a user will need to go through the code to actually know what that pipeline is doing.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present an approach that uses the transformer architecture to model epistasis in deep mutational scanning datasets. This is an original and very interesting idea. Applying the approach to 10 datasets, they quantify the contribution of higher-order epistasis, showing that it varies quite extensively.

      Suggestions:

      (1) The approach taken is very interesting, but it is not particularly well placed in the context of recent related work. MAVE-NN, LANTERN, and MoCHI are all approaches that different labs have developed for inferring and fitting global epistasis functions to DMS datasets. MoCHI can also be used to infer multidimensional global epistasis (for example, folding and binding energies) and also pairwise (and higher order) specific interaction terms (see 10.1186/s13059-024-03444-y and 10.1371/journal.pcbi.1012132). It doesn't distract from the current work to better introduce these recent approaches in the introduction. A comparison of the different capabilities of the methods may also be helpful. It may also be interesting to compare the contributions to variance of 1st, 2nd, and higher-order interaction terms estimated by the Epistatic transformer and MoCHI.

      We thank the reviewer for the very thoughtful suggestion.

      Although these methods are conceptually related to our method, none of them can realistically be used to perform the type of inference we have done in the paper on most of the datasets we used, as they all require explicitly enumerating the large number of interaction terms.

      We have included new text (Line 65-74) in the introduction to discuss the advantages and disadvantages of these models. We believe this has made our contribution better placed in the broader context of the field.

      (2) https://doi.org/10.1371/journal.pcbi.1004771 is another useful reference that relates different metrics of epistasis, including the useful distinction between biochemical/background-relative and background-averaged epistasis.

      We have included this very relevant reference in the introduction. We also pointed out that a limitation of this class of methods is that they typically require near combinatorially complete datasets and often have to rely on regularized regression to infer the parameters, making the inferred model parameters disconnected from their theoretical expectations. Line 49-56.

      (3) Which higher-order interactions are more important? Are there any mechanistic/structural insights?

      We thank the reviewer for pointing out this potential improvement. We have now included a detailed analysis of the GRB2-SH3 abundance landscape in the final section of the results. In particular, we estimated the contribution of individual amino acid sites to different orders (pairwise, 3-4th order, 4-8th order) of epistasis and discussed our findings in the context of the 3D structure of this domain. We also analyzed the sparsity of specific interactions among subsets of sites.

      Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      We thank the reviewer for the positive feedback.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions." There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models."

      We thank the reviewer for this very helpful comment. These references are indeed conceptually quite similar to our framework. Although they are not directly applicable to the types of analyses we performed in this paper (partitioning contribution of epistasis into different interaction orders in terms of variance components), we have included a discussion of these methods in the introduction (Line 70-74). We believe this helps better situate our method within the broader conceptual context of interpreting machine learning models for epistatic interactions.

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      Again, we thank the reviewer for the thoughtful comment. We have addressed this comment together with a related comment by Reviewer1 by including a detailed analysis of the GRB2-SH3 landscape using a marginal epistasis framework, where we quantified the contribution of individual sites to different orders of epistasis as well as the sparsity of epistatic interactions. We also present these results in the context of the structure of this protein. Please see Results section “Architecture of specific epistasis for GRB2-SH3 abundance.”

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      We agree that the underparameterization of the simple sigmoid function could potentially be confounding. We did compare different choices of functional forms for modeling global epistasis. Overall, we found that there is no difference between a simple sigmoid function with four trainable parameters and the more complex version (sum of multiple sigmoid functions, used by popular methods such as MAVE-NN). Therefore, all results we presented in the paper were based on the model with a single scalable sigmoid function.

      We have added relevant text; line 153-158. We have also included side-by-side comparisons of the model performance for the GRB-abundance and the AAV2 dataset to corroborate this claim (Supplemental Figure 1).
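      As an illustration of what a sigmoid with four trainable parameters can look like, the following is a minimal sketch only. The parameter names (lower, span, slope, midpoint) and this particular parameterization are assumptions made for the example; the response does not state the exact functional form used in the manuscript.

```python
import math


def global_epistasis_sigmoid(phi, lower, span, slope, midpoint):
    """Map a latent phenotype phi to an observed score.

    Hypothetical four-parameter form (an assumption, not the authors' code):
        lower    : lower asymptote of the measurement scale
        span     : distance between lower and upper asymptotes
        slope    : steepness of the transition
        midpoint : latent value at which the output is halfway up
    """
    x = slope * (phi - midpoint)
    # sigmoid(x) written via tanh so that large |x| does not overflow
    unit_sigmoid = 0.5 * (1.0 + math.tanh(0.5 * x))
    return lower + span * unit_sigmoid
```

      With lower=0, span=1, slope=1 and midpoint=0 this reduces to the standard logistic function; the other settings only rescale and shift it, which is why a single such curve can be compared against a sum of several sigmoids.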

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.

      We thank the reviewer for the thoughtful suggestion. We have rewritten the description of our metrics for measuring the importance of "pairwise", "3-4-way", and ">4-way" interactions; Line 232-239.

      We have also added a table to improve clarity, as suggested; Table 2.
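      The layer-to-order relationship discussed in this comment can be sketched as a small lookup, assuming the 2^M scaling mentioned by the reviewer (M multi-head attention layers capture interactions up to order 2^M). The function names are hypothetical and the snippet does not reproduce the manuscript's Table 2.

```python
def max_interaction_order(num_mha_layers):
    """Highest epistatic order reachable with the given number of MHA layers,
    assuming the 2^M scaling quoted in the review (an assumption here)."""
    if num_mha_layers < 0:
        raise ValueError("number of layers must be non-negative")
    return 2 ** num_mha_layers


def order_table(max_layers):
    """List of (layers, maximum interaction order) pairs for 1..max_layers."""
    return [(m, max_interaction_order(m)) for m in range(1, max_layers + 1)]


if __name__ == "__main__":
    for layers, order in order_table(3):
        print(f"{layers} MHA layer(s): interactions up to order {order}")
```

      Under this assumption, one layer corresponds to the pairwise model, two layers to the model covering up to 4-way interactions, and three layers to the model covering up to 8-way interactions.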

      Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the function of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4, or more amino acids. The study of 10 different protein families shows that there is variation among protein families.

      Weaknesses:

      The manuscript is good overall, but could have gone a bit deeper by comparing the new architecture to standard transformers, and by investigating whether differences between protein families explain some of the differences in the importance of interactions between amino acids. Finally, the GitHub repository needs some more information to be usable.

      We thank the reviewer for the thoughtful comments. We have listed our response below in the “Recommendations for the authors” section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some of the dataset labels are confusing. For example, GRB is actually the protein GRB2 and more specifically just one of the two SH3 domains from GRB2 (called GRB2-SH3 in Faure et al.).

      We thank the reviewer for catching this. Our original naming of the datasets followed the library numbering in the Faure et al. paper (which constructed 3 variant libraries and performed different assays on them). To avoid confusion (and also save space in the figure titles), we have now renamed the datasets using this mapping:

      Author response table 1.

      Reviewer #3 (Recommendations for the authors):

      (1) What is the cost of the interpretability of the model? It would be interesting to evaluate how a standard transformer, complete with its many non-linearities, performs on the simulated 13-position data, using the r2 metric. This is important as the last sentence of the discussion seems to suggest that the model proposed by the authors could be used in other contexts, where perhaps interpretability would be less important.

      We thank the reviewer for this suggestion. We have run a generic transformer model on the GRB-abundance and AAV2 datasets. Overall, we found minimal difference between the generic model and our interpretable model, suggesting that fitting the interpretable transformer does not incur significant cost in performance.

      We have included a side-by-side comparison of the performance of the generic transformer and our three-layer model in Supplemental Figure 5 and a discussion of this finding in Line 256-259.

      (2) The 10 data sets analyzed by the authors differ in their behaviour. I was wondering whether the proteins have different characteristics, beyond the number and distribution of mutants in the data sets. For instance, do high-order interactions play a bigger role in longer proteins, in proteins with more secondary structures, in more hydrophobic proteins?

      We fully agree that this is a highly relevant question. Unfortunately, the paucity of datasets suitable for the type of analyses we performed in the paper limits our ability to draw general conclusions. Furthermore, the differences in genotype distribution among the 10 datasets may be the main driving factor in the behaviors of the models.

      We included our thoughts on this issue in the discussion (Line 477-481).

      We will definitely revisit this question if this type of high-order combinatorial DMS data becomes more available in the (hopefully) near future.

      (3) Although the code appears to be available in the repository, there is no information about the content of the different folders, about what the different scripts do, or about how to reproduce the article's results. More work should be done to clarify it all.

      Thank you for pointing this out. We have substantially improved our GitHub repository and included many annotations for reproducibility.

      (4) Typos and minor comments:

      (a) p3 "a multi-peak fitness landscapes": landscape.

      (b) p3 "Here instead of directly fitting the the regression coefficients in Eq. 2": remove 'the'.

      (c) p3 "neural network architectures do not allow us to control the highest order of specific epistasis": a word is missing.

      (d) p6 "up to 1,926, 3,014, and 4,102 parameters, respectively-all smaller than the size of the training dataset": it's not very clear what size of the dataset means: number of example sequences?

      (e) p6 "This results confirm": This result confirms.

      (f) p6 "to the convergence of of the variance components of the model landscape to the ground truth.": remove 'of'.

      (g) p7 "to characterize the importance higher-order interactions": the importance of.

      (h) p7 "The improvement varies across datasets and range": and ranges.

      (i) p9 "over the pairwise model is due to the its ability": remove 'the'.

      (j) p13 "This results suggest that pairwise": result suggests.

      (k) p13 "although the role assessed by prediction for randomly sampled genotypes seems moderate": sampled. Also, I'm not sure I understand this part of the sentence: what results are used to support this claim? It's not 6b, which is only based on the mutational model.

      This is in Supplemental Figure 7.

      (l) p13 "potentially by modeling how the these local effects": remove the.

      (m) p13 "We first note that the the higher-order models": remove the.

      (n) p15 "M layers of MHA leads to a models that strictly": lead to a model.

      (o) Supp Figure 1: "Solid lines shows the inverse": show.

      (p) Supp p 10 "on 90% of randomly sample data": sampled.

      (q) Supp p11 "Next, assume that Eq. 5 is true for m > 0. We need to show that Eq. 5 is also true for m + 1.": shouldn't it be m>=0 ? It seems important to start the recursive argument.

      Good catch.

      (r) Supp p11 "Since the sum in line 9 run through subsets": runs.

      (s) Supp p11 "we can further simplify Eq. 11 it to": remove it.

      We have fixed all these problems. We very much appreciate the reviewer’s attention.

    1. Reviewer #1 (Public review):

      The manuscript by Luciano et al is a collection of experiments about the yeast histone 3 lysine 4 methyltransferase, Set1, starting with 10 yeast two-hybrid screens (Y2H). Y2H screens were briefly popular 20+ years ago, but the persistently unfavourable false-to-true positive ratios limited their utility, and the conclusion emerged that Y2H is an unreliable approach for gathering protein-protein interaction data. Y2H outcomes are candidate interaction lists at best, strongly contaminated by false positives. Here, the authors employed a company (Hybridomics) to perform the Y2H screens.

      The primary data is not presented, and the outcomes are summarized using the Hybridomics in-house quality scoring system in Figure 1A. It is not possible to evaluate these data, and the manuscript presents cartoon summaries that the reader must accept as valuable.

      (1) Based on the extensive knowledge about Set1C/COMPASS acquired from genetics and biochemistry by many labs (including the Geli lab), the results presented here from the 10 Y2H screens are notably patchy. Of the 7 subunits of this complex, only one (Spp1) was identified using Set1 as bait. Conversely, as baits, Swd2, Spp1, Shg1, captured Set1, and the Bre2-Sdc1 interaction was reciprocally identified. These interactions were scored at the highest confidence level, which lends some confidence to the screens. However, the missing interactions, even at the third confidence level, indicate that any Y2H conclusions using these data must be qualified with caution. The authors do not appear to be cautious in their lengthy evaluations of these candidate interactions, which are illustrated with cartoons in Figures 2 and 3, with some support from the literature but almost without additional evidence. Snf2 is a particularly interesting candidate, which the authors support with pull-down experiments after mixing the two proteins in vitro (Figure 4). After Y2H, this is the least convincing evidence for a protein-protein interaction, and no further, more reliable evidence is supplied.

      (2) Figure 5 continues the cartoon summary of extrapolations from the Y2H screens, again without supporting evidence, except that the authors state, "We have refined the interaction region between Set1, Prp8 and Prp22, showing that Prp8 and Prp22 interact strongly with Set1-F4 (n-SET). Prp22 interacts in addition with Set1-F1 (Figure S2)." However, Figure S2 does not show this evidence and is incoherent.

      The figure legends for Figure S2B and C (copied here in bold) do not correspond to the figure.

      B - Expression of the F1-F5 fragments in yeast cells. Fusion proteins were detected with an anti-GAL4 monoclonal antibody. TOTO yeast cells (Hybrigenics) were transformed with the different pB66-Set1-F1 to F5 plasmids and subsequently with either P6, pP6-Snf2 762-968, pP6-Prp8 37-250, or pP6-Prp22 379-763 that were identified in the Y2H screens. Transformed cells were incubated 3 days at 30 °C on SD-LEU-TRP and then restreaked on SD-LEU-TRP-HIS with 3AT. Cell growth was monitored after 2 days at 30 °C.

      C - Solid and dotted arrows indicate that transformed TOTO cells transformed with pB66-Set1-F1 to F5 and the indicated prey (Snf2, Prp8, and Prp22) are growing in the presence of 20 mM and 5 mM of AT, respectively.

      Figure S2D is two almost featureless dark grey panels accompanied by the figure legend D) Control experiment showing that TOTO cells transformed with p6 and pB66-Set1-F4 are not gowing (sic) in the presence of 5 mM or 20 mM AT.

      Line 343. Interestingly, the two-hybrid screens reveal that Set1 1-754 interacted with Gag capsid-like proteins of Ty1 (Figure S5), raising the possibility that Set1 binding to Ty1 mRNA is linked to the interaction of Set1 1-754 with Gag.

      This is another example of the primary mistake repeatedly made by the authors: Y2H interactions are candidate results and not conclusive evidence. To further illustrate this point, the authors highlight the candidate interaction between Nis1 and 3 Set1C subunits.

      (3) After multiple speculations based on the Y2H candidates, the authors changed to focus on sumoylation of Set1, which has previously been reported to be sumoylated. Evidence identifying two sumoylation sites in Set1, in the N-SET and SET domains, is valuable and adds important progress to the role of sumoylation in the regulation of H3K4 methyltransferase, relevant for all eukaryotes. This illuminating part of the manuscript is only tenuously connected to the preceding Y2H screens and concomitant speculations.

      (4) The manuscript then describes a red herring exercise involving Set1 methylation of Nrm1. In an already speculative and difficult manuscript, it is exasperating to read a paragraph about a failed idea. Apart from panel E, Figure 7 is a distraction, and I believe it should not be shared.

      (5) However, despite the failure with Nrm1, Line 443 - The H3K4-like domain in Nrm1 raised our attention to other yeast proteins that carry such sequences. This line of thinking is even less connected to the Y2H screens than the sumoylation work.

      However, the authors present a reasonable evaluation of the yeast proteome screened for six amino acids similar to the known H3K4 motif ARTKQT (Figure 7e).
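      A proteome scan of this kind can be sketched as a sliding-window comparison against the ARTKQT motif. The similarity criterion used in the manuscript is not given in this review, so the mismatch-count threshold below (max_mismatches) and the function name are assumptions made purely for illustration.

```python
def find_motif_like(sequence, motif="ARTKQT", max_mismatches=1):
    """Return (start, window, mismatches) for each window of the sequence
    that differs from the motif at no more than max_mismatches positions.

    The mismatch-count criterion is a hypothetical stand-in for whatever
    similarity measure the authors actually applied.
    """
    k = len(motif)
    hits = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        # count positions where the window and the motif disagree
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


if __name__ == "__main__":
    # toy sequence, not taken from any real proteome
    print(find_motif_like("MARTKQTARK"))
```

      Applied to every protein in a proteome, a scan like this yields a candidate list only; whether any hit is a genuine methylation substrate would still need experimental support.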

      (6) However, this evaluation goes nowhere and has no connection with the next section of the manuscript, which is entirely speculation about the regulation of metabolism and stress responses based on the Y2H results and selected evidence from the literature.

      (7) The manuscript then describes more failed experiments regarding lysine methylation of Snf2 by Set1C, which unexpectedly reports arginine methylation rather than lysine. The manuscript does not currently meet the standard expected for this type of paper - the composition is somewhat incoherent and there are no previous reports of arginine methylation by SET domain proteins.

      The manuscript presents a very experienced grasp of the literature and a sophisticated appreciation of the forefront issues, but a surprising failure to eliminate uninformative failures and peripheral distractions. The overinterpretation of Y2H results is a dominating failure. There are some valuable parts within this manuscript, and hopefully, the authors can reformat to eliminate the defects and appropriately qualify the candidate data.

    2. Author response:

      eLife Assessment

      This study uses the yeast two-hybrid assay to identify proteins that may interact with yeast Set1 and other subunits of COMPASS/Set1C, the histone H3K4 methyltransferase, providing also some evidence for Set1 sumoylation and a role of SET1C methylating other factors in vitro. The results are valuable, and they should contribute to understanding the functions of the conserved SET1C complex, as they suggest potential functional connections with RNA biogenesis, chromatin remodeling, and non-histone methylation, whose implications would yet need to be explored. Nevertheless, apart from the fact that only a small subset of the Y2H interactions is further examined, the validating experiments are only partial or inconclusive, the strength of evidence being at this point incomplete.

      We thank the reviewers for their thoughtful comments, which primarily raise three major concerns: the overinterpretation of the Y2H data, issues related to validation, and the manuscript’s structure. At the same time, the reviewers acknowledge that the dataset is extensive and that aspects of the validation work are valuable. Below, we provide point-by-point responses to the public reviews. We will prepare a revised version of the manuscript that carefully addresses the public comments and incorporates the referees’ recommendations.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Luciano et al is a collection of experiments about the yeast histone 3 lysine 4 methyltransferase, Set1, starting with 10 yeast two-hybrid screens (Y2H). Y2H screens were briefly popular 20+ years ago, but the persistently unfavourable false-to-true positive ratios limited their utility, and the conclusion emerged that Y2H is an unreliable approach for gathering protein-protein interaction data. Y2H outcomes are candidate interaction lists at best, strongly contaminated by false positives. Here, the authors employed a company (Hybridomics) to perform the Y2H screens.

      The primary data is not presented, and the outcomes are summarized using the Hybridomics in-house quality scoring system in Figure 1A. It is not possible to evaluate these data, and the manuscript presents cartoon summaries that the reader must accept as valuable.

      We agree that false positives contaminate the list of potential interactors. Some interactions may also be indirect through a common interactor and do not reflect a physiological interaction. Nevertheless, some positives reflect real interactions that can occur under specific physiological conditions. This is the case, for example, with the interaction between Spp1 and Mer2 (from this screen), which has led to major discoveries (Acquaviva et al. Science 2013; Sommermeyer et al. Mol Cell 2013). The publication of these 10 screens should be viewed as a valuable resource for the broader community.

      Hybrigenics brings extensive experience from conducting numerous screens, enabling the team to recognize recurring false positives that commonly arise in screening assays.

      (1) Based on the extensive knowledge about Set1C/COMPASS acquired from genetics and biochemistry by many labs (including the Geli lab), the results presented here from the 10 Y2H screens are notably patchy. Of the 7 subunits of this complex, only one (Spp1) was identified using Set1 as bait. Conversely, as baits, Swd2, Spp1, Shg1, captured Set1, and the Bre2-Sdc1 interaction was reciprocally identified. These interactions were scored at the highest confidence level, which lends some confidence to the screens. However, the missing interactions, even at the third confidence level, indicate that any Y2H conclusions using these data must be qualified with caution. The authors do not appear to be cautious in their lengthy evaluations of these candidate interactions, which are illustrated with cartoons in Figures 2 and 3, with some support from the literature but almost without additional evidence. Snf2 is a particularly interesting candidate, which the authors support with pull-down experiments after mixing the two proteins in vitro (Figure 4). After Y2H, this is the least convincing evidence for a protein-protein interaction, and no further, more reliable evidence is supplied.

      We agree with referee 1 that more caution is needed, and we will take this into account in the revised version. We agree that Y2H interaction is an indication of potential interaction and not proof of interaction. We have therefore made a significant effort to compile elements from the literature that may support the interaction. Once again, this study can be considered a resource.

      (2) Figure 5 continues the cartoon summary of extrapolations from the Y2H screens, again without supporting evidence, except that the authors state, "We have refined the interaction region between Set1, Prp8 and Prp22, showing that Prp8 and Prp22 interact strongly with Set1-F4 (n-SET). Prp22 interacts in addition with Set1-F1 (Figure S2)." However, Figure S2 does not show this evidence and is incoherent.

      When we say that we have refined the interaction region between Set1, Prp8, and Prp22, we mean that we have restricted the interaction regions according to Y2H criteria. Indeed, we have not shown the spots illustrating the results. This will be corrected in the revised version.

      The figure legends for Figure S2B and C (copied here in bold) do not correspond to the figure.

      We agree that the legend for Figure S2 is unclear and does not accurately describe the panels shown in the figure. We will revise the legend accordingly in the updated version to ensure it accurately reflects the content of all panels.

      (B) Expression of the F1-F5 fragments in yeast cells. Fusion proteins were detected with an anti-GAL4 monoclonal antibody. TOTO yeast cells (Hybrigenics) were transformed with the different pB66-Set1-F1 to F5 plasmids and subsequently with either P6, pP6-Snf2 762-968, pP6-Prp8 37-250, or pP6-Prp22 379-763 that were identified in the Y2H screens. Transformed cells were incubated 3 days at 30°C on SD-LEU-TRP and then restreaked on SD-LEU-TRP-HIS with 3AT. Cell growth was monitored after 2 days at 30°C.

      (C) Solid and dotted arrows indicate that transformed TOTO cells transformed with pB66-Set1-F1 to F5 and the indicated prey (Snf2, Prp8, and Prp22) are growing in the presence of 20 mM and 5 mM of AT, respectively.

      Figure S2D is two almost featureless dark grey panels accompanied by the figure legend D) Control experiment showing that TOTO cells transformed with p6 and pB66-Set1-F4 are not gowing (sic) in the presence of 5 mM or 20 mM AT.

      Line 343. Interestingly, the two-hybrid screens reveal that Set1 1-754 interacted with Gag capsid-like proteins of Ty1 (Figure S5), raising the possibility that Set1 binding to Ty1 mRNA is linked to the interaction of Set1 1-754 with Gag.

      This is another example of the primary mistake repeatedly made by the authors -Y2H interactions are candidate results and not conclusive evidence.

      This statement is supported by our previous findings demonstrating that Set1 binds Ty1 mRNA independently of its dRRM and represses Ty1 mobility at a post-transcriptional stage (Luciano et al., Cell Discovery, 2017 PMID:29071121). Binding of Set1 to Ty1 mRNA could stem from the interaction between Set1 1-754 and the Gag capsid-like protein.

      To further illustrate this point, the authors highlight the candidate interaction between Nis1 and 3 Set1C subunits.

      While we agree that the Nis1-Set1C interaction has not been demonstrated beyond doubt, we feel that our Y2H and in vitro binding experiments provide reasonable evidence that the interactions may be relevant. It is important to consider that any interaction assay can produce false negative (and false positive) results; this includes Y2H, in vitro binding, and mass-spec analysis of purified complexes from cells. We feel that it is not appropriate to trust only protein interactions that are strong and stable enough to be demonstrated via purified complexes. It is clear that some protein interactions occur in a transient and weak manner and are therefore not compatible with a biochemical purification approach. This is the strength of alternative methods like Y2H and in vitro binding assays: interactions can be identified and tested even if the physiological context of the interaction may be more complex.

      (3) After multiple speculations based on the Y2H candidates, the authors changed to focus on sumoylation of Set1, which has previously been reported to be sumoylated. Evidence identifying two sumoylation sites in Set1, in the N-SET and SET domains, is valuable and adds important progress to the role of sumoylation in the regulation of H3K4 methyltransferase, relevant for all eukaryotes. This illuminating part of the manuscript is only tenuously connected to the preceding Y2H screens and concomitant speculations.

      We thank Referee 1 for their comment. While it is true that there is only a modest connection between Set1 interactors involved in direct or indirect sumoylation and the characterization of Set1 SUMOylation sites, we believe that this does not constitute a weakness of the manuscript.

      (4) The manuscript then describes a red herring exercise involving Set1 methylation of Nrm1. In an already speculative and difficult manuscript, it is exasperating to read a paragraph about a failed idea. Apart from panel E, Figure 7 is a distraction, and I believe it should not be shared.

      According to this comment, we will remove Fig. 7 panels A-D.

      (5) However, despite the failure with Nrm1, Line 443 - The H3K4-like domain in Nrm1 raised our attention to other yeast proteins that carry such sequences.

      This line of thinking is even less connected to the Y2H screens than the sumoylation work.

      However, the authors present a reasonable evaluation of the yeast proteome screened for six amino acids similar to the known H3K4 motif ARTKQT (Figure 7e).

      (6) However, this evaluation goes nowhere and has no connection with the next section of the manuscript, which is entirely speculation about the regulation of metabolism and stress responses based on the Y2H results and selected evidence from the literature.

      We will take these remarks (points 5 and 6) into account in the revised version.

      (7) The manuscript then describes more failed experiments regarding lysine methylation of Snf2 by Set1C, which unexpectedly reports arginine methylation rather than lysine. The manuscript does not currently meet the standard expected for this type of paper - the composition is somewhat incoherent and there are no previous reports of arginine methylation by SET domain proteins.

      We respectfully disagree with referee 1. We have integrated extensive in vitro reconstruction experiments with complementary in vivo studies, all conducted according to the rigorous standards expected by leading journals. These approaches have allowed us to reach the conclusions presented in this manuscript. While some of these findings are unexpected, they are supported by the data. We have carefully discussed the results and their limitations to provide a comprehensive interpretation.

      The manuscript presents a very experienced grasp of the literature and a sophisticated appreciation of the forefront issues, but a surprising failure to eliminate uninformative failures and peripheral distractions. The overinterpretation of Y2H results is a dominating failure. There are some valuable parts within this manuscript, and hopefully, the authors can reformat to eliminate the defects and appropriately qualify the candidate data.

      We thank Referee 1 for these insightful comments. In the revised version, we will follow the advice to remove non-informative failures and peripheral distractions. Additionally, we will exercise greater caution to avoid overinterpreting the Y2H results.

      Reviewer #2 (Public review):

      Summary:

      This paper starts with a large-scale yeast two-hybrid (Y2H) screen using Set1 (full-length and smaller parts) and other Set1C/COMPASS subunits as bait. There are hundreds of possible interactions identified, but only a small number are given any follow-up. While it's useful to document all the possible interactions, the unfocused and preliminary nature of the results makes the paper feel scattered and incomplete.

      Strengths:

      The Y2H screen was very comprehensive, producing lots of interesting possible leads for further experiments.

      Weaknesses:

      The results are useful but incomplete because only a small subset of the Y2H interactions is further examined. Even in the case of those that were further tested, the validating experiments are only partial or inconclusive.

      Referee 2’s comments align in some respects with those of Referee 1. We will follow the detailed Referee 2 suggestions to reduce the scattered nature of the manuscript.

      We will follow his/her recommendations; in particular, we will provide an AlphaFold model of the interaction between the Set1 N-term 1-754 and the SID domain of Kap104 that involves the proposed Set1 PY-NLS sequence.

      Reviewer #3 (Public review):

      The SET1C/COMPASS complex is the histone H3K4 methyltransferase in Saccharomyces cerevisiae, where it plays pivotal roles in transcriptional regulation, DNA repair, and chromatin dynamics. While its canonical function in histone methylation is well-established, its full interactome remains poorly defined. Moreover, whether SET1C methylates non-histone substrates has been an open question. In this study, Luciano et al. employ systematic yeast two-hybrid (Y2H) screening to uncover novel interactors and functions of SET1C. Their findings reveal potential functional connections to RNA biogenesis, chromatin remodeling, and non-histone methylation.

      The authors performed multiple Y2H screens using Set1 (full-length, N-terminal, and C-terminal fragments) and each of its seven subunits as baits. They identified high-confidence interactors that link SET1C to diverse cellular processes, including chromatin regulation (e.g., the SWI/SNF complex via Snf2), DNA replication (e.g., Mcm2, Orc6), RNA biogenesis (e.g., spliceosome components Prp8 and Prp22; polyadenylation factors Pta1 and Ref2), tRNA processing (e.g., Trm1, Trm732), and nuclear import/export (e.g., importins Kap104 and Kap123). Some of these interactions were further validated by immunoprecipitation or in vitro assays.

      Given the interaction of Set1 with Slx5 and Wss1 - proteins involved in SUMO-dependent processes - the authors investigated and convincingly demonstrated that Set1 is sumoylated. This modification may influence the function and regulation of the SET1C complex.

      Finally, the authors provide evidence that SET1C methylates proteins beyond histone H3K4, notably Nrm1, a transcriptional corepressor, and Snf2, the catalytic subunit of the SWI/SNF chromatin remodeling complex. Although Nrm1 contains a domain resembling the H3K4-methylated sequence (H3K4-like domain), this region does not appear to be required for its methylation. The search for other proteins containing similar domains as potential methylation candidates (p.12, first paragraph) seems less justified, given the lack of evidence supporting the requirement for the H3K4-like domain in methylation.

      This study offers valuable insights into the interactome of SET1C, suggesting potential links between the complex and a wide range of cellular processes. However, the functional implications of the Y2H interactions remain to be explored further. Additionally, the study provides intriguing information on the possible regulation of Set1 by sumoylation. The discovery of Nrm1 and Snf2 as methylation substrates could significantly expand the known targets and functions of SET1C.

      The results are supported by high-quality data.

      We thank referee 3 for his/her positive comments.

    1. Author response:

      We sincerely appreciate the constructive comments and valuable suggestions from the editors and reviewers. We highly value the feedback and will carefully address all concerns in our revised manuscript.

      (1) We will supplement more details of the processing steps and key results in the analyses of sCCA and SVR to improve the transparency and reproducibility of our methods.

      (2) According to the reviewers’ suggestions, we will adjust and present a more conventional and cautious conclusion regarding clinical specificity and neuroplasticity reserve.

      (3) We will supplement the results of structural connections (termed “symptom-related network” in the manuscript) across the three subgroups to strengthen the interpretation of subgroup-specific neurobiological characteristics.

      (4) All the suggestions from the reviews will be respected, and we will carefully revise our manuscript to improve its clarity, rigor, and scientific quality.

      We believe these revisions will significantly improve the quality of our work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments and constructive suggestions. We describe how we have addressed each point below and are grateful for the guidance on areas where our work could be clarified or expanded. In particular, we note the following:

      Selection scan summary statistics: In our revised manuscript, we have included summary statistics from the selection scans. We believe this addition will enhance transparency and provide additional context for readers.

      Reporting of outliers: As highlighted by the editor, the reviewers expressed differing views on the most appropriate way to report outliers. To provide a comprehensive and balanced presentation, we now report both the empirical selection statistics and the corresponding converted p-values in either the main text or supplement, and both outputs are also provided in the full summary files. This dual approach will allow readers to fully interpret the results under both perspectives.

      Expanded discussion of admixture timing and population structure: We have carefully considered the reviewers' suggestions to incorporate additional descriptions of population structure or demographic analyses, and have done so in our revisions where possible. These changes strengthen the rigor and clarity of the analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper reports an analysis of whole-genome sequence data from 40 Faroese. The authors investigate aspects of demographic history and natural selection in this population. The key findings are that the Faroese (as expected) have a small population size and are broadly of Northwest European ancestry. Accordingly, selection signatures are largely shared with other Northwest European populations, although the authors identify signals that may be specific to the Faroes. Finally, they identify a few predicted deleterious coding variants that may be enriched in the Faroes.

      Strengths:

      The data are appropriately quality-controlled and appear to be of high quality. Some aspects of the Faroese population history are characterized, in particular, by the relatively (compared to other European populations) high proportion of long runs of homozygosity, which may be relevant for disease mapping of recessive variants. The selection analysis is presented reasonably, although as the authors point out, many aspects, for example differences in iHS, can reflect differences in demographic history or population-specific drift and thus can't reliably be interpreted in terms of differences in the strength of selection.

      Weaknesses:

      The main limitations of the paper are as follows:

      (1) The data are not available. I appreciate that (even de-identified) genotype data cannot be shared; however, that does substantially reduce the value of the paper. Minimally, I think the authors should share summary statistics for the selection scans, in line with the standard of the field.

      We agree with the reviewer that sharing the selection scan results is important, so we have now made the selection scan summary statistics publicly available, and clearly lay out the guidelines and research questions for which the data can be accessed in our Data Availability statement.

      (2) The insight into the population history of the Faroes is limited, relative to what is already known (i.e., they were settled around 1200 years ago, by people with a mixture of Scandinavian and British ancestry, have a small effective population size, and any admixture since then comes from substantially similar populations). It's obvious, for example, that the Faroese population has a smaller bottleneck than, say, GBR.

      More sophisticated analyses (for example, ARG-based methods, or IBD or rare variant sharing) would be able to reveal more detailed and fine-scale information about the history of the populations that is not already known. PCA, ADMIXTURE, and HaploNet analysis are broad summaries, but the interesting questions here would be more specific to the Faroes, for example, what are the proportions of Scandinavian vs Celtic ancestry? What is the date and extent of sex bias (as suggested by the uniparental data) in this admixture? I think that it is a bit of a missed opportunity not to address these questions.

      We clarify that we did quantify the proportions of various ancestry components as estimated by HaploNet in main text Figure 5 and supplemental figures S6 and S7. To better highlight this result, we now also include the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      We agree that more fine-scale demographic analyses would be informative. We now additionally provide an estimation of the admixture date in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes and discussion using the DATES software which is optimized for ancient genomes.

      We have encountered problems with using different standard date estimation software, including DATES, which give very inconsistent and unstable results. As we note in our text, we suspect this might be due to the strong bottleneck experienced in the history of the Faroe Islands, low LD differentiation between the source populations, or multiple pulses of admixture, which may be breaking one or more of the assumptions of these methods. Assessing the limitations of these methods is beyond the scope of this current manuscript; however, we will continue working on this problem for future studies, possibly using simulations to assess where the problem might be. We recognize that our relatively small sample size places limits on the fine-scale demographic analyses that can be performed. We are addressing this in ongoing work by generating a larger cohort, which we hope will enable more detailed inference in the future.

      (3) I don't really understand the rationale for looking at HLA-B allele frequencies. The authors write that "ankylosing spondylitis (AS) may be at a higher prevalence in the Faroe Islands (unpublished data), however, this has not been confirmed by follow-up epidemiological studies". So there's no evidence (certainly no published evidence) that AS is more prevalent, and hence nothing to explain with the HLA allele frequencies?

      We agree that no published studies have confirmed a higher prevalence of ankylosing spondylitis (AS) in the Faroe Islands. Our recruitment data suggest that AS might be more common than in other European populations, but we understand that this is only based on limited, unpublished observations and what we are hearing from the community. We emphasized in our original manuscript that this is based on observational evidence from the FarGen project. However, as this reviewer pointed out, we can be more clear that this prevalence has not been formally studied.

      In revision, we clarify in the Main Text - Results - HLA-B Allele Frequencies and Discussion that our recruitment data suggest a higher prevalence of AS may be possible, but more formal epidemiological studies are needed to confirm this observation. The reason we study HLA-B allele frequencies is to see if the genetic background of the Faroese population could help explain this possible difference, since HLA-B27 is already known to play a strong role in AS.

      Reviewer #2 (Public review):

      In this paper, Hamid et al present 40 genomes from the Faroe Islands. They use these data (a pilot study for an anticipated larger-scale sequencing effort) to discuss the population genetic diversity and history of the sample, and the Faroes population. I think this is an overall solid paper; it is overall well-polished and well-written. It is somewhat descriptive (as might be expected for an explorative pilot study), but does make good use of the data.

      The data processing and annotation follows a state-of-the-art protocol, and at least I could not find any evidence in the results that would pinpoint towards bioinformatic issues having substantially biased some of the results, and at least preliminary results lead to the identification of some candidate disease alleles, showing that small, isolated cohorts can be an efficient way to find populations with locally common, but globally rare disease alleles.

      I also enjoyed the population structure analysis in the context of ancient samples, which gives some context to the genetic ancestry of Faroese, although it would have been nice if that could have been quantified, and it is unfortunate that the sampling scheme effectively precludes within-Faroes analyses.

      We note that although the ancestry proportions were not originally specified in the main text, we did quantify ancestry proportions in the modern Faroese individuals and other ancient samples, and we visualized these proportions in Figure 5 and Supplementary Figures S6 and S7. As stated in our response to Reviewer #1, in our revisions, we now more clearly state the average global ancestry of the various components in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes.

      I am unfortunately quite critical of the selection analysis, both on a statistical level and, more importantly, I do not believe it measures what the authors think it does.

      Major comments:

      (1) Admixture timing/genomic scaling/localization:

      As the authors lay out, the Faroes were likely colonized in the last 1,000-1,500 years, i.e., 40-60 generations ago. That means most genomic processes that have happened on the Faroese should have signatures that are on the order of ~1-2cM, whereas more local patterns likely indicate genetic history predating the colonization of the islands. Yet, the paper seems to be oblivious to this (to me) fascinating and somewhat unique premise. Maybe this thought is wrong, but I think the authors miss a chance here to explain why the reader should care beyond the fact that the small populations might have high-frequency risk alleles and the Faroes are intrinsically interesting, but more importantly, it also makes me think it leads to some misinterpretations in the selection analysis.

      See response to point #3

      (2) ROH:

      Would the sampling scheme impact ROH? How would it deal with individuals with known parental coancestry? As an example of what I mean by my previous comment, 1MB is short enough in that I would expect most/many 1MB ROH-tracts to come from pedigree loops predating the colonization of the Faroes. (i.e, I am actually quite surprised that there isn't much more long ROH, which makes me wonder if that would be impacted by the sampling scheme).

      The sampling scheme was designed to choose 40 Faroese individuals that were representative of the different regions and were minimally related. There were no pairs of third-degree relatives or closer (pi-hat > 0.125) in either the Faroese cohort or the reference populations. It is possible that this sampling scheme would reduce the amount of longer ROHs in the population, but we should still be able to see overall patterns of ROH reflective of bottlenecks in the past tens of generations. Additionally, based on this reviewer's earlier comment, 1 Mb ROHs would still be relevant to demographic events in the last 40-60 generations given that on average 1 cM corresponds to 1 Mb in humans, though we recognize that is not an exact conversion.

      That said, the “sum total amount of the genome contained in long ROH” as we described in the manuscript includes all ROHs greater than 1Mb. Although we group all ROHs longer than 1Mb into one category in Main Text Figure 2, we now additionally provide the distribution in ROH lengths across all individuals for each cohort in a new Supplemental Figure S3. As this plot shows, there certainly are ROHs longer than 1Mb in the Faroese cohort, and on average there is a higher proportion of long ROH particularly in the 5-15 Mb range in the Faroese cohort relative to the other cohorts. As the reviewer points out, these longer ROHs are possibly indicative of a more recent or stronger bottleneck in the Faroes relative to the comparison cohorts. We highlight this result in Main Text - Results - Population Structure and Relatedness.
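
      The time-scale reasoning in this exchange can be made concrete with a short calculation. The sketch below is illustrative only and rests on stated assumptions: a generation time of 25 years (implied by equating 1,000-1,500 years with 40-60 generations), the standard approximation that a segment inherited through a common ancestor g generations back has an expected length of 100/(2g) cM, and the rough genome-wide average of 1 cM per Mb mentioned above. The function names are hypothetical.

```python
def years_to_generations(years, years_per_generation=25.0):
    """Convert calendar years to generations (25 years/generation is an assumption)."""
    return years / years_per_generation


def expected_segment_cm(generations):
    """Expected length in cM of a segment inherited through a common ancestor
    `generations` back: the two lineages span 2g meioses, and segment lengths
    are approximately exponential with mean 100 / (2g) cM."""
    return 100.0 / (2.0 * generations)


def cm_to_mb(length_cm, cm_per_mb=1.0):
    """Convert cM to Mb using a rough genome-wide average (not an exact conversion)."""
    return length_cm / cm_per_mb


for years in (1000, 1500):
    g = years_to_generations(years)
    length_cm = expected_segment_cm(g)
    print(f"{years} years ~ {g:.0f} generations; "
          f"expected segment ~ {length_cm:.2f} cM (~{cm_to_mb(length_cm):.2f} Mb)")
```

      Under these assumptions, segments tracing back 40-60 generations are expected to be on the order of 1 cM, which is why ROH around 1 Mb sit near the boundary between pre- and post-settlement events.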

      (3) Selection scan:

      We are talking about a bottlenecked population that is recently admixed (Faroese), compared to a population (GBR) putatively more closely related to one of its sources. My guess would be that selection in such a scenario would be possibly very hard to detect, and even then, selection signals might not differentiate selection in Faroese vs. GBR, but rather selection/allele frequency differences between different source populations. I think it would be good to spell out why XP-EHH/iHS measures selection at the correct time scale, and how/if these statistics are expected to behave differently in an admixed population.

      The reviewer brings up good points about the utility of classical selection statistics in populations that are admixed or bottlenecked, and whether the timescale at which these statistics detect selection is relevant for understanding the selective history of the Faroese population. We break down these concerns separately.

      (1) Bottlenecks: Recent bottlenecks result in higher LD within a population. However, demographic events such as bottlenecks affect global genomic patterns while positive selection is expected to affect local genomic patterns. For this reason, iHS and XP-EHH statistics are standardized against the genome-wide background, to account for population-specific demographic history.

      (2) Admixture: The term “admixture” has different interpretations depending on the line of inquiry and the populations being studied. Across various time and geographic scales, all human populations are admixed to some degree, as gene flow between groups is a common fixture throughout our history. For example, even the modern British population has “admixed” ancestry from North / West European sources as well, dating to at least as recently as the Medieval & Viking periods (Gretzinger et al. 2022, Leslie et al. 2015), yet we do not commonly consider it an “admixed” population, and we are not typically concerned about applying haplotype-based statistics in this population. This is due to the low divergence between the source populations. In the case of the Faroe Islands, we believe admixture likely occurred on a similar timescale or even earlier, based on the DATES estimates. We see low variance in ancestry proportions estimated by HaploNet, both from the historical Faroese individuals (dated to 260 years BP) and the modern samples. This indicates admixture predating the settlement of the Faroe Islands, where recombination has had time to break up long ancestry tracts and the global ancestry proportions have reached an equilibrium. That is, these ancestry patterns suggest that the modern Faroese are most likely descended from already admixed founders. In the original manuscript, we mentioned this as a likely possibility in the Main Text - Discussion: “This could have occurred either via a mixture of the original “West Europe” ancestry with individuals of predominantly “North Europe” ancestry, or a by replacement with individuals that were already of mixed ancestry at the time of arrival in the islands (the latter are not uncommon in Viking Age mainland Europe).” In our revisions, we further included the DATES estimations of the timing of admixture in the modern and historical Faroese samples, which pre-date the timing of settlement in both cases. 
      We highlight these points in the Discussion. And, as with the case of the British population, the closely-related ancestral sources for the Faroese founders were likely not so diverged as to have differences in allele frequencies and long-range haplotypes that would disrupt signals of selection from iHS or XP-EHH.

      (3) Time scale: It is certainly possible, and in fact likely, that iHS measures selection older than the settlement of the Faroe Islands. In our manuscript, we calculated iHS in both the Faroese and the closely related British cohort, and we highlight in the Main Text that the top signals, with the exception of LCT, are shared between the two cohorts, indicative of selection that began prior to the population split (Discussion and Results - Signals of Positive Selection). iHS is a commonly calculated statistic, and it is often calculated in a single population without comparing to others, so we feel it is important to show our result demonstrating these shared selection signals. In our revisions, we now clarify in the Discussion the limitations and time-scale at which the iHS statistic may detect selection. As for XP-EHH, it is a statistic designed to identify differentiated variants that are fixed or approaching fixation in one population but not others. The time-scale of selection that XP-EHH can detect would therefore depend on the populations used for comparison. Because XP-EHH has the best power for alleles at or near fixation in only one population, it is less likely to detect older selection events or incomplete sweeps inherited from the source populations. We highlight this point in the Discussion.
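
      The standardization against the genome-wide background described in point (1) above can be sketched as follows. This is a generic illustration, not the pipeline used in the manuscript: it assumes that unstandardized scores are converted to z-scores within derived-allele-frequency bins, a common convention for iHS. The function name, the number of bins, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_frequency_bin(scores, freqs, n_bins=20):
    """Return z-scores of `scores`, computed separately within each
    derived-allele-frequency bin so that every variant is compared with
    genome-wide variants of similar frequency."""
    members = defaultdict(list)
    for i, f in enumerate(freqs):
        # Clamp so that a frequency of exactly 1.0 falls in the last bin.
        members[min(int(f * n_bins), n_bins - 1)].append(i)

    z = [float("nan")] * len(scores)
    for idx in members.values():
        values = [scores[i] for i in idx]
        mu, sd = mean(values), pstdev(values)
        for i in idx:
            # Leave NaN where a bin has no variance (e.g. a single variant).
            if sd > 0:
                z[i] = (scores[i] - mu) / sd
    return z


# Toy input: two frequency bins with very different raw score scales.
raw = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
daf = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]
print(standardize_by_frequency_bin(raw, daf, n_bins=2))
```

      Because the mean and standard deviation come from the genome-wide distribution of the same cohort, a demographic event that shifts all scores (such as a bottleneck raising LD everywhere) is largely absorbed by the standardization, whereas a local sweep remains an outlier.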

      (4) Similarly, for the discussion of LCT, I am not convinced that the haplotypes depicted here are on the right scale to reflect processes happening on the Faroes. Given the admixture/population history, it at the very least should be discussed in the context of whether the 13910 allele frequency on the Faroes is at odds with what would be expected based on the admixture sources.

      We agree that more investigation into the LCT allele frequency in the other ancient samples may provide some insight into the selection history, particularly in light of ancient admixture. Please note, we did look at the allele frequency of the LCT allele rs4988235 and stated in the main text that it was present at high frequencies in the historical (250BP) Faroese samples. The frequency of this allele in the imputed historical Faroese samples is 82% while the allele is present at ~74% frequency in modern samples. We originally did not report the exact percentage in the main text because the sample size of the historical samples (11 individuals) is small and coverage of ancient samples is low, leading to potential errors in imputation.

      However, given the reviewer’s comment, we have now included the frequencies as well as these caveats in the Discussion. We additionally calculated the LCT allele frequency in other ancient samples, and assuming that we had good proxies for the sources at the time of admixture, we calculated the expected allele frequency in the admixed ancestors of the Faroese founders (Discussion), but again note the limitations in using such a calculation in this context.
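
      The expected-frequency calculation mentioned here is, in its simplest form, an ancestry-weighted average of the source frequencies. The sketch below assumes that simple linear model (no drift or selection after admixture); the function name is hypothetical and the numbers are placeholders chosen only to show the arithmetic, not estimates from the manuscript.

```python
def expected_admixed_frequency(proportions, source_freqs):
    """Expected allele frequency in an admixed population as the
    ancestry-weighted mean of the source-population frequencies."""
    if len(proportions) != len(source_freqs):
        raise ValueError("one frequency is needed per ancestry proportion")
    if abs(sum(proportions) - 1.0) > 1e-6:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(a * p for a, p in zip(proportions, source_freqs))


# Placeholder inputs: two sources contributing equally, with
# hypothetical allele frequencies of 0.6 and 0.8.
print(expected_admixed_frequency([0.5, 0.5], [0.6, 0.8]))
```

      An observed frequency well above this expectation would be consistent with selection after admixture, but with small ancient sample sizes and imputed genotypes the uncertainty on each input is large, as noted above.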

      (5) I am lacking information to evaluate the procedure for turning the outliers into p-values. Both iHS and XP-EHH are ratio statistics, meaning they might be heavy-tailed if one is not careful, and the central limit theorem may not apply. It would be much easier (and probably sufficient for the points being made here) to reframe this analysis in terms of empirical outliers.

      Given that there are disagreements on the best approach to reporting selection scan results from the reviewers, in our revision, we have additionally supplied both the standardized iHS / XP-EHH values in Supplementary Fig. S10 as well as these values transformed to p-values in Main Text Fig. 3. Additionally, both outputs are provided in the publicly available selection scan results files. We provide the method for obtaining p-values in the subsection “Selection scan” from the Methods section - we used a method developed earlier by Fariello et al.
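
      The two reporting styles discussed in this exchange can be contrasted in a few lines. The sketch below is a generic illustration and does not reproduce the method of Fariello et al.: it assumes scores that are already standardized, converts them to two-sided p-values under a normal approximation, and, as the alternative, ranks them as empirical outliers without any distributional assumption. Function names and toy scores are hypothetical.

```python
import math
from bisect import bisect_left


def normal_two_sided_p(z):
    """Two-sided p-value for a standardized score under a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_tail_fraction(scores):
    """For each score, the fraction of all scores whose absolute value is at
    least as large; small values mark empirical outliers."""
    abs_sorted = sorted(abs(s) for s in scores)
    n = len(abs_sorted)
    return [(n - bisect_left(abs_sorted, abs(s))) / n for s in scores]


scores = [0.5, -3.0, 1.0, 2.0]
print([round(normal_two_sided_p(s), 4) for s in scores])
print(empirical_tail_fraction(scores))
```

      The empirical ranking stays valid when the score distribution is heavy-tailed, which is the concern raised for ratio statistics, whereas the normal-approximation p-values depend on how well the standardized scores follow a normal distribution.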

      (6) Oldest individual predating gene flow: It seems impossible to make any statements based on a single individual. Why is it implausible that this person (or their parents), e.g., moved to the Faroes within their lifetime and died there?

      We agree with the reviewer that this is a plausible explanation, and in our revisions, we have updated the Main Text - Discussion to acknowledge this possibility.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Please note that there was disagreement among the reviewers regarding the reporting of outliers.

      As stated in our response to the public reviews, given the disagreement, we include both the empirical selection statistics as well as the converted p-values in the main text, supplement and selection scan files.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2:

      Define labels / explain why they differ from 1000k populations / make them consistent throughout the manuscript.

      We apologize for the error in labels for Figure 2. These are the same populations used in other figures and analyses. We have fixed this in our revisions so that the labels are consistent with the rest of the manuscript.

      (2) Figure S2 label:

      "The matrix is rescaled after subsetting the individuals, so although the scales are different, the overall structure remains the same." I do not understand this sentence. The samples are different, the scale is different, the apparent pattern is different - what overall structure is supposed to be the same?

      We apologize that the language was not clear in the figure label. The scales between panels A and B are different because popkin rescales the kinship values after subsetting so that the minimum kinship is zero. This is necessary when subsetting individuals from an already estimated kinship matrix, particularly when subsetting from global populations to a single region. From the popkin documentation: “This rescaling is required when subsetting results in a more recent Most Recent Common Ancestor (MRCA) population compared to the original dataset (for example, if the original data had individuals from across the world but the subset only contains individuals from a single continent)” (https://rdrr.io/cran/popkin/man/rescale_popkin.html).

      We also described this in the Methods - Population Genetics - Kinship and runs of homozygosity section: “When calculating the kinship matrix for the Faroese WGS cohort only, we used the rescale_kinship() function, which will change the most recent common ancestor and give different absolute values, but the overall relationship structure in the subpopulation remains the same.”

      That is, the relative kinship within the Faroese cohort remains consistent, despite the different scale.

      It is difficult to see the kinship of Faroese individuals in the larger plot with all cohorts, which is why we subset and visualize the Faroese cohort alone. We have updated the Fig. S2 label language to make this more clear.
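      The effect of the rescaling described above can be illustrated with a short sketch. It assumes the rescaling takes the form (kinship - min) / (1 - min), which sets the minimum kinship to zero. That form, the Python function name and the toy matrix are assumptions made for illustration and are not taken from the manuscript. The point is only that this kind of transform changes absolute values while preserving the ordering of kinship values within the subset.

```python
def rescale_kinship_sketch(kinship, min_kinship=None):
    """Rescale a kinship matrix (list of lists) so its minimum becomes zero,
    assuming the transform (k - k_min) / (1 - k_min)."""
    if min_kinship is None:
        min_kinship = min(min(row) for row in kinship)
    return [
        [(value - min_kinship) / (1.0 - min_kinship) for value in row]
        for row in kinship
    ]


if __name__ == "__main__":
    # Toy 2x2 kinship matrix for a subset of individuals (hypothetical values).
    subset = [[0.6, 0.2],
              [0.2, 0.7]]
    for row in rescale_kinship_sketch(subset):
        print(row)
```

      The transform is monotonic, so pairs that were more closely related before rescaling remain more closely related afterwards.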

      (3) "Iron Age Wet Europe"

      We have corrected this typo to “Iron Age West Europe.”

      I'm confused if the ancient Faroese were part of the imputation panel: Figure 5 legend implies they are, methods imply they are not.

      The ancient samples are not imputed with the modern Faroese and reference samples, but they are the imputed data downloaded from Allentoft et al. and merged with the modern Faroese cohort. We specify that we downloaded imputed ancient samples in both the Methods - Fine-scale structure estimation using ancient genomes and in the Main Text - Results - Fine-Scale Structure and Connections to Ancient Genomes. The description of the imputation panel in the Methods - Bioinformatics - Variant calling and imputation refers only to the modern samples.

      (4) Kinship:

      The kinship of the Faroes is useful (and nice) as a QC analysis showing the genetic data matches the expectations from the pedigree. I don't know what I should learn from the kinship of the 1000kg samples (I'd assume one could learn something about bottleneck strength from this), but it's not developed/discussed.

      The global kinship matrix provides complementary information to PCA and ROH, as another way to quantify and visualize the relationships within and between populations. Additionally, as the reviewer mentioned, bottlenecks increase kinship within populations. Given that popkin estimates kinship measured from a Most Recent Common Ancestor, we can best observe this increase in kinship when comparing to other global populations. We more clearly delineate what can be observed from Fig. S2A versus Fig. S2B in the Results - Population Structure and Relatedness.

      Reference

      (1) Gretzinger, J. et al. The Anglo-Saxon migration and the formation of the early English gene pool. Nature 610, 112–119 (2022)

      (2) Leslie, S. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015).

    1. R0:

      Reviewer #1:

      In the abstract: - please use the background, methods, results, conclusions subheadings. Indicate more clearly that this is not the results of a trial, but an examination of baseline data among trial participants for a different purpose. It is unclear whether there are 12,000 participants who provided data, or if only 1,600 provided data for analysis? How many patients in the V-CaN hybrid type 2 implementation effectiveness trial? What is the main aim of this trial and when will that be reported? "We have also created a best-fit regression model for risk factors as outcome variables" What are the outcome variables? predictors? Please lead the reader through to understand why this was done. - it is not clear what these abbreviations signify: RKSK, RBSK, NP-NCD, NFHS. -was the prevalence of smoking only 0.3%?

      line 34: don't use DM without explaining it. line 35 HTN? is this the same definition of high blood pressure mentioned in line 39? If so, use same terms. Why higher percentages of high blood pressure in 10-12 year olds compared to 13-19 year olds?

      line 93: was the data cleaned using R studio? How?

      line 109: perhaps give context to 600 MET-minutes (i.e., roughly an hour of running, for example), if appropriate.

      line 120: Capitals for American Academy of Pediatrics.

      Table 1: is Other backward class the best descriptor?

      Could there be a figure to show the predictors of each outcome?

      line 227: should be 0.50

      line 227: compared to a farmer and laborer

      Instead of gender, should it be sex?

      Were the blood samples only tested for blood glucose status?

      Reviewer #2:

      Overall relevance and significance

      The topic is highly relevant, as studying NCD risk factors among adolescents is important because this age group is particularly vulnerable to commercial determinants of health.

      Adolescence represents a critical window for early prevention, making the study timely and valuable.

      1. Clarification on participant flow

      Please clarify what happened to participants who consented for testing but were not included.

      You mention that ~11,000 agreed, but the final sample was 1,600.

      A clear flow diagram (screened → consented → included → analyzed) with reasons for exclusion would improve transparency and reduce potential selection bias concerns.

      2. Regression modelling approach

      In the prediction models, consider including only biologically plausible predictors for each outcome.

      For example, for stress, variables such as psychosocial factors may be more appropriate.

      For salt intake, predictors like dietary patterns or processed food consumption would be more relevant.

      Some associations (e.g., smoking with higher salt intake, or stress with vegetable intake) may not be conceptually necessary and could introduce overadjustment or model instability.

      3. Variable selection strategy

      Instead of stepwise methods, an enter (forced entry) method based on conceptual framework is preferable.

      This helps avoid data-driven selection and preserves meaningful associations grounded in theory.

      4. Interpretation of findings

      The discussion should expand on the biological and behavioral plausibility of key associations.

      For example, explain why salt intake might relate to physical activity (e.g., dietary patterns, sports drinks, higher caloric intake).

      Similar explanatory reasoning with supporting literature should be provided for other significant findings.

      5. Strengthening the discussion

      Link findings more explicitly to existing evidence and mechanisms, not just statistical significance.

      Highlight how results inform adolescent NCD prevention strategies or policy implications.

      Academic Editor:

      Thank you for this timely paper. My comments: 1. The Introduction is well-written in terms of presenting the burden of NCDs and citing national datasets such as NFHS and CNNS. However, it may be helpful to strengthen the rationale slightly to more clearly justify the need for conducting this large baseline assessment.

      1. At present, the paragraphs primarily describe existing prevalence data. It would add clarity if a few sentences explicitly explained how this study differs from or adds to existing datasets, particularly given that CNNS and other national surveys have already reported adolescent risk factors. Clarifying what specific gap this study addresses (for example, extension of the age range to 30 years, district-level HDSS-based profiling, integration of behavioral and biochemical measures, or linkage to the implementation arm of the V-CaN trial) would strengthen the justification.

      2. Introduce your study clearly and boldly: introduce your protocol paper early, and then state clearly that this manuscript presents baseline findings intended to inform and tailor the implementation component of the hybrid trial. Making this linkage more explicit would help readers better appreciate the added value and purpose of the baseline survey. Otherwise, readers may be left with the impression that this study reports results from a hybrid type 2 implementation effectiveness trial, whereas in its current form it is just a cross-sectional study.

      In Methods 1. I would suggest clearly adding standard sub-headings for easy reading. 2. Given the number of variables assessed in this study, it may be helpful to consider presenting a consolidated table in the Methods section summarizing the operational definitions and measurement approaches used. This could include definitions of outcome variables, biochemical assessment methods (including laboratory techniques and cut-offs), procedures for estimating salt intake in grams per person per day, and the method used to assess fruit and vegetable servings and physical activity. 3. 10-20 and 20-30 years: rationale for these categories. consider aligning with The World Health Organization (WHO) or similar body, that provides specific definitions for different age groups within the young population. According to WHO, adolescents are defined as individuals between 10 and 19 years of age, young people encompass those in the age group of 10-24, and youth are categorized as those aged 15-24. Currently we are overlapping.

      Results 1. Table 3: Association of stress with socio-demographic and other behavioral characteristics among 10-30 years old persons. Why is this presented as a separate table, given that stress was little discussed while building the rationale? This needs more clarity. 2. Table 4 only covers smokeless tobacco. What is the rationale? 3. For the regression, please recheck multicollinearity, as there are currently many variables that do not fit in causal pathways (for example, the role of marital status in the regressions for salt and sugar). The tables in general are very cluttered, which affects readability and inference.

      Discussion 1. Please add 4 key messages from your study that would help you during the intervention or would inform policy, and some recommendations at the end of the discussion.

    1. Reviewer #3 (Public review):

      Henshall et al. study invasion of human erythrocytes by Plasmodium falciparum merozoites and report knockout of PfMSP2, a critical merozoite surface protein with unknown function. They describe conservation of MSP2 in P. falciparum and key avian malaria parasites, unabated growth of two knockout lines (∆MSP2) produced in divergent 3D7 and Dd2 strains, no differences in expression of key invasion-associated genes, no effect on invasion kinetics (with or without protease treatment of erythrocytes), nonsignificant effects of knockout on parasite growth inhibition by antibodies directed against key invasion-associated antigens, and do find a significant effect on potentiating AMA1 invasion inhibitory antibodies. The studies are interesting and have potential for directing vaccine design targeting erythrocyte invasion, a critical step in bloodstream expansion of malaria parasites.

      Major points:

      (1) Much of the manuscript describes negative results and this reviewer found it arduous to get through many negative or nonsignificant results before finally getting to the significant effect on AMA1 inhibitory antibodies, not presented until Figure 6! Computational studies in Fig. 1 could be a supplementary figure. Figs. 2 and 3 demonstrate knockout in 3D7 and Dd2, respectively, and could be assembled into a single figure. (Notably Fig. 2A and 3A are almost identical with use of some different primers.) Fig. 2E, 2F, 3D-H, all of Fig. 4, and most of Fig. 5 are negative or insignificant results that could also be moved to supplementary data. As MSP4, MSP5, and SUB1 are presumably included in the whole genome RNA-seq experiments shown in Fig. 4C, it makes sense to remove Fig. 4A data from the paper fully. These consolidating changes would help highlight the key finding of improved binding and block of AMA1's role in invasion.

      (2) The potentiating effects on anti-AMA1 antibodies are shown with rabbit sera and purified antibodies, mouse monoclonal antibodies, and smaller i-bodies inspired by shark antibody-like receptors but not with human monoclonal antibodies (hmAbs). As naturally acquired hmAbs targeting AMA1 have been identified and characterized (PMIDs: 39632799, 40020675), would it not be important to test these antibodies in the ∆MSP2, especially as the authors emphasize the importance of their model in designing better human malaria vaccines?

      (3) Fig. 7 presents quantitative fluorescence microscopy to measure anti-AMA1 binding and support a model where MSP2 serves to sterically hinder antibody access to AMA1 on individual merozoites. I understand that the negative WD33 control is useful to contrast to the positive WD34 antibody (both bind AMA1 but only WD34 exhibits parasite growth inhibitory effects), but it seems that use of smaller i-bodies rather than conventional larger mouse or ideally human monoclonal antibodies may compromise demonstration of steric hindrance by MSP2 because smaller i-bodies may be less hindered.

      (4) Some explanation for why WD33 fails to inhibit growth despite targeting the same antigen as WD34 is needed. Are the epitopes known? Does one bind further from the RON2 binding pocket?

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Henshall et al. delete the highly abundant merozoite surface protein PfMSP2 from two Plasmodium falciparum laboratory lines (3D7 and Dd2) using CRISPR-Cas9. Parasites lacking MSP2 replicate and invade red cells normally, opposing the experimental history that suggests MSP2 is essential. Unexpectedly, the knock-outs become more susceptible to several inhibitory antibodies - most strikingly those that target the apical antigen AMA1 - while antibodies to other surface or secreted proteins are largely unaffected. Recombinant MSP2 added in vitro can dampen AMA1-antibody binding, supporting a "conformational masking" model. The reported data suggest that MSP2 helps shield key invasion ligands from host antibodies and may itself be a double-edged vaccine target.

      Reviewer 1 did not have any comments we needed to address.

      Reviewer #2 (Public review):

      (1) The section describing Laverania and avian Plasmodium MSP2 comparison is a lengthy section and could be told much more concisely for clarity in delivering the key message, i.e., that conservation in distantly related Plasmodium species could indicate an important function. The identification of MSP2-like genes in avian Plasmodium species was highlighted previously in the referenced Escalante paper, so it is not entirely novel, although this paper goes into more detailed characterisation of the extent of conservation. Overall, this section takes up much more space in the manuscript than is merited by the novelty and significance of the findings.

      As outlined in our response to point (1) for Reviewer 2 (Recommendations for the authors), we have cut back this section and focussed on the important comparisons rather than the general observation. We have also moved the elements of Table 1 to Supplementary Figures 2, 3 and 4 to streamline the manuscript. Further description of the changes is available in our response to Reviewer #2 (Recommendations for the authors).

      (2) Characterisation of the knockout strains is generally thorough, though relatively few interactions were followed by live microscopy (Figures 3E-H). A minimum of 30 merozoites were followed in each assay (although the precise number is not specified in the figure or legend), but there are intriguing trends in the data that could potentially have become significant if n was increased.

      In the Figure 3 Legend we have now indicated the number of merozoite invasions followed as per the following:

      “(E-H) Key parameters of merozoite invasion were measured for both PfDd2 WT (n = 43) and PfDd2 ΔMSP2 (n = 35) parasites that had successfully invaded a RBC using live cell imaging of merozoite invasion.”

      We have also removed the more general description of ‘a minimum of 30 merozoites’ from the same Figure Legend.

      The number of schizont ruptures and subsequent merozoite invasions followed for each experiment is in line with previous studies that have investigated phenotypes with invasion inhibitors and gene knock-outs (e.g. Weiss et al. 2015, PLoS Pathogens). It is important to note that the data refers to merozoites that have completed invasion, and not just the number of merozoites that have been released from a schizont which is typically 2-4 times more than have invaded. This means we are comparing the kinetics of invasion across a relatively large sample size compared to other studies of inhibitory phenotypes. While it is possible that increasing the number of merozoites being filmed might lead to some statistical significance for some of the trends, we note that there is a limited growth phenotype overall in both short and long-term culture and this fits with the limited defect we are seeing. In order to better address this, as outlined in our response to point (7) for Reviewer 2 (Recommendations for the authors), we now discuss the trends seen in the data in additional detail.

      (3) The comparative RNAseq data is interesting, but is not followed up to any significant degree. Multiple transcripts are up-regulated in the absence of PfMSP2, but they are largely dismissed because they are genes of unknown function, not previously linked to invasion, or lack an obvious membrane anchor. Having gone to the lengths of exploring potentially compensatory changes in gene expression, it is disappointing not to validate or explore the hits that result.

      While we understand the reviewer’s comment, as outlined in the text, we did not identify any upregulated proteins that looked like strong candidates to compensate for loss of MSP2 to explore in this manuscript. Instead, we chose to further investigate any potential loss of MSP2 phenotype that yielded the observations around improved potency of antibodies targeting some merozoite antigens with loss of MSP2. This will be explored in future studies as we try to understand the role of MSP2 in more detail and the interactions between proteins and antibodies on the merozoite surface.

      (4) Given the abundance of PfMSP2 on the merozoite surface, it would have been interesting to see whether the knockout lines have any noticeable difference in surface composition, as viewed by electron microscopy, although, of course, this experiment relies on access to the appropriate facilities.

      We agree with the reviewer, but this lies outside the scope of this manuscript and optimisation of the imaging platform used to gain biologically useful insights would take a considerable amount of work based on feedback from people working with these techniques.

      (5) One of the key findings is that deletion of PfMSP2 increases inhibition by some antibodies/nanobodies (some anti-CSS2, some anti-AMA1) but not others (anti-EBA/RH, anti-EBA175, anti-Rh5, anti-TRAMP, some anti-CSS2, some anti-AMA1). The data supporting these changes in inhibition are solid, but the selectivity of the effect (only a few antibodies, and generally those targeting later stages in invasion) is not really discussed in any detail. Do the authors have a hypothesis for this selectivity? The authors make attempts to explore the mechanisms for this antibody-masking (Figure 7), but the data is less solid. Surface Plasmon Resonance was non-conclusive, while an ELISA approach co-incubating MSP2 and anti-AMA1 antibodies to wells coated with AMA1 lacks appropriate controls (eg, including other merozoite proteins in similar experiments).

      As outlined in our response to point (7) for Reviewer 2 (Recommendations for the authors), we have repeated the ELISA-based assessment of recombinant MSP2’s impact on anti-AMA1 antibody binding. In addition, we have included two comparator control proteins, the intrinsically disordered MSP4 of P. falciparum and the globular domain of the neural cell adhesion molecule (NCAM, CD56, 16 kDa), and found these proteins did not impact binding of anti-AMA1 antibodies. This strengthens the data that links the presence of MSP2 to reduced activity of anti-AMA1 antibodies.

      As covered in our response to point (7) for Reviewer 2 (Recommendations for the authors) we provide additional discussion of this phenotype. We note that the list of inhibitory antibodies tested is not exhaustive, and additional antibodies may be identified where loss of MSP2 could improve potency. So although we see a consistent effect with a relatively small number of antibody targets, this does not rule out additional examples that may act earlier in invasion (for example, we noticed a small, but not statistically significant, trend for mildly inhibitory antibodies targeting MSP1-19 as well), and this makes it problematic to speculate at this time on why these two initial antibody targets are affected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) If feasible, perform ex vivo assays to demonstrate that the masking effect operates with physiologically relevant antibodies.

      For this manuscript, we focussed on characterising the MSP2 knock-out parasites using the best reagents available. We remain interested in understanding whether these lines can be used to investigate the activity of functional antibodies from malaria exposed human serum and this will be the subject of future studies.

      Reviewer #2 (Recommendations for the authors):

      (1) As noted in the Public Review, the section describing MSP2 orthologues in other Laverania and avian Plasmodium species is overly long and not the most novel section of the manuscript. It could be really radically trimmed back.

      We have taken this suggestion from the reviewer on board and have significantly cut back on our descriptions of the basic similarity properties of the conserved N and C-terminal regions as well as the description of the central variable region. Effectively, we have cut back the number of words through this section from 864 across 3 paragraphs to 478 across 2 paragraphs. While we have chosen to greatly economise our description of the N and C-terminal conserved regions, we have maintained much of the description of the similarities and differences in the central variable region, as we believe the observation that this variant region still maintains repeats, though they differ in size, number and amino acid composition, across such evolutionary distances is of interest.

      Taking the reviewer’s comment on board, we have also removed Table 1 from the manuscript (shows amino acid sequence properties of these regions) and instead have inserted the tables relevant for each alignment in Supplementary Figures 2, 3 and 4 as appropriate. This will streamline the main manuscript and better align amino acid property and alignment data in the one Figure. We thank the reviewer for this feedback and believe that this has helped focus the text on the most important observations.

      (2) Figure 2C - As MSP2 has stage-specific expression, it could be informative to incorporate an antibody targeting another gene with a similar stage-specific expression pattern, such as AMA1, into the blot. This would confirm that both protein samples were collected at a similar point during blood stage development.

      We have modified Figure 2C to include both the original comparison using PfAldolase as the loading control and also the merozoite expressed PfGAP45 as a loading/stage specific control as per the Figure.

      (3) Figure 2D - Magenta and red are hard to distinguish in the merge channel. Is it possible to pseudocolour one of these channels a different colour? Also, it would be simpler to keep PfMSP2 a consistent colour in both rows.

      Thank you for this suggestion and we agree that the comparison could be made clearer. For this figure, we have coloured DAPI to label the nuclei (Cyan), and antibodies targeting PfMSP2 (Magenta), PfAMA1 and PfMSP1-19 (Yellow). This is also reflected in the merged image. The Figure legend now reads:

      “(D) Distribution of key merozoite surface proteins in the presence or absence of PfMSP2 was visualised by immunofluorescence. PfMSP2 (magenta), the nucleus stained by DAPI (cyan) and PfAMA1 (yellow, top two rows) or PfMSP1-19 (yellow, bottom two rows), and the coloured merge of the preceding panels. Scale bar = 0.7 µm. Representative images shown from a minimum of 10 schizonts imaged per condition.”

      (4) Figure 2F - Static growth relative to shaking growth is plotted in this panel; perhaps this could be more clearly described in the legend or mentioned in the text that there was not a significant alteration in growth in static or shaking conditions.

      As suggested, we have clarified the result in the Figure legend text as follows:

      “(E-F) Growth of Pf3D7 WT compared to Pf3D7 ΔMSP2 P. falciparum parasites, measured as fold increase in parasitaemia, over one (48 hrs) or two (96 hrs) cycles in either standard (still- (E)) or shaking (F) conditions, with no measurable difference between parasite growth rates seen between standard or shaking conditions.”

      Please also describe the shaking conditions used (i.e., speed, culture size, and vessel) in the methods.

      We have updated the methods to provide information on the growth conditions used in the standard versus shaking growth assays:

      “The initial parasitemia of cultures was determined by flow cytometry and then measured again after the 50 mL cultures in 96 well plates were maintained under standard (still) or shaking (50 rpm) conditions for 48 hrs or 96 hrs of growth.”

      (5) Figure 3G - Annotate legend for strength of deformation to describe what 1,2, or 3 refers to.

      We have added the following to the Figure legend of Figure 3G:

      “Deformation scores are as defined by Weiss et al (Weiss et al., 2015), with 1 = weak deformation of the RBC membrane at the point of contact, 2 = strong deformation leading to the RBC membrane extending up the sides of the merozoite and changes in RBC membrane curvature beyond the point of contact and 3 = extreme deformation indicated by the merozoite being deeply embedded in the RBC membrane and strong deformation of the RBC well beyond the point of contact.”

      There is a small visible shift in the deformation event scores. Is this also not significant? Even if deformation is not significantly longer, could this small effect alter the exposure of epitopes on other proteins for antibody targeting?

      We did test the deformation event scores and the differences were non-significant. We have considered the possibility raised by the reviewer, but, in the absence of additional data, we are cautious about over-interpreting whether these trends might contribute to the increased potency of certain antibodies. We note that, although deformation may happen over a slightly longer timescale and show more aggressive deformations with PfMSP2 knock-out, this also seems to translate into a weak trend for faster overall entry for those merozoites that go on to invade. Therefore, although deformation may be longer and stronger, antibodies may have less time to block invasion overall. We are not confident that we can interpret what might be happening at the molecular scale based on these data and have chosen not to discuss this possibility in the manuscript. However, we have added the following to the results to better explain the phenotype we observed.

      “This analysis showed that, although there was a trend for PfDd2 ΔMSP2 knock-out parasites to have a higher mean time to attach to the RBC, as well as for the length and strength of RBC deformation, these trends did not reach significance. For those merozoites that did invade the RBC, on average it took less time for PfDd2 ΔMSP2 knock-out parasites to invade than PfDd2 WT, but this again did not reach significance (Figure 3 E-H). Together these data show PfMSP2 is not essential for blood-stage replication in vitro in two P. falciparum laboratory isolates from different geographical regions and knock-out of PfMSP2 does not seem to significantly impact parasite growth or merozoite invasion in vitro.”

      (6) Figure 4C - Legend refers to black lines, but on the figure, they are red? Is the horizontal red line in the correct place, or should some of the dots below it be black rather than blue if they fall outside the adjusted p-value significance cut-off? Were 4 schizont harvests performed in total, or 4 for each cell line?

      We thank the reviewer for pointing this out and we have now changed the text to say red lines. We have also provided more information in the Figure legend to more clearly define what data is represented. In short, 4 harvests were performed for each cell line (8 in total across the 2 cell lines) and the data represents the distribution from one of these harvests. The blue shaded genes are those that, on average, across the 4 Pf3D7 WT and Pf3D7 ΔMSP2 paired harvests show up or down-regulated expression. This is why some of the blue shaded genes lie near or below the cut-off values represented by the red line. The Figure legend text has now been modified as follows.

      “(C) Log2(fold change) for differentially expressed genes, including multigene families, between the transcriptome of Pf3D7 WT and Pf3D7 ΔMSP2 schizonts. Plot represents the results for one of four independent schizont RNA harvests for Pf3D7 WT and Pf3D7 ΔMSP2 parasites and red lines differentiate genes with a log2 (fold change) > 0.5 and < -0.5 with adjusted p-value < 0.01. Genes shaded blue represent those genes that were found to have an average log2 (fold change) > 0.5 (dark blue) or < -0.5 (light blue) across the four replicate samples compared. Significance determined as below p< 0.05 after correction for multiple testing.”
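      As a reading aid for the thresholds quoted in the legend, the sketch below classifies a gene as up- or down-regulated using a log2(fold change) cut-off of 0.5 and an adjusted p-value cut-off of 0.01. It is a hypothetical illustration of how such cut-offs combine. The function name and example values are invented here, and this is not the authors' differential-expression pipeline.

```python
def classify_gene(log2_fold_change, adjusted_p, fc_cutoff=0.5, p_cutoff=0.01):
    """Label a gene 'up', 'down' or 'unchanged' from its log2 fold change
    and adjusted p-value, using the cut-offs quoted in the figure legend."""
    if adjusted_p >= p_cutoff:
        return "unchanged"
    if log2_fold_change > fc_cutoff:
        return "up"
    if log2_fold_change < -fc_cutoff:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical genes: (name, log2 fold change, adjusted p-value).
    examples = [
        ("gene_a", 1.2, 0.001),
        ("gene_b", -0.8, 0.005),
        ("gene_c", 0.9, 0.2),
        ("gene_d", 0.3, 0.0001),
    ]
    for name, lfc, padj in examples:
        print(name, classify_gene(lfc, padj))
```

      A gene must pass both cut-offs to be called. A gene that passes on the average across replicates can therefore still sit near or inside the cut-off lines in the plot of a single harvest, as the response explains.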

      (7) Figure 7D - ELISA results don't show a convincing concentration-dependent inhibition, and repeating with another recombinant protein is essential before inferring that the effect is specific to PfMSP2

      We have repeated the ELISA experiment using recombinant PfMSP2 to reduce variability across the assay and again found a dose dependent reduction of anti-PfAMA1 binding with increasing concentrations of recombinant PfMSP2. It should be noted that this is a completely new set of experiments that recapitulate the original findings. See updated Figure 7D.

      We agree with the reviewer that the experiment and interpretation of the data would be strengthened by comparing any potential inhibitory impact on anti-PfAMA1 binding to a different recombinant protein. Therefore, we have completed identical experiments using the similarly intrinsically disordered PfMSP4 recombinant protein (40 kDa) and the highly structured 16 kDa immunoglobulin domain of human neural cell adhesion molecule (NCAM). We find that there is no dose-dependent loss of anti-PfAMA1 binding to recombinant PfAMA1 with addition of PfMSP4 or NCAM immunoglobulin domain recombinant protein. These controls are contained in Supplementary Figure 6; the relevant text is provided below.

      ‘In contrast, increasing concentrations of the intrinsically disordered MSP4 from P. falciparum 3D7 (40 kDa) and the highly structured immunoglobulin domain of neural cell adhesion molecule (NCAM, CD56, 16 kDa) recombinant proteins did not impact on binding of anti-PfAMA1 antibodies to recombinant AMA1 (Supplementary Figure 6).’

      (8) Again, as noted in the public review, the target-specificity of the inhibition-masking effect is perhaps the most surprising aspect of the data - this could do with much more thorough discussion. Why only these proteins, both of which function late in invasion?

      Overall, we tested several growth inhibitory and non-inhibitory antibodies shown to bind specifically to individual or some combination of nine P. falciparum merozoite surface and secreted proteins. However, we do not consider this to be an exhaustive list of potentially invasion inhibitory antibodies by any means. We mostly did not observe any non-inhibitory antibodies becoming significantly more growth inhibitory to PfMSP2 KO lines, indicating that these antibodies were not impacted by loss of PfMSP2 or had no functional inhibitory effect in these assays.

      What we do demonstrate here is that we see a consistent impact with different rabbit, mouse monoclonal and i-body growth inhibitory antibodies targeting PfAMA1, indicating that it is not a spurious result from a single antibody or antibody type. We also find a second example, with nanobodies targeting the PfPCRCR complex protein PfCSS potentiated with loss of PfMSP2. This opens up the possibility that other growth inhibitory antibodies to the antigens tested here, or growth inhibitory antibodies targeting other antigens involved in merozoite invasion, may also become more potent with MSP2KO. Although both PfAMA1 and PfCSS function late in invasion, it is too early to say whether this is a functional trend or an observation that is related to the panel of antibodies tested. Therefore, further testing using lines developed in this study could yield additional examples of antibodies that become more inhibitory with MSP2 KO and provide additional information on the potential impact that MSP2 may have on their vaccine potential. In order to address this, we have added the following text to the discussion:

      “Here we show consistent potency improvement with PfMSP2 knock-out for growth inhibitory rabbit, mouse monoclonal and i-body antibodies targeting PfAMA1, as well as demonstrate improved activity for an Fc-tagged nanobody targeting PfCSS, indicating that these are not outlier results from a single antibody or antibody type. However, increased antibody potency was not shared across all antibodies tested, possibly because the specific function or localisation of a target protein, the region that an antibody binds to or the functional activity (or lack thereof) of an antibody may all play a role in determining whether loss of PfMSP2 can potentiate growth inhibitory activity. Further investigation using the parasite lines developed in this study and a wider panel of antibodies that target different stages of the merozoite invasion process could shed more light on this potentially novel mechanism of vaccine derived antibody efficacy.”

      (9) Typos/minor editorial points:

      L111 – conserved

      This text has been modified.

      L235-237 - check the wording in this sentence for clarity

      This text has been modified.

      Figure 3E - 'attachment' on axis

      This Figure has been modified.

      L350 - mentions eight 'proteins' having expression increase, instead 'transcripts' should be referred to when describing RNAseq data, as transcript levels may not correspond directly with protein levels. Also, be careful when referring to transcript or protein throughout this paragraph.

      This text has been modified.

      Figure 4A - instead of 'transcription during schizonts', better to say 'schizont transcript abundance'

      This text has been modified.

      L514 - 'detectable binding to PfAMA1'

      This text has been modified.

      L589 - Is it a mouse Fc region or a human Fc region that is added? The human Fc region is mentioned in the results.

      In the growth inhibition assays, the anti-AMA1 WD34 i-body with a human Fc region was used. In the ELISA assays, the anti-AMA1 WD34 i-body with a mouse Fc region was used, so that AMA1 binding could be detected with the same secondary antibody for both the WD34 i-body and the 4G2 mouse monoclonal antibody. The text has been checked and modified accordingly to state this clearly.

      Supplementary figure 3 - 'repeats'

      This text has been modified.

    1. Ok, so these steps aren't really "easy". Some of them will take months. Just figuring out how to use IRC will take a while if you're new to it. I've purposely been vague on the details so you can figure it out on your own. You'll need ingenuity, a search engine, and a lot of patience. But I guarantee, 100%, for-sure, that if you complete my 5 step program, you will be able to easily find a good starting job programming.

      Some observations:

      • While it isn't easy, it is at least as easy as getting into a good school, and going to class and making good grades for four years. And with that said:

      • Isn't this (the guarantee here—"guarantee"!) basically the same as the beliefs corresponding to the aforementioned bubble?

      I think what Boodman failed to grasp at the time he first wrote this was that (a) many of those in his audience were persons who, having just invested four years into going to college with a reasonable expectation that it would pay off, weren't in a position (read: didn't have the resources) to suspend life for another 2-to-4 years trying to follow a plan like this (step #4 here—"Repeat[ing] steps 2 and 3 about 50 times"—is landing a patch to fix a new Chrome bug every two weeks) on the promise of an even more uncertain payoff, and—more perniciously—(b) a subset of those in his audience would have been those who were (more or less) already as capable as needed by the requirements of this process, but were in his audience nonetheless because, despite being so, it wasn't enough to "get them a good job" due to the stochastic nature of the market (incl. the fickle humans who are often gatekeepers in the "find a good job" process).

    1. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe the generation of a Drosophila model of RVCL-S by disrupting the fly TREX1 ortholog cg3165 and by expressing human TREX1 transgenes (WT and the RVCL-S-associated V235Gfs variant). They evaluate organismal phenotypes using OCT-based cardiac imaging, climbing assays, and lifespan analysis. The authors show that loss of cg3165 compromises heart performance and locomotion, and that expression of human TREX1 partially rescues these phenotypes. They further report modest differences between WT and mutant hTREX1 under overexpression conditions. The study aims to establish Drosophila as an in vivo model for RVCL-S biology and future therapeutic testing.

      Strengths:

      (1) The manuscript addresses an understudied monogenic vascular disease where animal models are scarce.

      (2) The use of OCT imaging to quantify fly cardiac performance is technically strong and may be useful for broader applications.

      (3) The authors generated both cg3165 null mutants and humanized transgenes at a defined genomic landing site.

      (4) The study provided initial in vivo evidence that human TREX1 truncation variants can induce functional impairments in flies.

      Weaknesses:

      (1) Limited mechanistic insight.

      RVCL-S pathogenesis is strongly linked to mislocalization of truncated TREX1, DNA damage accumulation, and endothelial/podocyte cellular senescence. The current manuscript does not examine any cellular, molecular, or mechanistic readouts - e.g. DNA damage markers, TREX1 subcellular localization in fly tissues, oxidative stress, apoptosis, or senescence-related pathways. As a result, the model remains largely phenotypic and descriptive.

      To strengthen the impact, the authors should provide at least one mechanistic assay demonstrating that the humanized TREX1 variants induce expected molecular consequences in vivo.

      (2) The distinction between WT and RVCL-S TREX1 variants is modest.

      In the cg3165 rescue experiments, the authors do not observe differences between hTREX1 and the V235Gfs variant (e.g., Figure 3A-B). Phenotypic differences only emerge under ubiquitous overexpression, raising two issues:

      (i) It is unclear whether these differences reflect disease-relevant biology or artifacts of strong Act5C-driven expression.

      (ii) The authors conclude that the model captures RVCL-S pathogenicity, yet the data do not robustly separate WT from mutant TREX1 under physiological expression levels.

      The authors should clarify these limitations and consider additional data or explanations to support the claim that the model distinguishes WT vs RVCL-S variants.

      (3) Heart phenotypes are presented as vascular defects without sufficient justification.

      RVCL-S is a small-vessel vasculopathy, but the Drosophila heart is a contractile tube without an endothelial lining. The authors refer to "vascular integrity restoration," but the Drosophila heart lacks vasculature.

      The manuscript would benefit from careful wording and from a discussion of how the fly heart phenotypes relate to RVCL-S microvascular pathology.

      (4) General absence of tissue-level or cellular imaging.

      No images of fly hearts, brains, eyes, or other tissues are shown. TREX1 nuclear mislocalization is a hallmark of RVCL-S, yet no localization studies are included in this manuscript.

      Adding one or two imaging experiments demonstrating TREX1 localization or tissue pathology would greatly enhance confidence in the model.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe the generation of a Drosophila model of RVCL-S by disrupting the fly TREX1 ortholog cg3165 and by expressing human TREX1 transgenes (WT and the RVCL-S-associated V235Gfs variant). They evaluate organismal phenotypes using OCT-based cardiac imaging, climbing assays, and lifespan analysis. The authors show that loss of cg3165 compromises heart performance and locomotion, and that expression of human TREX1 partially rescues these phenotypes. They further report modest differences between WT and mutant hTREX1 under overexpression conditions. The study aims to establish Drosophila as an in vivo model for RVCL-S biology and future therapeutic testing.

      Strengths:

      (1) The manuscript addresses an understudied monogenic vascular disease where animal models are scarce.

      (2) The use of OCT imaging to quantify fly cardiac performance is technically strong and may be useful for broader applications.

      (3) The authors generated both cg3165 null mutants and humanized transgenes at a defined genomic landing site.

      (4) The study provided initial in vivo evidence that human TREX1 truncation variants can induce functional impairments in flies.

      Weaknesses:

      (1) Limited mechanistic insight.

      RVCL-S pathogenesis is strongly linked to mislocalization of truncated TREX1, DNA damage accumulation, and endothelial/podocyte cellular senescence. The current manuscript does not examine any cellular, molecular, or mechanistic readouts - e.g. DNA damage markers, TREX1 subcellular localization in fly tissues, oxidative stress, apoptosis, or senescence-related pathways. As a result, the model remains largely phenotypic and descriptive.

      We thank the reviewers for these suggestions. We are planning to perform experiments addressing the RVCL-S-linked cellular deviations. We will examine DNA damage markers at the cellular level and perform TUNEL tissue staining to visualize apoptosis.

      To strengthen the impact, the authors should provide at least one mechanistic assay demonstrating that the humanized TREX1 variants induce expected molecular consequences in vivo.

      Yes, we are planning to demonstrate the distinct effects of TREX1 and TREX1 V235G expression at the molecular level.

      (2) The distinction between WT and RVCL-S TREX1 variants is modest.

      In the cg3165 rescue experiments, the authors do not observe differences between hTREX1 and the V235Gfs variant (e.g., Figure 3A-B). Phenotypic differences only emerge under ubiquitous overexpression, raising two issues:

      i) It is unclear whether these differences reflect disease-relevant biology or artifacts of strong Act5C-driven expression.

      Thanks for pointing out this issue. We will discuss the differences between two expression models in the revised manuscript.

      ii) The authors conclude that the model captures RVCL-S pathogenicity, yet the data do not robustly separate WT from mutant TREX1 under physiological expression levels.

      We will provide more details related to RVCL-S disease development and age-related manifestations.

      The authors should clarify these limitations and consider additional data or explanations to support the claim that the model distinguishes WT vs RVCL-S variants.

      We will address the reviewer concerns and re-write the related manuscript sections to provide more clarity.

      (3) Heart phenotypes are presented as vascular defects without sufficient justification.

      RVCL-S is a small-vessel vasculopathy, but the Drosophila heart is a contractile tube without an endothelial lining. The authors refer to "vascular integrity restoration," but the Drosophila heart lacks vasculature.

      We will expand the model justification section and will be more careful with our statements to avoid misunderstanding of the experimental conclusions.

      The manuscript would benefit from careful wording and from a discussion of how the fly heart phenotypes relate to RVCL-S microvascular pathology.

      We thank the reviewer for pointing to this issue. Justifying Drosophila usage for human disease modelling is always challenging. We will re-write the corresponding parts of the manuscript.

      (4) General absence of tissue-level or cellular imaging.

      No images of fly hearts, brains, eyes, or other tissues are shown. TREX1 nuclear mislocalization is a hallmark of RVCL-S, yet no localization studies are included in this manuscript. Adding one or two imaging experiments demonstrating TREX1 localization or tissue pathology would greatly enhance confidence in the model.

      As suggested by the reviewers, we will add tissue imaging experiments to illustrate the pathological effects of RVCL-linked TREX1 expression. We are also planning to utilize CRIMIC line CR70804 to visualize fly TREX1 tissue distribution.

      Reviewer #2 (Public review):

      Summary:

      The authors used the Drosophila heart tube to model Retinal vasculopathy with the goal of building a model that could be used to identify druggable targets and for testing chemical compounds that might target the disease. They generated flies expressing human TREX1 as well as a line expressing the V235G mutation that causes a C-terminal truncation that has been linked to the disease. In humans, this mutation is dominant. Heart tube function was monitored using OCM; the most robust change upon overexpression of wild-type or mutant TREX1 was heart tube restriction, and this effect was similar for both forms of TREX1.

      Our results are consistent with the nature of the human disease: RVCL-S carriers and non-carriers are both healthy and asymptomatic at a young age; however, the accumulation of physiological stress becomes obvious in midlife, leading to premature death in the 40s and 50s. We will expand the discussion section, focusing on RVCL-S manifestations in aged animals.

      Lifespan and climbing assays did show differential effects between wt and mutant forms when they were strongly and ubiquitously expressed by an actin-Gal4 driver. Unfortunately, these types of assays are less useful as drug screening tools. Their conclusion that the primary effect of TREX is on neuronal function is inferential and not directly supported by the data.

      We will revise this experiment discussion and plan to include additional experiments to strengthen the conclusions.

      The authors do not show that CG3165 is normally expressed in the heart. Further fly heart tube function was similarly restricted in response to expression of either wild-type or mutant TREX1. The fact that expression of any form of human TREX1 had deleterious effects on heart function suggests that TREX1 serves different roles in flies compared to humans. Thus, in the case of this gene, it may not be a useful model to use to identify targets or use it as a drug screening tool.

      We will examine the expression of cg3165 and the human TREX1 transgenes in the whole organism to demonstrate tissue expression profiles, as noted above. We will also expand the relevant manuscript sections to address the systemic manifestations of RVCL.

      The significant effects on lifespan and climbing that did show differential effects required ubiquitous overexpression using an actin-gal4 driver that does not allow the identification of tissue-specific effects.

      We plan to carry out additional experiments to determine the tissue expression profiles of cg3165 and human TREX1.

      Thus, their assertion that "the results suggested a strong positive correlation between Drosophila neuromotor regulation and transgenic hTREX1 presence and a negative impact from hTREX1 V235G" is not supported by these data.

      Thanks for pointing this out. We will revise our conclusions appropriately after we include the results from additional new experiments.

      Also worrisome was the inability to identify the mutant TREX1 protein by Western blot despite the enhanced expression levels suggested by qPCR analysis. Mutant TREX1 cannot exert a dominant effect on cell function if it isn't present.

      We will try to resolve this issue by technical means.

      There are also some technical problems. The lifespan assays lack important controls, and the climbing assays do not appear to have been performed correctly.

      We would disagree with this statement. We will re-write the method description for better clarity.

      It is unclear what the WT genetic background is in Figure 1-3, so it is unclear if the appropriate controls have been used. Finally, the lack of information on the specific statistical analyses used for each graph makes it difficult to judge the significance of the data.

      We will provide clearer descriptions of our controls and procedures.

      Overall, the current findings establish the Retinal vasculopathy disease model platform, but with only incremental new data and without any mechanistic insights.

      We will include additional experiments addressing the mechanism (see previous responses above).

      Reviewing Editor Comments:

      I (Hugo Bellen) also read your paper and noted that you do not document the expression pattern in the nervous system and other tissues, such as the heart. The stock https://flypush.research.bcm.edu/pscreen/crimic/info.php?CRname=CR70804 may help you do this and should allow you to compare the GAL4 induced expression of the stock you created and this stock. If compatible, you should consider reporting expression patterns.

      Thank you for the suggestion. We will obtain the line and will use it for expression visualization.

        • Tourist is an ugly human being* - It immediately establishes a critical tone.
      1. How alone you feel in this crowd- tourists feel slightly alienated, but locals experience deeper issues like poverty and exploitation
      2. Heaps of Death and ruin- Tourists can leave anytime, while locals are stuck
        • People who inhabit the place in which you just passed cannot stand you*- Tourists can leave anytime while locals are stuck with consequences of tourism
      3. And every tourist is a native somewhere- Travel freedom is privilege that many people in the world do not have due to money, or is it that they lack freedom of the mind?
    1. via Thom Cholowski at Rebel Typewriter Workshop in Saskatoon, Saskatchewan, Canada at https://www.facebook.com/groups/typewritermaintenance/posts/4373364699567612/

      Rules of Good Repair Practices:

      The Success of the repair depends upon the Repairman's ability in following the rules of good repair practices. These rules can be summed up as follows:

      1. Read the Service Bulletin thoroughly.
      2. Know what you are looking for. Never guess. Always investigate the source of trouble. Ask, if you do not know.
      3. Determine troubles by the process of elimination.
      4. Every part, no matter how small, has a job to perform. Do not overlook the smallest detail.
      5. Use the proper tool. Keep tools in efficient working order.
      6. Carefully handle smooth, plated or bakelite parts to prevent damaging or scratching.
      7. Make the repair like you would want it, if you were the customer.
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors appear to be excluding a significant fraction of the TCRlow gamma delta T cells from their analysis in Figure 1A. Since this population is generally enriched in CD25+ gamma delta T cells, this gating strategy could significantly impact their analysis due to the exclusion of progenitor gamma delta T cell populations.

      We were cautious in our gating strategy since the TCR𝛿+ CD3ε+ subset is rather small, so a low signal-to-background ratio can be an issue if the gates used are too broad/generous. There is some inevitable low-level background staining with the TCR𝛿 that sits just above the bulk of the negative population and is CD3ε-negative. Although this background represents a tiny fraction of total cells, we were wary of gate contamination into our TCR𝛿+ CD3ε+ subset and we wanted a gating strategy that could be applied across other organs too. We do not, however, believe this conservative strategy is impacting our measurements of progenitor numbers across strains or our conclusions, since the size of this progenitor population in the various IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains was never impacted by the mutations. But to reassure the reviewer, we show our conservative gate as compared with a very broad TCR𝛿 gate and show that we are not missing a substantial population of CD25+ cells just below our gate. This also helps illustrate how close the background from the CD27<sup>int</sup> expressing αβ thymocytes (right column) comes to the TCR𝛿+ CD3+ gate and the importance of tight lineage gating.

      Author response image 1.

      (2) The overall phenotype of the IKKDeltaTCd2 mice is not described in any great detail. For example, it is not clear if these mice possess altered thymocyte or peripheral T cell populations beyond that of gamma delta T cells.

      Given that gamma delta T cell development has been demonstrated to be influenced by gamma delta T cells (i.e., trans-conditioning), this information could have aided in the interpretation of the data.

      Apologies for not being clearer on this point. We have studied conventional αβ T cell development in these strains in considerable detail, and these studies are published and discussed in some detail in the introduction in paragraph 3 on page 3-4 and in cited references Schmidt-Supprian et al 2004, Silva et al 2014, Xing et al 2016, Webb et al 2019, Carty et al 2023. These detail how IKK expression is critical for thymic development of αβ T cells and their peripheral survival, and dissect the role of NF-κB activation and cell death regulation by IKK. However, we now add new discussion (page 11-12) that considers the potential impact of altered αβ T cell development in the strains used for this study.

      We agree that trans-conditioning is also an important consideration, since CD4 TH17 T cells can enhance type 17 𝛾𝛿 T cell development (10.1038/icb.2011.50). This is of relevance to the limited conclusions we draw concerning type 17 𝛾𝛿 T cells. The REL and IKK deficient strains do lack effector populations, including type 17 αβ T cells, so it is possible that the absence of type 17 αβ T cells in these strains does contribute to the modest impact of IKK deletion in the type 17 𝛾𝛿 subset. We now highlight this information and discuss it in the manuscript (page 11-12).

      Related to this, it would have been helpful if the authors provided a comparison of the frequencies of each of the relevant subsets, in addition to the numbers.

      We now provide both the absolute frequencies of different 𝛾𝛿 subsets and their relative frequencies to one another, as supplementary figure 2. We still believe assessing absolute numbers is the gold standard, since the differential impact of gene deletions on the αβ T cell compartments in different strains will affect whether or not αβ T cells are present, and therefore the overall representation of 𝛾𝛿 T cells can vary considerably between strains. Hence, absolute numbers are a more reliable measure of cell abundance.

      (3) The manner in which the peripheral gamma delta T cell compartment was analyzed is somewhat unclear. The authors appear to have assessed both spleen and lymph node separately. The authors show representative data from only one of these organs (usually the lymph node) and show one analysis of peripheral gamma delta T cell numbers, where they appear to have summed up the individual spleen and lymph node gamma delta T cell counts. Since gamma deltaT17 and gamma deltaT1 are distributed somewhat differently in these compartments (lymph node is enriched in gamma deltaT17, while spleen is enriched in gamma deltaT1), combining these data does not seem warranted. The authors should have provided representative plots for both organs and calculated and analyzed the gamma delta T cell numbers for both organs separately in each of these analyses.

      We did of course process and calculate numbers of different subsets in both lymph nodes and spleen. Where we saw loss of peripheral 𝛾𝛿 subsets, or rescue, this was reflected in separate analysis of both organs and we did not see any organ-specific effects in the mouse strains analysed. We therefore took the initial view that presenting aggregate data was the most efficient and least repetitive representation of the data. However, we very much recognise the reviewer's concern, and interest in seeing these data, so have now included representative plots across both organs for figure 1D, and show cell numbers of lymph nodes and spleen separately, as well as together, for figures 1, 2, 4 and 7, and these plots reflect the differences observed when we combined data. We did not break down the data for all figures (e.g. figures 3 and 5) as it was more cumbersome for more complex multi-strain comparisons, and so we attempt to balance clarity and transparency against unnecessary repetitive data presentation.

      (4) The authors make extensive use of surrogate markers in their analysis. While the markers that they choose are widely used, there is a possibility that the expression of some of these markers may be altered in some of their genetic mutants. This could skew their analysis and conclusions. A better approach would have been to employ either nuclear stains (Tbx21, RORgammaT) or intracellular cytokine staining to definitively identify functional gamma deltaT1 or gamma deltaT17 subsets.

      We did share a similar concern, but think this is not an issue where subsets disappear and are almost completely absent, such as in IKK1/2 KO and Casp8 KO settings. Where we saw rescue with RIPK1<sup>D138N</sup> in Casp8ΔT<sup>CD2</sup> strains, we were keen to demonstrate that the populations we saw restored did exhibit their expected function, and so confirmed this in figure 5C by intracellular cytokine staining after a short 4h restimulation in vitro. This also served to validate our gating strategy, since what we designated as Type 1 cells (CD27+CD122+CD44<sup>int</sup>) were the only source of IFN-gamma, while CD27–CD44<sup>hi</sup> CD122<sup>lo</sup> cells were the only source of IL-17. Adaptive/naive cells made neither cytokine. So while we did not include nuclear stains, we were satisfied that the cytokine assays validated the gating strategy.

      (5) The analysis and conclusion of the data in Figure 3A is not convincing. Because the data are graphed on log scale, the magnitude of the rescue by kinase dead RIPK1 appears somewhat overstated. A rough calculation suggests that in type 1 gamma delta T cells, there is ~ 99% decrease in gamma delta T cells in the Cre+WT strain and a ~90% decrease in the Cre+KD+ strain. Similarly, it looks as if the numbers for adaptive gamma delta T cells are a 95% decrease and an 85% decrease, respectively. Comparing these data to the data in Figure 5, which clearly show that kinase dead RIPK1 can completely rescue the Caspase 8 phenotype, the conclusion that gamma delta T cells require IKK activity to repress RIPK1-dependent pathways does not appear to be well-supported. In fact, the data seem more in line with a conclusion that IKK has a significant impact on gamma delta T cell survival in the periphery that cannot be fully explained by invoking Caspase8-dependent apoptosis or necroptosis. Indeed, while the authors seem to ultimately come to this latter conclusion in the Discussion, they clearly state in the Abstract that "IKK repression of RIPK1 is required for survival of peripheral but not thymic gamma delta T cells." Clarification of these conclusions and seeming inconsistencies would greatly strengthen the manuscript. With respect to the actual analysis in Figure 3A, it appears that the authors used a succession of non-parametric t-tests here without any correction. It may be helpful to determine if another analysis, such as ANOVA, may be more appropriate.

      Yes, we completely agree with this assessment and conclusion. While kinase dead RIPK1 does provide some rescue, this appears relatively modest, and instead supports the view, validated in figure 7, that perhaps the dominant function of IKK in 𝛾𝛿 T cells is to activate NF-κB dependent survival signals. Nevertheless, RIPK1<sup>D138N</sup> does provide some significant rescue, which allows some peripheral cells to repopulate and demonstrates that IKK is repressing RIPK1 mediated cell death. It is actually not trivial to assess the relative importance of IKK-RIPK1 and IKK-NF-κB functions. In the IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice, we prevent RIPK1 induced death, but still lack the NF-κB-dependent survival signal. Consistent with this, the ~1 log reduction in 𝛾𝛿 numbers between WT and IKKΔT<sup>CD2</sup> RIPK1<sup>D138N</sup> mice is actually similar to what we observe in the absence of REL subunits (Fig. 7), which is a smaller reduction than we observe in IKKΔT<sup>CD2</sup> mice. What would have been ideal is to have a scenario where IKK regulation of RIPK1 was defective but NF-κB survival signalling was intact. This would reveal the full impact of losing IKK dependent regulation of RIPK1 alone, which we suspect would result in substantial cell death that could not be blocked by NF-κB. Unfortunately, we do not have or know of suitable mouse mutants to test this. This is quite a nuanced discussion and we now clarify the scope and extent of conclusions we can draw (p. 7, 11).
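      The reviewer's percentages and the log-scale reductions discussed in this response describe the same quantity in two units. As a reading aid only, and not part of the authors' analysis, the minimal sketch below converts between them; the function names are hypothetical and the inputs are round illustrative values, not measured cell counts.

```python
import math


def percent_decrease_from_log10(log10_reduction: float) -> float:
    """Percent decrease that corresponds to a drop of `log10_reduction` log10 units."""
    return (1.0 - 10.0 ** (-log10_reduction)) * 100.0


def log10_reduction_from_percent(percent_decrease: float) -> float:
    """Inverse conversion: log10 units lost for a percent decrease in [0, 100)."""
    return -math.log10(1.0 - percent_decrease / 100.0)


if __name__ == "__main__":
    # Round illustrative inputs, chosen only to show the conversion.
    for logs in (1.0, 2.0):
        pct = percent_decrease_from_log10(logs)
        print(f"{logs:.0f} log10 reduction = {pct:.0f}% decrease")
```

      On this scale, a one log reduction is a 90% decrease and a two log reduction is a 99% decrease, so a rescue from roughly 99% to roughly 90% loss is a tenfold gain in cell number while still leaving numbers well below wild type.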

      (6) The conclusion that the alternative pathway is redundant for the development and persistence of the major gamma delta T cell subsets is at odds with a previous report demonstrating that Relb is required for gamma delta T17 development (Powolny-Budnicka, I., et al., Immunity 34: 364-374, 2011). This paper also reported the involvement of RelA in gamma delta T17 development. The present manuscript would be greatly improved by the inclusion of a discussion of these results.

      Thank you - we include a discussion of these papers now (p12).

      (7) The data in Figures 1C and 3A are somewhat confusing in that while both are from the lymph nodes of IKKdeltaTCD2 mice, the data appear to be quite different (In Figure 3A, the frequency of gamma delta T cells increases and there is a near complete loss of the CD27+ subset. In Figure 1A, the frequency of gamma delta T cells is drastically decreased, and there is only a slight loss of the CD27+ subset.)

      Yes, we agree these do look quite different and could be confusing. The lymph nodes from IKKΔT<sup>CD2</sup> mice lack αβ T cells and B cells, and so the cellularity is much lower than normal. Consequently, the percentage representation of remaining cells can be more noisy, while total cellularity calculations are more consistent. This is not an issue in the other strains, which all have more cells in lymph nodes. We now show plots from spleen of the same mice, which appear better aligned with additional splenic data shown in Figure 1.
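
      The arithmetic behind point (5), where a log-scale plot is read as a percentage decrease, can be sketched as follows. This is a minimal illustration and not part of the manuscript or its analysis: the function names are hypothetical, and the inputs are the reviewer's rough estimates (~99% and ~90% decreases), not measured values.

```python
import math


def fraction_lost_from_log10_drop(drop: float) -> float:
    """Fraction of cells lost for a given drop in log10(cell number)."""
    return 1.0 - 10.0 ** (-drop)


def log10_drop_from_fraction_lost(fraction: float) -> float:
    """Drop in log10(cell number) corresponding to a fractional loss."""
    return -math.log10(1.0 - fraction)


if __name__ == "__main__":
    # Reviewer's rough estimates for type 1 gamma delta T cells (assumed inputs).
    for label, lost in [("Cre+ WT", 0.99), ("Cre+ KD", 0.90)]:
        drop = log10_drop_from_fraction_lost(lost)
        print(f"{label}: {lost:.0%} decrease = {drop:.1f} log10 units below control")
```

      On a log axis, the step from a 99% to a 90% decrease spans one full log unit (a tenfold gain in cell number) even though nine-tenths of the population is still missing, which is why the rescue can look larger on the plot than it is in percentage terms.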

      Reviewer #2 (Public review):

      (1) All approaches used confer changes to the entire T cell compartment. Therefore, the authors are unable to resolve whether the observations are mediated by direct and/or indirect effects (e.g., disorganized lymphoid architecture impacting maintenance/survival/homing).

      We address this important point in the discussion (p11-12). The impacts of gene deletions upon αβ and 𝛾𝛿 T cells operate independently of one another (as also discussed in response to reviewer 1). For instance, the phenotype of αβ T cells is identical in IKKΔT<sup>CD2</sup> and IKKΔT<sup>CD4</sup> mice - 𝛾𝛿 T cells are only targeted in IKKΔT<sup>CD2</sup> mice. Similarly, the phenotype of 𝛾𝛿 T cells is similar in IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup> strains. αβ T cells are absent from IKKΔT<sup>CD2</sup> but present in near normal numbers in Casp8.IKKΔT<sup>CD2</sup> mice. Others have also noted that 𝛾𝛿 T cell development is normal in Rag deficient mice (10.1126/science.1604321). In any case, an absence of αβ T cells is expected to promote 𝛾𝛿 T cell survival in the absence of competition for commonly utilised cytokines such as IL-7 and IL-15, though we do not see much evidence for this in mice with and without αβ T cells such as IKKΔT<sup>CD2</sup> vs Casp8.IKKΔT<sup>CD2</sup> strains. We do now discuss the potential contribution of trans-conditioning for type 17 𝛾𝛿 T cell development (p12).

      (2) Assessment of factors that impact T cell numbers in the periphery is necessary. Are there observable changes to the proliferation, survival, and migration of gd T cell subsets?

      In IKKΔT<sup>CD2</sup> and Casp8.IKKΔT<sup>CD2</sup> deficient strains, we infer a defect in survival, since they lack peripheral 𝛾𝛿 T cells, despite normal thymic development. Their absence made it hard to assess proliferation and migration, though 𝛾𝛿 T cells were absent from all lymphoid organs. The conclusion that defective survival is responsible for the absence of 𝛾𝛿 T cells in the different strains is also supported by the rescue of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains by kinase dead RIPK1<sup>D138N</sup>. Furthermore, the presence of small numbers of residual populations in lymph nodes and spleen of IKKΔT<sup>CD2</sup> and Casp8ΔT<sup>CD2</sup> strains demonstrates that migration patterns were normal. Were cells unable to recirculate, they might be expected to fail to leave the thymus, or to accumulate in the spleen. We see no evidence of either of these scenarios.

      (3) TCRd chain usage, especially among type 3 gd T cells, should be assessed.

      Unfortunately, we did not assess chain usage, choosing rather to rely on the phenotypic identity of specific subsets, which, as we show in figure 5C, was extremely robust. IL-17 was only secreted by CD27– CD44<sup>hi</sup> 𝛾𝛿 T cells, while IFN-gamma was only secreted by CD27+ CD44<sup>hi</sup> 𝛾𝛿 T cells. We argue that the production of these key effector cytokines is the most direct test of a subset's functional identity and that the phenotypic designation is robust.

      (4) The functional consequences of IKK signaling on gd T cells were largely unaddressed. Cytokine analyses were performed only in the RIPK1D138N Casp8∆TCD2 model, leaving open the question of how canonical NF-κB-dependent signaling impacts the long-term functionality of gd T cells.

      Yes, we agree this remains an open question around the transcriptional mechanisms by which NFκB signalling promotes cell survival, and one best addressed in future studies. We did not perform cytokine staining more widely, because the cytokine assay relies on short term re-stimulation of T cells with PMA and ionomycin. PMA activates PKC which in turn activates NF-κB signalling to elicit the cytokine response measured in this assay. As such, the results of such assays would be hard to interpret. We agree it would be interesting to investigate the functional consequences of REL deficiency in future studies, although this may need a more nuanced setting where 𝛾𝛿 T cells are not lost as a result of their defective survival.

      (5) The authors suggest that Caspase 8 is required for the development and maintenance of type 3 gd T cells. While the authors discussed the limitations of assessing adult mice in interpreting the data, it seems like a relatively straightforward experiment to perform.

      We did attempt these experiments with collaborators by analysing type 17 𝛾𝛿 T cell development in fetal thymic organ culture (FTOC). However, the GM mice are not so easy to breed and generating the large numbers of embryos required to set up the FTOCs proved too challenging and we were unable to generate these data.

      (6) While analyses of Casp8∆TCD2 RIPK1D138N mice suggest that loss of adaptive and type 1 gamma delta T cells in Casp8∆TCD2 animals is due to necroptosis, the contribution of RIPK3 kinase activity remains unexamined. RIPK3 activity determines whether cells die via necroptosis or apoptosis in RIPK1/Caspase8-dependent signaling, and inclusion of this analysis would strengthen mechanistic insights.

      Given time and resources, it would have been ideal to confirm necroptotic cell death by alternative knockouts, such as RIPK3 or MLKL. However, formation of the necrosome is dependent on kinase active RIPK1, since autophosphorylation of RIPK1 changes its conformation to allow recruitment of RIPK3 and MLKL and formation of the necrosome. Therefore, the rescue of CASPASE8 deficient T cells from cell death by kinase dead RIPK1 is very solid genetic evidence of necroptosis.

      (7) Canonical NF-κB signaling through cRel alone was not evaluated, leaving a gap in the understanding of transcriptional pathways required for gd T cell subsets.

      This was assessed in the p105/RelA knockout strain, which only expresses cREL. What we lacked was an assessment of what RelA/p50 dimers can support in the absence of cREL. We do, however, show the impact of RelA single deficiency and RelA/p50 deficiency.

      In truth, we had many REL deficient strains and it was challenging to make all the combinations we wanted. However, we try to compensate for this by discussing what cREL:cREL dimers and cREL:P50 dimers are capable of doing by analysing 𝛾𝛿 T cell development in p105/RELA DKO and RELA KO mice - these do show that cREL:P50 can compensate in the absence of RELA, but cREL:cREL cannot.

      Reviewer #3 (Public review):

      Weaknesses:

      The paper would benefit greatly from a graphical abstract that could summarize the key findings, making the key findings accessible to the general immunology or biochemistry reader. Ideally, this graphic would distinguish the requirements for NF-κB signals sustaining thymic γδ T cell differentiation from peripheral maintenance, taking into account the various subsets and signaling pathways required. In addition, the authors should consider adding further literature comparing the requirements for NF-κB /necroptosis pathways in regulating other non-conventional T cell populations, such as iNKT, MAIT, or FOXP3+ Treg cells. These data might help position the requirements described here for γδ T cells compared to other subsets, with respect to homeostatic cues and transcriptional states.

      Thank you - we have added such discussions. We are happy to add a graphical abstract if journal constraints permit this.

      Last and least, there are multiple grammatical errors throughout the manuscript, and it would benefit from further editing. Likewise, there are some minor errors in figures (e.g., Figure 3A, add percentage for plot from IKKDT.RIPK1D138N mouse; Figure 7, “Adative").

      Thank you!

    1. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the protein composition and functional role of the C2a projection of the central apparatus (CA) in vertebrate motile cilia. Using three knockout mouse models (Ccdc108, Mycbpap, and Cfap70), the authors demonstrate that these genes - homologs of Chlamydomonas FAP65, FAP147, and FAP70 - are required for normal motile cilia function in ependymal and tracheal multiciliated cells. Specifically, the authors show that:

      (1) Knockout mice for each gene exhibit primary ciliary dyskinesia phenotypes (hydrocephalus and sinusitis), accompanied by abnormal ciliary motion and reduced ciliary beat frequency.

      (2) CCDC108, MYCBPAP, and CFAP70 physically interact and localize to the axonemal central lumen, consistent with the C2a projection.

      (3) Loss of any one of these proteins destabilizes the others and disrupts CA integrity in a tissue-specific manner.

      (4) ARMC3 and MYCBP are C2a-associated proteins.

      Strengths:

      (1) Clarity: the results are presented in a coherent sequence that facilitates understanding of both the rationale and conclusions.

      (2) Genetic rigor: three independent knockout mouse lines that exhibit consistent motile cilia phenotypes provide in vivo support for the proposed role of these proteins.

      (3) Integration of structural and functional analyses: combination of ultrastructural (TEM) and immunofluorescence data with CBF measurements provides convincing correlation between structural defects and impaired ciliary function.

      (4) Mutual dependency model: reciprocal destabilization of CCDC108, MYCBPAP, and CFAP70 supports their interdependence in the C2a assembly.

      (5) Expansion of the vertebrate C2a proteome: the identification of ARMC3 and MYCBP as C2a-associated proteins provides a foundation for future mechanistic studies.

      Weaknesses:

      (1) Mechanistic depth: the data show a convincing correlation between C2a and ciliary function, but the cell type-specificity of CCDC108, MYCBPAP, and CFAP70 knockout effects is underdeveloped. This is an interesting observation that raises mechanistic/structural questions not addressed in the study, such as what is the role of C2a in CP nucleation, maintenance, or mechanical stabilization? Is C2a composition different in different cell types?

      (2) Cell model choice: co-immunoprecipitation was performed using mouse testis lysates. While this is a reasonable source of CA proteins from flagellated cells, the functional analyses in this study focus on ependymal and tracheal multiciliated cells. It would therefore be helpful for the authors to clarify the extent to which these interactions are expected to be conserved across ciliated cell types, and to discuss potential tissue-specific differences in CA assembly.

      (3) Statistical analysis: the manuscript states "Statistical significance was defined as P < 0.5", which is likely a typo and should be P < 0.05. In general, the statistical methods require more clarification. In several figures (e.g., 2B, 2D, 5J, 5K), multiple knockout genotypes are compared with WT, yet unpaired t-tests are reported. When more than two groups are analyzed, multiple pairwise t-tests inflate Type I error unless appropriately corrected; a one-way ANOVA with post hoc comparisons (e.g., Dunnett's test for WT-referenced comparisons) would be more appropriate. Furthermore, the analysis of ciliary movement modes (Figure 2D) involves categorical data, for which a t-test is not statistically appropriate. These comparisons could instead be evaluated using chi-square or Fisher's exact tests. Addressing these issues is important to ensure accurate statistical inference.

      (4) Methods section: does not sufficiently describe how image-based quantifications were performed. For example, the criteria used to define cilia number, basal body number, and rotational beating are not specified, nor is how CBF measurements were analyzed. The authors should also provide details regarding analysis software and imaging parameters used (and whether they were kept constant across genotypes).
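
      The Type I error inflation raised in point (3) can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it treats the pairwise tests as independent (tests sharing a WT control are not strictly independent), it takes three knockout-versus-WT comparisons as the example, and the function names are hypothetical. It shows a simple Bonferroni threshold and does not implement the Dunnett's test suggested above.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05
    n_comparisons = 3  # assumed: three knockout lines, each compared with WT
    fwer = familywise_error_rate(alpha, n_comparisons)
    print(f"Uncorrected family-wise error rate: {fwer:.4f}")
    print(f"Bonferroni per-test threshold: {bonferroni_threshold(alpha, n_comparisons):.4f}")
```

      Under these assumptions, three uncorrected tests at 0.05 give a family-wise error rate of roughly 0.14 rather than 0.05, which is the inflation the point refers to.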

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The central pair apparatus of motile cilia consists of two singlet microtubules, termed C1 and C2, each of which is associated with a set of projections, referred to as the C1 and C2 projections. Each projection comprises multiple distinct structural domains, designated a, b, c, and so on. Biochemical studies combined with genetic analyses in Chlamydomonas identified three proteins as the major components of the C2a projection, and subsequent cryo-EM studies confirmed these findings.

      In this paper, the authors aim to study the homologues of these three proteins (CCDC108/CFAP65, CFAP70, and MYCBPAP/CFAP147) using knockout mouse models. Biochemical and cell biological analyses demonstrate that, as in Chlamydomonas, these proteins are components of the C2 projection and form a complex that depends on the presence of each other. In addition, the authors use affinity purification to identify two previously uncharacterized proteins and show that they are central pair apparatus proteins that associate with the aforementioned complex. Knockout mice lacking any of the three core proteins exhibit phenotypes consistent with primary ciliary dyskinesia (PCD).

      Overall, the manuscript is clearly written, and the data are convincing and support the authors' conclusions. However, given the previous findings in Chlamydomonas, this work provides limited conceptual advances to the field. Nonetheless, it represents a useful and well-documented resource for understanding the conserved organization of the central pair apparatus in motile cilia. It will be of interest to cell and developmental biologists, biochemists, and clinicians studying and treating human ciliopathies.

      We thank the reviewer for their positive comments on our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the protein composition and functional role of the C2a projection of the central apparatus (CA) in vertebrate motile cilia. Using three knockout mouse models (Ccdc108, Mycbpap, and Cfap70), the authors demonstrate that these genes - homologs of Chlamydomonas FAP65, FAP147, and FAP70 - are required for normal motile cilia function in ependymal and tracheal multiciliated cells. Specifically, the authors show that:

      (1) Knockout mice for each gene exhibit primary ciliary dyskinesia phenotypes (hydrocephalus and sinusitis), accompanied by abnormal ciliary motion and reduced ciliary beat frequency. 

      (2) CCDC108, MYCBPAP, and CFAP70 physically interact and localize to the axonemal central lumen, consistent with the C2a projection. 

      (3) Loss of any one of these proteins destabilizes the others and disrupts CA integrity in a tissue-specific manner. 

      (4) ARMC3 and MYCBP are C2a-associated proteins. 

      Strengths:

      (1) Clarity: the results are presented in a coherent sequence that facilitates understanding of both the rationale and conclusions. 

      (2) Genetic rigor: three independent knockout mouse lines that exhibit consistent motile cilia phenotypes provide in vivo support for the proposed role of these proteins. 

      (3) Integration of structural and functional analyses: combination of ultrastructural (TEM) and immunofluorescence data with CBF measurements provides convincing correlation between structural defects and impaired ciliary function. 

      (4) Mutual dependency model: reciprocal destabilization of CCDC108, MYCBPAP, and CFAP70 supports their interdependence in the C2a assembly. 

      (5) Expansion of the vertebrate C2a proteome: the identification of ARMC3 and MYCBP as C2a-associated proteins provides a foundation for future mechanistic studies. 

      We appreciate our reviewer's positive comments.

      Weaknesses:

      (1) Mechanistic depth: the data show a convincing correlation between C2a and ciliary function, but the cell type-specificity of CCDC108, MYCBPAP, and CFAP70 knockout effects is underdeveloped. This is an interesting observation that raises mechanistic/structural questions not addressed in the study, such as what is the role of C2a in CP nucleation, maintenance, or mechanical stabilization? Is C2a composition different in different cell types? 

      We agree with our reviewer and value their insightful comments. Indeed, CP-MT defects, including the loss of one or both CP-MTs, were only observed in a subset of mouse ependymal cells (mEPCs) at day 10 post-serum starvation, and were rare in tracheal multiciliated cells, although the C2a projections were severely damaged in these tracheal cells. Based on these observations, we hypothesize that the loss of CP-MTs is probably a secondary effect caused by mechanical stress during ciliary movement. To investigate the role of C2a in CP-MT nucleation, maintenance, or mechanical stabilization, we plan to examine the axoneme structures of mEPCs at day 5 post-serum starvation using TEM. By comparing axoneme defects in these cells at days 5 and 10, we hope to gain insights into this question. Based on our findings and previous findings in Chlamydomonas, we speculate that the core components (CCDC108/FAP65, MYCBPAP/FAP147, and CFAP70/FAP70) of the C2a projection are highly conserved across species, but the peripheral associated C2a proteins may vary among different cell types. Therefore, we will perform co-immunoprecipitation using mEPCs and mouse tracheal epithelial cells to investigate potential cell-type-specific differences and expand the related discussion.

      (2) Cell model choice: co-immunoprecipitation was performed using mouse testis lysates. While this is a reasonable source of CA proteins from flagellated cells, the functional analyses in this study focus on ependymal and tracheal multiciliated cells. It would therefore be helpful for the authors to clarify the extent to which these interactions are expected to be conserved across ciliated cell types, and to discuss potential tissue-specific differences in CA assembly.

      We appreciate our reviewer's insightful comments. We will follow their suggestion and perform co-immunoprecipitation using mEPCs and mouse tracheal epithelial cells to investigate potential cell-type-specific differences and expand the related discussion.

      (3) Statistical analysis: the manuscript states "Statistical significance was defined as P < 0.5", which is likely a typo and should be P < 0.05. In general, the statistical methods require more clarification. In several figures (e.g., 2B, 2D, 5J, 5K), multiple knockout genotypes are compared with WT, yet unpaired t-tests are reported. When more than two groups are analyzed, multiple pairwise t-tests inflate Type I error unless appropriately corrected; a one-way ANOVA with post hoc comparisons (e.g., Dunnett's test for WT-referenced comparisons) would be more appropriate. Furthermore, the analysis of ciliary movement modes (Figure 2D) involves categorical data, for which a t-test is not statistically appropriate. These comparisons could instead be evaluated using chi-square or Fisher's exact tests. Addressing these issues is important to ensure accurate statistical inference.

      We thank our reviewer for pointing out these errors. We will double-check our statistical results and perform new analyses following their suggestion.

      (4) Methods section: does not sufficiently describe how image-based quantifications were performed. For example, the criteria used to define cilia number, basal body number, and rotational beating are not specified, nor is how CBF measurements were analyzed. The authors should also provide details regarding analysis software and imaging parameters used (and whether they were kept constant across genotypes). 

      We apologize for overlooking these method details. We will expand the relevant method section to include this information.
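
      For the categorical movement-mode data mentioned in point (3), a Fisher's exact test on a 2x2 table could look like the sketch below. It is a standard-library illustration rather than the authors' analysis: the function name is hypothetical, the counts are invented placeholders and not data from the manuscript, and the two-sided p-value uses the common convention of summing all tables no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1.0 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts of cells: rows = genotype (WT, knockout),
    # columns = movement mode (planar, rotational). Placeholder values only.
    p_value = fisher_exact_two_sided(18, 2, 7, 13)
    print(f"Two-sided Fisher's exact p = {p_value:.4g}")
```

      With more than two movement modes or genotypes the table is no longer 2x2, and a chi-square test or an exact test for larger tables would be needed instead.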

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this paper, the authors combine advanced quantitative mass spectrometry-based proteomics with the PRIMMUS cell sorting technique to profile cell cycle proteins across the G1, S, and G2/M phases in CD8 T cells, embryonic stem cells (mESCs), and NIH3T3 fibroblasts. Their well-designed experimental approach, which avoids artifacts from synchronization, yields an impressive and detailed dataset on cell cycle dynamics. The experimental approach to interrogate three different cell lines for cell cycle protein dynamics, without using synchronization, is technically excellent. The data extracted from that initial experiment, which is shown in figures 2, 3, 4 and 5, is clearly described and well presented.

      My concern is with the conceptual advance of the study. The authors set out to determine how cell cycle control mechanisms could give rise to rapid proliferation in CD8 T cells, but it is not clear they have addressed that question. The data do not clearly support a model in which mESCs or CD8 T cells exhibit dampened cell cycle protein periodicity compared to 3T3 fibroblasts, nor for a change in APC/C being a key driver. Notably, the depletion of Emi1 appears to most significantly impact later cell cycle stages, as has been reported previously by others, and thus it is unclear that in this system it is promoting S-phase entry. And, while the impact on re-replication is clear, this is a very well reported phenotype. Overall, I am very impressed by the technical aspects of the study, but thought the authors could put more effort toward developing more meaningful interpretations of the data, which felt under-developed.

      Comments and recommendations:

      • There does not appear to be any statistical comparative analysis accompanying the proteomics data. While the differences highlighted in the text are generally clear from the figures, including statistical comparisons would strengthen the analysis and provide greater confidence in the reported findings.
      • It is unclear what the box and whiskers represent in these plots. By convention, box plots typically display the interquartile range, with the central line indicating the median and the whiskers representing the minimum and maximum values or a defined range. If the authors are using a different convention, it should be clearly stated in the figure legends. Alternatively, I recommend using a different data representation to avoid potential confusion.
      • Line 133 and figure 2B and D. It's unclear to me what 'common cell cycle regulated protein' are. The methods: "Fold changes were calculated and ANOVAs performed to identify significantly changing proteins in each cell line using the following filtering criteria: p < 0.05 and fold change > 2." does not fully clarify the analysis. For example, it's not clear what the fold changes are being compared to-is it a specific cell cycle phase, a baseline, or another reference point? Additionally, is the reported p-value corrected for multiple hypothesis testing? Overall, the methodology could benefit from further clarification.
      • In Figure 2B, it is unclear how the data are normalized. Is it appropriate to compare values across rows, columns, or both? Additionally, the ordering of the CD8 T cell G2/M samples appears to be ascending from top to bottom-what is the rationale behind this ordering? Clarifying the normalization method and how to interpret comparisons within the figure would help improve its readability. As it is now, I am not clear on how this data supports the statement where it is called in the text "There was a core set of 121 proteins that were consistently cell cycle regulated across all three cell types" (lines 130-132). It would be interesting to know how these data compare to the authors' prior proteomics data on cell cycle proteins and to that of other authors.
      • In Figure 2C, are these the most frequent GO terms among the "cell cycle-regulated proteins"? It is unclear whether this represents a curated list of commonly known cell cycle-related terms or if these are the top GO terms identified through enrichment analysis. Clarifying this distinction would help readers better understand how these terms were selected and what the figure is intended to convey.
      • The interpretation of differences in some APC/C components is not clear to me, particularly given the relationship to APC/C substrate expression. For example, ANAPC10 is down in CD8 cells, but the APC/C appears highly active given the high differential between substrate abundance between G1 and S/G2/M cells.
      • I have some concerns regarding the Emi1 knockout experiments. First, the knockout does not appear to be particularly efficient. In addition, the results do not necessarily support the proposed conclusion. If Emi1 was a key driver of G1/S progression, I would think you would observe more 2N cells, not fewer. The relationship to DNA damage, made in Fig 6f is unclear.
      • In the flow cytometry data in Figure 6C, the Emi1 knockout cells are EdU-positive and have >4N DNA content. This is consistent with a re-replication phenotype and for a vital role of Emi1 in preventing re-activation of the APC/C in late S or G2 phase. It is difficult to square this with the assertion that it is promoting G1/S progression. Notably, this is a well reported Emi1 phenotype.
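
      The question above about multiple hypothesis testing for the "p < 0.05 and fold change > 2" filter can be made concrete with a short sketch. This is an illustration only: the manuscript does not state which correction, if any, was used, so the Benjamini-Hochberg procedure here is an assumption, the function names are hypothetical, and the p-values and fold changes are placeholder numbers rather than data from the study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def select_regulated(
    p_values: Sequence[float],
    fold_changes: Sequence[float],
    alpha: float = 0.05,
    min_fold_change: float = 2.0,
) -> List[int]:
    """Indices passing both the adjusted p-value and fold-change cut-offs."""
    adjusted = benjamini_hochberg(p_values)
    return [
        i
        for i, (q, fc) in enumerate(zip(adjusted, fold_changes))
        if q < alpha and fc > min_fold_change
    ]


if __name__ == "__main__":
    # Placeholder values for four hypothetical proteins.
    raw_p = [0.01, 0.04, 0.03, 0.5]
    fold = [4.0, 3.0, 1.5, 8.0]
    print("Adjusted p-values:", benjamini_hochberg(raw_p))
    print("Indices passing both filters:", select_regulated(raw_p, fold))
```

      In this toy input, three of the four raw p-values fall below 0.05 but only one survives adjustment, which shows why stating whether the reported p-values are corrected changes how a list such as the 121 shared proteins should be read.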

      Minor comments:

      • Some figures lack error bars (e.g., Figures 2C, 2D, and 5A). I also recommend including individual data points on all graphs, as this is now required by many journals. Since most of the graphs likely represent experimental triplicates, adding these elements should be straightforward and will allow readers to see actual spread of the data.
      • The terminology used to refer to CD8 T cells is inconsistent across the manuscript-for example, "CD8T" in Figure 2 and "CD8" in Figure 3. Standardizing this terminology throughout the figures and text would improve clarity.
      • The authors frequently refer to "increases in pRB-," which is somewhat confusing. I recommend rephrasing this as "decrease in pRB" to make the interpretation clearer and easier to follow.

      Significance

      Significance and Advance

      Strength: Excellent deep, quantitative proteomic dataset of cell cycle proteome dynamics across three independent cell lines. Done without chemical or other synchronization. This incrementally improves on previous proteomics datasets.

      Weaknesses: Limited conceptual advance in our understanding of cell cycle proteomes, or insight into how different cell types proliferate at different rates.

      Audience: interested audiences include those studying cell cycle control and proteomics.

      My expertise: cell cycle control and protein dynamics.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      This study uses proteomics as an approach to understand regulatory mechanisms contributing to a short cell division cycle in CD8 T cells. While the cell cycle machinery has been studied in mouse embryonic stem cells (mESCs) and these discoveries have provided a framework for understanding atypically short cell cycles, whether those mechanisms are conserved in other cell types that undergo a short cycle has not been clear. Analysis of the cellular proteome was performed at three different cell-cycle stages (G1, S, G2/M) and across three cell lines (CD8 T cells and mESCs and fibroblast 3T3 cells as comparators for short and normal cycles, respectively). An important innovation here is the application of a PRIMMUS protocol, which uses markers of cell cycle stage and sorting rather than treatments, such as inhibitors, that can have unwanted perturbations. The authors find both similarities and differences between CD8 T cells and mESCs. Of particular interest, CycE levels are constitutively high in both cell types with a short cycle compared to fibroblast cells, but APC/C substrates are uniquely lower during G1 in CD8 T cells. Using CRISPR knock-out, the APC/C inhibitor Emi1 is then implicated as a key regulator of G1/S control in CD8 T cells as well as phenotypic fate.

      Major Comments:

      Overall, the conclusions are fairly made from the data. In particular, the proteomics data and analysis are of high technical quality. The data and methods are clearly presented in a manner that could be reproduced, and experiments are, with one exception noted here, performed with appropriate replicates and statistical analysis. I have only two major comments.

      1. From the description of the CRISPR methods and results, the origin of the Emi1 KO cells used in all the experiments is unclear. Only one Emi1 KO sample or experiment is shown along with control in each panel. A number of questions should be answered in order to clarify. Was this sample originated from the total cells grown out of the gRNA treatment (and on which the TIDE analysis was performed)? If so, which exon was the gRNA targeting in the cells used in subsequent experiments? Were single clones ever grown? Ideally, experiments should be performed with multiple single clonal lines, and at least one from each exon used, if possible. Alternatively, a population generated from each of the two guides should be used in all the experiments as a proper biological replicate.

      It may be that homozygous deletion of Emi1 is not possible because of the effects on proliferation. If so, the authors should comment on this along with why the protein level in Sup. Fig. 2b is still so high in the KO line. Considering the inefficiency of the overall KO, it is especially important to understand Emi1 expression throughout the polyclonal population to interpret the results. Are there some cells expressing normal levels and some cells expressing no Emi1? If monoclonal, are the observed effects from haploinsufficiency? More description of the method and caveats to the interpretation of results are needed here, and results would be made more robust if more than one clone or gRNA treatment were used. It would be expected this may take 2-3 months with reasonable resources to perform this additional analysis, if not done already.

      2. OPTIONAL. One shortcoming of the manuscript is the lack of insights into why knock out of Emi1 leads to G1 arrest. Follow up experiments on the APC/C substrates in Fig. S1b or analysis of proteins that are upregulated or downregulated by Emi KO would increase the significance of the manuscript. These experiments may be time consuming and could be considered beyond the scope of this study, but they would increase the impact of the study.

      Minor comments:

      1. Some brief aspects of the methods could be added to the description of results to help improve clarity in a few places. For example, the threshold for including the 121 proteins in Fig. 2b and what antibodies were used for staining in the PRIMMUS approach would be important enough that it should be stated in results with no need to look up detailed method.
      2. It should be clarified that in most cases, use of the term "abundance" in comparing protein levels is actually a relative abundance for a specific cell type. For example, in the following sentence, relative and not absolute abundances are being compared: "In mESC, there was higher abundance in G1 phase compared to CD8 T cells and NIH3T3 (Fig. 2d)." Clarifying upon first use of the term abundance would be sufficient.
      3. In some of the protein abundance panels there does not appear to be any mark in the graph, which is consistent with there being no observation of protein. This absence of any observation should be made explicit in the figure captions.
      4. It should be clarified with its first use (line 264) that "pRb" means phosphorylated Rb.

      Significance

      The key conclusions are significant in that they advance our understanding of how cell cycle regulatory mechanisms can be adapted in cell types with short cell cycles. Results are also relevant in particular for the biology of T cells, and the finding that Emi1 controls the cell cycle and cell fate in CD8 T cells is noteworthy. These represent advances in our mechanistic understanding of cell division. In addition to the significance of the conclusions, the greatest strength of the manuscript is the technical quality of the proteomics data and the fair conclusions drawn from them. The results will be of interest to scientists studying basic research in cell, immune, and cancer biology.

      The greatest limitation of the study is the lack of insights into how Emi1 is driving S phase and how its loss leads to accumulation of cells in G1. This result is quite interesting, because it is opposite to what may be expected considering the known function of Emi1 as an APC/C substrate inhibitor and the observation in Fig. 5 that APC/C substrates are similarly low in CD8 T cells and 3T3 cells. This discrepancy and the logic of testing Emi1 KO should at least be more thoroughly discussed, and additional experiments, as described above, would increase the impact of the study. A second limitation is the lack of conclusions from many comparisons of protein levels among the cell types and cell-cycle phases. For example, the significance of similarities and differences in levels of the Cdks (Fig. 4) and Cdh1 (Fig. 5) is left undiscussed.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      In this study by Lewis and colleagues, the authors use highly quantitative proteomics to compare cell cycle patterns in T lymphocytes to mouse embryonic stem cells and differentiated fibroblasts. The work is significant because the cell cycle field has been dominated by only a few paradigms based on model systems such as yeasts, cancer-derived cell lines, and fibroblasts. The different cell cycles in other cell types have not been thoroughly investigated because they are less experimentally tractable. Prior studies have drawn conclusions about embryonic stem cells that turned out later to be primarily from attempts to induce synchronization prior to analysis. Here, the authors apply their previously-described method for detecting proteins according to cell cycle phase by flow-sorting and then mass spectrometry. They find expected cell cycle fluctuations in the three cell lines, but also a few unexpected differences. Primarily, cyclin E protein appears to be expressed at higher levels in the fast-proliferating mESCs and CD8 cells compared to NIH3T3 and to be less cell cycle-regulated. The mechanism and functional relevance for this difference are not considered.

      The authors chose to focus several experiments on the APC/C ubiquitin ligase inhibitor, Emi1, which they find appropriately cell cycle-regulated in CD8 and mESCs but expressed at higher levels in CD8 cells. They show that CD8 differentiation is associated with cell cycle changes and that perturbing the cell cycle in CD8 cells affects their differentiation (which is not surprising). Altogether this study provides an excellent resource for the field from the detailed high-quality proteomics data sets (though see point #8). On the other hand, the experiments with EMI1-KO and differentiating cells seem largely unrelated to the cell cycle analysis itself; these results may be mostly indirect consequences of cell cycle perturbation rather than specific to EMI1 function.

      This is a very thorough, rigorous, and valuable proteomics characterization of cell cycle differences, but the choice of follow-up functional experiments isn't logical in my opinion.


      Major points:

      1. It is not clear how the quantification accounts for differences in cell size or nuclear/cytoplasmic fractions among the different cell types. For example, ESCs are typically small with relatively less cytoplasm. Since a great many cell cycle proteins are nuclear, normalizing to total protein analyzed and not cell number could make these proteins appear more abundant. In other words, does 1 ug of protein from mESCs represent the same number of cells or nuclei as 1 ug of protein from NIH3T3?
      2. The motivation for the project was given as wanting to understand if fast-cycling CD8 T cells resemble fast-cycling mESCs, which the authors show they partially do. The molecular phenotype in common is cyclin E abundance, but we are left without an understanding of how that occurs. Are there differences in expression of the E3 ligase for cyclin E?
      3. We also don't learn if the constitutive cyclin E is why cells cycle quickly because no experiments to change cyclin E are provided.
      4. In an unexpected turn for the project, the authors generated inducibly Emi1-null CD8 T cells and analyzed them for replication, re-replication, and phosphorylated Rb. This section is fine, but the rationale isn't obvious, particularly if constitutive cyclin E (not an APC/C substrate) is the proposed driver of fast proliferation. Emi1 isn't the general reason for fast proliferation if it is only high during G1 for CD8 and not mESCs. Emi1 loss has also been known to promote re-replication for quite a few years, a finding the authors cite. The replication-dependent DNA damage and cell cycle arrest are both well-known outcomes from re-replication. So, although the data are of good quality, I'm not sure that showing Emi1 has the same role in CD8 cells as what was reported in other cells adds much to the field. The authors could choose to save Figure 6 for a more APC/C-focused study and use EMI1 as a general cell proliferation marker in Figure 7.
      5. What is the cell cycle distribution of the IL-15 and IL-2 differentiated cells? If they are different, then some of the protein abundance changes to regulated proteins such as p21 might be a symptom of cell cycle differences rather than differentiation. It seems IL-15 is more cell cycle promoting from Figure S3, but that isn't clear from the assays presented.
      6. The change in the CD62L marker is argued to be unrelated to cell cycle and somehow more directly downstream of Emi1. The connection here is tenuous, in part because there's no comparison to a similar cell cycle perturbation independent of Emi1. How would Emi1 modulate CD62L if CD62L does not show evidence of being an APC/C substrate?
      7. As presented, Emi1 has a unique effect on the immune phenotype. It is also possible that many cell cycle perturbations would similarly affect differentiation and not only Emi1. Can the authors distinguish Emi1-specific effects from more common cell cycle perturbation effects? Alternatively, interpreting the EMI1-KO data more generally could avoid confusion.
      8. The authors should provide legends to the supplementary tables in the text and/or the tables themselves. Several column headers in the tables are not defined.

      Minor points:

      • a. Figure 4g is titled Cdkn2a (which would be p16) but the legend says it's Cdkn1a (p21). The p21 protein was undetected in NIH3T3? One would expect it to be in the G1 cells at least.
      • b. Line 56: cyclin E is not an APC/C substrate as implied by the text. If Skp2 is a bona fide substrate (not sure that's true), then APC/C activity would have opposite effects on Cyclin E vs Cyclin A.
      • c. What is the red arrow in Figure 6C?
      • d. Figure 7b is protein copy number whereas all other graphs are for ppm. The legend on line 686 has "cite immPRES" but this citation didn't make it into the submitted manuscript.

      Significance

      The work is significant because the cell cycle field has been dominated by only a few paradigms based on model systems such as yeasts, cancer-derived cell lines, and fibroblasts. The different cell cycles in other cell types have not been thoroughly investigated because they are less experimentally tractable. Prior studies have drawn conclusions about embryonic stem cells that turned out later to be primarily from attempts to induce synchronization prior to analysis. This is a very thorough, rigorous, and valuable proteomics characterization of cell cycle differences, but the choice of follow-up functional experiments isn't logical in my opinion.

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question in cardiac biology: whether distinct cardiomyocyte (CM) subpopulations play specialized roles during heart development and regeneration. Using single-cell RNA sequencing and newly generated genetic tools, the authors identify phlda2 as a specific marker of primordial cardiomyocytes in the adult zebrafish heart. They further show that these primordial CMs are essential for myocardial morphogenesis and coronary vascularization but are dispensable for myocardial regeneration or revascularization after injury. These findings indicate that heart regeneration doesn't simply recapitulate developmental processes.

      Strengths:

      A major strength of the study is the generation of a phlda2 BAC reporter, which provides a specific and reliable marker for primordial cardiomyocytes. The lack of genetic tools has previously limited functional analysis of this CM population. By using phlda2 regulatory elements to generate reporter and NTR-based ablation lines, the authors can visualize and selectively manipulate primordial CMs in vivo. This enables a direct functional interrogation rather than relying on lineage tracing or correlative evidence. Through genetic ablation, the authors convincingly demonstrate that primordial CMs are essential for myocardial morphogenesis and coronary vascular organization during development but are not necessary for heart regeneration.

      Weaknesses:

      (1) The manuscript would benefit from clarifying whether primordial cardiomyocyte ablation affects epicardial cell behaviors during heart development. Given the well-established role of the epicardium in supporting coronary vessel growth, it is possible that the vascular phenotypes observed after primordial CM ablation may result, at least in part, from altered epicardial cells.

      (2) Because primordial cardiomyocytes form a dense, single-cell-thick layer covering the ventricular surface, it would be informative to determine whether their loss alters the spatial distribution or inward migration of coronary endothelial cells or epicardial cells.

      (3) The manuscript carefully examines the relationship between primordial CMs and gata4⁺ cardiomyocytes during regeneration. However, their relationship during heart development should be more fully addressed.

      (4) As loss of cardiomyocytes is known to induce gata4:GFP activation during regeneration, it would be important to determine whether ablation of primordial cardiomyocytes alone triggers gata4:GFP expression in neighboring cardiomyocytes. This analysis would further support the conclusion that primordial cardiomyocytes are not required for regenerative responses.

    2. Author response:

      We thank the reviewer for the thoughtful and constructive evaluation of our work and for recognizing its potential interest to researchers working on cardiac development and regeneration. We are planning to address the specific concerns as noted by the reviewers in the following way:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question in cardiac biology: whether distinct cardiomyocyte (CM) subpopulations play specialized roles during heart development and regeneration. Using single-cell RNA sequencing and newly generated genetic tools, the authors identify phlda2 as a specific marker of primordial cardiomyocytes in the adult zebrafish heart. They further show that these primordial CMs are essential for myocardial morphogenesis and coronary vascularization but are dispensable for myocardial regeneration or revascularization after injury. These findings indicate that heart regeneration doesn't simply recapitulate developmental processes.

      Strengths:

      A major strength of the study is the generation of a phlda2 BAC reporter, which provides a specific and reliable marker for primordial cardiomyocytes. The lack of genetic tools has previously limited functional analysis of this CM population. By using phlda2 regulatory elements to generate reporter and NTR-based ablation lines, the authors can visualize and selectively manipulate primordial CMs in vivo. This enables a direct functional interrogation rather than relying on lineage tracing or correlative evidence. Through genetic ablation, the authors convincingly demonstrate that primordial CMs are essential for myocardial morphogenesis and coronary vascular organization during development but are not necessary for heart regeneration.

      Weaknesses:

      (1) The manuscript would benefit from clarifying whether primordial cardiomyocyte ablation affects epicardial cell behaviors during heart development. Given the well-established role of the epicardium in supporting coronary vessel growth, it is possible that the vascular phenotypes observed after primordial CM ablation may result, at least in part, from altered epicardial cells.

      We thank the reviewer for this thoughtful comment and agree that primordial cardiomyocyte ablation may indirectly affect coronary vessel growth through changes in epicardial cell behavior. Therefore, we will perform additional analyses to examine epicardial cell behaviors, including epicardial coverage and migration following primordial cardiomyocyte ablation using the established epicardial reporter line tcf21:nucEGFP during heart development.

      (2) Because primordial cardiomyocytes form a dense, single-cell-thick layer covering the ventricular surface, it would be informative to determine whether their loss alters the spatial distribution or inward migration of coronary endothelial cells or epicardial cells.

      We thank the reviewer for this important comment. We will analyze the spatial distribution and inward migration of coronary endothelial and epicardial cells after primordial cardiomyocyte ablation using high-resolution imaging and quantitative analysis.

      (3) The manuscript carefully examines the relationship between primordial CMs and gata4⁺ cardiomyocytes during regeneration. However, their relationship during heart development should be more fully addressed.

      We appreciate the suggestion and will carefully investigate the relationship between primordial cardiomyocytes and gata4⁺ cardiomyocytes during heart development.

      (4) As loss of cardiomyocytes is known to induce gata4:GFP activation during regeneration, it would be important to determine whether ablation of primordial cardiomyocytes alone triggers gata4:GFP expression in neighboring cardiomyocytes. This analysis would further support the conclusion that primordial cardiomyocytes are not required for regenerative responses.

      We acknowledge the reviewer’s comments and will test whether primordial cardiomyocyte ablation induces gata4:GFP activation in neighboring cardiomyocytes in the adult heart.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Primordial Cardiomyocytes orchestrate myocardial morphogenesis and vascularization but are dispensable for regeneration", Sun et al. identify a novel marker of primordial cardiomyocytes and use it to visualize and ablate the population during development and regeneration. The role of the primordial layer has not been investigated because the tools to manipulate this population have not existed. The manuscript is straightforward, easy to understand, and addresses an important question that has not been explored.

      While the manuscript provides important insights into the role of primordial CMs, backed by a convincing methodology, the authors should clarify their requirements for heart development and maturation. Specifically, is the primordial layer required for the fish to survive?

      We thank the reviewer for this important question. We will examine the survival of fish following primordial cardiomyocyte ablation during development.

      Do primordial CMs regenerate when ablated during development, and do the defects observed (in trabecular and compact CMs and coronary vessels) resolve after 10 days post-treatment when they were detected?

      We thank the reviewer for this valuable comment. We will perform additional analyses to determine whether primordial cardiomyocytes regenerate after ablation during development and to assess the extent and dynamics of their recovery. We will also evaluate whether the defects in trabecular and compact myocardium and coronary vasculature persist or resolve in adult hearts following primordial cardiomyocyte ablation during development.

      Reviewer #3 (Public review):

      Summary:

      The authors performed single-cell RNA sequencing of adult zebrafish hearts and identified markers for distinct cardiomyocyte subpopulations. One marker, phlda2, marks primordial cardiomyocytes. They generated transgenic reporter lines to characterize phlda2 expression patterns and a phlda2-NTR ablation line to determine the functional requirement of primordial cardiomyocytes during heart regeneration. They found that phlda2+ primordial cardiomyocytes are essential for myocardial morphogenesis and coronary vessel development. Interestingly, when phlda2+ primordial cardiomyocytes are ablated during heart regeneration, gata4+ cortical cardiomyocytes, coronary vessel revascularization, and scar tissue formation are not affected.

      Strengths:

      The authors identified a new primordial cardiomyocyte marker, phlda2. They further demonstrated that primordial cardiomyocytes are important for heart morphogenesis but dispensable for heart regeneration. Their findings reveal a potential difference between heart development and regeneration programs.

      Weakness:

      Despite the interesting findings, the authors did not provide supplemental data for their scRNAseq to demonstrate the data quality and support their conclusions, and some results are not well described.

      We appreciate the reviewer’s comment. We will include supplemental data to demonstrate the quality of our single-cell RNA sequencing. Additionally, we will provide more detailed descriptions of the key results in the main text and figure legends to clearly support our conclusions regarding primordial cardiomyocytes and their roles in heart morphogenesis and regeneration.

    1. Reviewer #1 (Public review):

      In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

      The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

      A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing, which seems quite clever.

      While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

      Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

      Recommendations:

      (1) To provide a comprehensive view of the current field, the authors should include Scale Biosciences (Scale Bio) in their discussion of available commercial platforms.

      (2) A head-to-head comparison with the 10x Genomics Multiome platform would be of significant interest to the single-cell genomics community and would better contextualize the performance of easySHARE-seq.

      (3) Optimizing ATAC Performance: I strongly suggest exploring methods to improve ATAC sensitivity. As the authors note, the improvement in RNA recovery may result from fewer processing steps and stronger fixation. It would be valuable to test if decreasing fixation back to 2% (as in the original SHARE-seq) recovers ATAC data quality, and to determine if the fixation level or the number of steps is the key variable in preserving transcripts.

      (4) The authors allude to the possibility of scaling this assay using a barcoded poly(T). Explicit inclusion or demonstration of this capability would dramatically increase interest in this protocol. Perhaps ATAC could be scaled using a barcoded Tn5?

      (5) The number of HSCs and B cells expressing Albumin is problematic and suggests significant ambient RNA issues that need to be addressed or computationally corrected.

    2. Author response:

      Public reviews:

      Reviewer #1 (Public review):

      In the manuscript entitled "Flexible and high-throughput simultaneous profiling of gene expression and chromatin accessibility in single cells," Soltys and colleagues present easySHARE-seq, a method described as an improvement upon SHARE-seq for the simultaneous measurement of RNA transcripts and chromatin accessibility.

      The authors demonstrate the utility of easySHARE-seq by profiling approximately 20,000 nuclei from the murine liver, successfully annotating cell types and linking cis-regulatory elements to target genes. The authors claim that easySHARE-seq supports longer read lengths potentially enabling better variant discovery or allele-specific signal assessment, though they do not provide direct evidence to support these specific claims.

      A key strength of the protocol is enhanced sequencing efficiency, achieved by shortening the Index 1 read from 99 to 17 nucleotides. This reduction does not come at a significant cost to barcode diversity, retaining approximately 3.5 million combinations. Additionally, the approach allows for the sequencing of a sub-library to assess quality prior to final barcoding and sequencing, which seems quite clever.

      While the increase in RNA transcript recovery is substantial, it appears to come at a cost: there is a notable decrease in ATAC fragments per cell compared to the original SHARE-seq (and other platforms). Likely as a result, the dimensionality reduction (UMAP) shows good resolution for RNA profiles but relatively poor resolution for accessibility profiles. Furthermore, the presented data suggests potential ambient RNA contamination; specifically, the detection of Albumin in HSCs and B cells is likely an artifact of the protocol rather than a biological signal.

      Overall, the study is well-presented and represents a promising advance. However, there are significant shortcomings that should be addressed, particularly regarding "leaky" transcript recovery and reduced ATAC performance.

      Recommendations:

      (1) To provide a comprehensive view of the current field, the authors should include Scale Biosciences (Scale Bio) in their discussion of available commercial platforms.

      (2) A head-to-head comparison with the 10x Genomics Multiome platform would be of significant interest to the single-cell genomics community and would better contextualize the performance of easySHARE-seq.

      (3) Optimizing ATAC Performance: I strongly suggest exploring methods to improve ATAC sensitivity. As the authors note, the improvement in RNA recovery may result from fewer processing steps and stronger fixation. It would be valuable to test if decreasing fixation back to 2% (as in the original SHARE-seq) recovers ATAC data quality, and to determine if the fixation level or the number of steps is the key variable in preserving transcripts.

      (4) The authors allude to the possibility of scaling this assay using a barcoded poly(T). Explicit inclusion or demonstration of this capability would dramatically increase interest in this protocol. Perhaps ATAC could be scaled using a barcoded Tn5?

      (5) The number of HSCs and B cells expressing Albumin is problematic and suggests significant ambient RNA issues that need to be addressed or computationally corrected.

      We thank reviewer #1 for his comments and critique. We will include a direct comparison of easySHARE-seq with the 10x Multiome platform by adding this comparison to Fig. 1 E&F and more directly point to Table 1 as a comparison of overall assay possibilities. We will also more explicitly state and describe the possibilities and limitations of how to scale this assay up. We also thank the reviewer for raising the possible issue of ambient RNA contamination. We aim to quantify ambient RNA contamination and explore its impact as well as possibilities to correct for it if needed. Unfortunately, external circumstances make it difficult to perform further wetlab experiments in order to optimize ATAC-seq performance. We will thus update our discussion to include possibilities on how to improve ATAC-seq data quality.

      Reviewer #2 (Public review):

      Aims:

      The authors sought to optimize SHARE-seq, a multimodal single-cell method, to improve the simultaneous profiling of gene expression and chromatin accessibility. Their goal was to enhance barcode design for better sequencing efficiency and cost savings, while improving overall data quality. They then applied their optimized method, easySHARE-seq, to study liver sinusoidal endothelial cells (LSECs) to demonstrate its utility in examining gene regulation and spatial zonation.

      Strengths:

      The improved barcode design is an advance, increasing the proportion of sequencing reads dedicated to biological information rather than barcode identification. This modification offers practical benefits in terms of sequencing costs and read length, potentially reducing alignment errors. The method also demonstrates improved RNA detection compared to the original SHARE-seq protocol. The biological applications showcase how simultaneous measurement of both modalities enables analyses that would be practically impossible with single-modality approaches, particularly in examining how chromatin states change along developmental or spatial trajectories.

      Weaknesses:

      There is a notable reduction in chromatin accessibility detection compared to the original SHARE-seq method, likely limiting the broad use of the method. While the authors are transparent about this tradeoff, additional discussion would be helpful regarding how this affects data interpretation. Comparisons showing consistency between easySHARE-seq and SHARE-seq chromatin accessibility patterns at the single-cell level would strengthen confidence in the method.

      We thank reviewer #2 for his comments and great suggestions for further analyses. We will emphasize ATAC-seq data quality issues further in our discussions and more explicitly discuss the resulting implications and shortcomings. We agree with reviewer #2 that this dataset allows exploration of enhancer logic. We aim to incorporate the suggested analyses regarding RNA-ATAC correlations, expand our exploration of enhancer biology and include these results in our revisions. We will also improve clarity of our zonation analysis procedure.

      Overall:

      The authors achieve their aim of creating an optimized protocol with improved barcode design and enhanced RNA detection. The method represents a useful advance for specific experimental contexts where the tradeoffs are appropriate.

    1. Reviewer #1 (Public review):

      Summary:

      In this report, Dr Jie Sun and colleagues employed high-resolution single-cell technologies (transcriptomic + cytometry) to build a temporal map of lung responses after IAV infection in young and old mice. They performed detailed analyses of several innate and adaptive immune compartments and described how age influences each of them. The data are robustly generated, and the analyses provide interesting observations that could be associated with disease severity in aged mice. Mechanistically, the authors provide evidence that IFNa/g signaling after viral clearance could mediate some long-term respiratory outcomes, possibly by acting on MoIMs.

      Strengths:

      (1) Comprehensive temporal profiling of lung responses.

      (2) Combination of scRNA-seq and flow cytometry.

      (3) Mechanistic part assessing the role of IFNa/g signaling.

      Weaknesses:

      (1) Descriptive nature of the study.

      (2) Lack of quantification of lung lesions.

      (3) Lung functional measurements were only assessed in aged mice (with or without treatment).

      (4) No assessment of global and virus-specific humoral responses, which could be related to changes in B cells.

      (5) Recently described "pro-repair" Ly6G+ macrophages after IAV infection (PMID: 39093958) are not considered here, and the gating strategy encompasses them in the neutrophil gate.

      (6) The authors suggest that IMs in the aged lung may serve as a major contributor to the pathogenesis of long-term sequelae observed in aged hosts, but do not assess this possibility experimentally.

      Addressing the weaknesses identified above would substantially strengthen the conclusions of the manuscript.

    2. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors leverage single-cell approaches to delve deeper into the host responses and immune cells involved in immunopathogenesis of influenza virus infection in aged mice. The dynamics of gene expression and immune cell frequencies were also tracked at multiple time-points to examine the acute and chronic changes in young and aged mice after influenza virus infection. Their analyses demonstrated that the immune cell frequencies and gene signatures differed in young and aged mice, especially macrophages, T cells and B cells. Furthermore, interferon pathways were found to be differentially regulated in the young and aged mice, and blocking the interferon pathway with monoclonal antibodies led to improvement in lung respiratory functions and reduced inflammation.

      Strengths:

      A strength of this study is that multiple time points are considered for analyses, allowing assessment of temporal changes in gene expression and immune cell frequencies after virus infection during the acute and chronic phases of the disease. The data presented could also serve as a potential resource for other researchers interested in understanding the host responses to the influenza virus, especially in aged mice. Another interesting finding was that blocking interferon signalling can reduce the chronic severe symptoms caused by the influenza virus in aged mice.

      Weaknesses:

      The manuscript could greatly benefit from more rigorous approaches, particularly in the statistical analyses and data visualisation. Moreover, the scientific rationale and logic for several parts of the manuscript can be improved. Finally, the authors did not adequately dissect whether the contribution of host responses was from virus infection or from bystander effects. Specifically, my major comments are as follows:

      (1) While it is interesting to compare the difference in host responses between aged and young mice, the authors should also more deeply characterise the differences in phenotypic and infection kinetics between aged and young mice, so that the readers can better appreciate the effects of virus infection and host immune tolerance to viral infection. For instance, what are the differences in virus infection kinetics between the aged and young mice? Are the levels of infection different? Are the virus dynamics and kinetics different between aged and young mice? Besides lung tissue damage, are there also tissue damage or inflammatory responses beyond lung tissues that differ between aged and young mice?

      (2) Figure 1B: Could the authors quantify the extent of tissue damage in aged and young adults? It is challenging to interpret the extent of tissue damage, especially across the different time points.

      (3) Figure 1D: The authors claim that the senescence signatures are higher in aged mice, justifying that the pathway analyses are consistent with ageing signatures. However, it is also important to note that the senescence signatures were insignificant in aged mice after day 14. Is this expected?

      (4) Figure 1E: The stacked bar charts are difficult to read. It is unclear if the cell type frequencies or proportions are significantly changed, especially as the authors are showing these changes with averaged values. Moreover, the authors should keep the colours of the bar charts consistent with the UMAP.

      (5) Figure 1F-M: The charts show increased frequencies of innate and adaptive immune cells in aged mice. How about the young mice? Which type of cells are increased to allow these mice to be more tolerant to infection?

      (6) Figure 2D and Figure S2C: Besides showing the dynamics of the different clusters, the authors should also display the statistics for individual mice. If the analyses have to be pooled for the single-cell analysis, the authors should declare the challenges and show the statistical comparisons for the flow cytometry.

      (7) Figure 3E: The authors should not claim differences in somatic hypermutation based on gene expression. This will require BCR sequencing and evidence for clonal expansion to confirm that there are differences in somatic hypermutation. Moreover, the authors did not measure the quality and quantity of antibody responses between aged and young mice. The claims for the antibody responses are thus extrapolated, and the B cell identities cannot be identified without any functional or phenotypic readouts.

      (8) Figure 4H. Why did the authors not perform the experiments for aged mice with a higher virus dose? Also, the spider plots do not display the variability between individual mice, making it challenging to interpret whether the changes were statistically different between the conditions.

      (9) Figure 5A. Is the interferon pathway the only pathway that was significantly enriched in the aged mice? Is it the top pathway? The authors should also show the other pathways that were significantly enriched in aged mice. Did the authors also analyse whether the differences in interferon pathways were caused by infected cells or by bystander cells?

      (10) Figure 5B: Based on the pathway analyses, the peak responses for interferon are at day 9 post-infection. However, the interferon treatment is performed on day 14, where differences were less apparent. Why did the authors choose to do the interferon treatment at day 14 instead?

      (11) Figure 6: How about interferon-mediated cell-cell interactions? The authors should consider using established libraries such as Cell Chat to determine if there are any cell-cell communications that lead to differences in interferon responses and signaling.

      (12) Throughout the whole manuscript, the authors kept emphasising that the aged mice displayed uncoordinated immune responses, yet, based on the pathway analyses and phenotypic characterisation, it appears that only interferon was mainly dysregulated. I would thus like to recommend that the authors adjust the tone of the manuscript to tailor it to the results obtained from their investigations.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary of findings and key conclusions

      This manuscript asks how pharmacologic targeting of the outer mitochondrial membrane protein MIRO1 (RHOT1) with a MIRO1-binding compound (MR3) reshapes immunosuppressive programs in the glioma tumor microenvironment (TME). The core of the paper is a cross-species transcriptomic comparison that combines an in vivo mouse dataset with an ex vivo human perturbation dataset.

      Model systems and approach (as described):

      • Mouse in vivo: GL261-Luc intracranial glioma in C57BL/6J mice; MR3 is administered intracranially at the implantation site (10 µM in 5 µL DMSO) on days 11 and 18, and tumors are harvested on day 22 for single-nucleus RNA-seq (snRNA-seq).
      • Mouse snRNA-seq: NeuN-based nuclei sorting, 10x Genomics v3.1; alignment to mm10; Seurat-based integration and annotation. Tumor-cell calling is supported by CNV inference (SCEVAN/CopyKAT). One MR3-treated sample is excluded after QC, leaving 3 control vs 2 MR3-treated samples (11,940 NeuN− nuclei).
      • Human ex vivo: freshly resected glioma cores from 3 patients are cultured with 10 µM MR3 or DMSO for 24 h, followed by bulk RNA-seq (STAR alignment to hg19; DESeq2 for differential expression).
      • Cross-species integration: the analysis is restricted to 1:1 orthologs and protein-coding genes shared across datasets; inferred cell-cell signaling is explored with CellChat.

      Main findings (as presented):

      • MR3 shifts expression of a subset of glioma-associated genes toward a non-tumor-like direction ("rescued genes") and is associated with large changes in inferred cell-type composition in the mouse snRNA-seq dataset (including a marked drop in the fraction of nuclei annotated as tumor: 44.5% to 4.3%; Fig. 1E).
      • Across TCGA-vs-GTEx (glioma-upregulated genes) and three MR3 response analyses (mouse snRNA-seq, mouse pseudo-bulk, and human bulk RNA-seq), PARP11/Parp11 is reported as the only gene that is consistently upregulated in glioma and consistently downregulated by MR3 (Fig. 2B).
      • Within the mouse myeloid compartment, Parp11 is most enriched in MAC4 and MAC1, while MAC1 shows high Cd274 (Pdl1/PD-L1). MR3 reduces Parp11 in MAC4/MAC1 and reduces Cd274 in MAC1 (Fig. 2H).
      • CellChat analysis suggests that in controls MAC1 is the dominant sender of PD-L1/PD-1 signaling to CD8+ T cells (Fig. 3C), and that this PD-L1/PD-1 interaction is strongly diminished after MR3 (Fig. 3E).
      • The authors propose a paracrine model in which MAC4-derived PGE2 (via Ptges3) sustains Parp11 expression in MAC1 through cAMP/PKA/CREB, promoting PD-L1-mediated T-cell suppression; MR3 disrupts this circuitry (Fig. 4).

      Major comments

      1. Strength of the conclusions

      Two parts of the story felt well supported by the data as shown. First, the cross-species convergence on PARP11/Parp11 is a clear and potentially useful result (Fig. 2B). Second, the myeloid subclustering plus CellChat analysis makes a coherent case that PD-L1/PD-1 signaling in this model is dominated by a specific macrophage subset (MAC1) and changes after MR3 (Fig. 2H, Fig. 3).

      Where I was less convinced is when the manuscript moves from "transcriptomic and modeling evidence" to causal statements such as "MIRO1-mediated axis driving immunosuppression" and "MR3 reduces tumor burden by reactivating immunity." At the moment, several central inferences remain indirect:

      • Causality is inferred primarily from transcriptomic shifts and ligand-receptor inference rather than functional immune readouts.

      -We thank the Reviewer for the constructive evaluation. We have toned down the claims throughout the manuscript with tracking.

      • On-target attribution to MIRO1 hinges on MR3 being a MIRO1 binder; the study does not include a genetic MIRO1 perturbation or a target-engagement/epistasis test in the relevant immune compartments (and the authors acknowledge this limitation in the Discussion).

      -We have examined the on-target activity of MR3 in our other papers. For example, depleting Miro1 with CRISPRi in glioma cells (Miro1 KD cells) phenocopied the effect of MR3. We also expressed Miro1-7A, a drug-resistant mutant of Miro1 predicted to be unable to bind MR3 (1), in Miro1 KD glioma cells, which rendered them insensitive to MR3 treatment. These data demonstrate that in cellular glioma models, Miro1 is the target of MR3 and MR3 exerts its functions by directly binding to Miro1.

      We have also excluded off-target effects of MR3 by examining other mitochondrial GTPases (1, 2), including Miro2.

      We agree that these experiments were not performed specifically in immune compartments; we have acknowledged this in the Discussion and added more explanation in the Introduction, citing our published papers.

      • The very large reduction in "tumor cell proportion" (Fig. 1E) is striking but is still a composition measure of recovered nuclei; it is not, on its own, a direct measurement of tumor size/burden and could be sensitive to differential nuclei recovery or cell loss during processing.

      -We agree that the "tumor cell proportion" in Fig. 1E represents the composition of recovered nuclei and is not, by itself, a direct measurement of tumor size or burden. We have removed "tumor burden" throughout the manuscript to avoid confusion.

      To determine whether the observed reduction might reflect technical bias, we examined the quality control metrics across all samples. Of the six initial samples (three control and three treated), one treated sample (TN1) showed clear quality concerns and was therefore excluded from downstream analysis.

      For the remaining samples, the distributions of detected genes per nucleus and total RNA counts per nucleus were similar between groups. The percentage of mitochondrial reads was consistently low, and only a small fraction of nuclei was removed during filtering, indicating overall comparable nuclei quality. Notably, the treated samples yielded similar or even higher total numbers of recovered nuclei, despite showing a lower tumor cell proportion. Please refer to new Fig. S1A for these results.

      Together, these observations suggest that the decrease in tumor cell proportion is unlikely to be explained simply by differential nuclei recovery, sequencing depth, or filtering effects. That said, we recognize that compositional differences in single-nucleus RNA sequencing data do not provide a direct measurement of tumor burden. We have revised the manuscript to clarify this point and to indicate that independent future approaches would be required for definitive assessment.

      I think the paper can go forward in its current scope, but the strength of the claims should match the level of evidence. If the authors want to keep strong, causal language in the title/abstract ("driving immunosuppression," "reduces tumor burden"), then I consider one or two targeted validation experiments essential (see below). Alternatively, the authors can temper the language and position the mechanistic model more explicitly as a hypothesis generated from the transcriptomic analysis.

      -We thank the Reviewer! We have toned down the claims throughout the manuscript to make the data consistent with the conclusion.

      Statements that should be labeled as preliminary/speculative (unless additional validation is added)

      • MAC4-derived PGE2 as the upstream driver of MAC1 Parp11/PD-L1: plausible and nicely consistent with Ptges3 being MAC4-high in controls and reduced with MR3 (Fig. 4A), but not demonstrated.

      -We have changed the conclusion of this part to:

      Together, these bioinformatic findings suggest that MAC4 may produce PGE₂, which could act on nearby MAC1 cells in a paracrine manner to increase Parp11 expression, although this model needs to be functionally validated.

      • MIRO1 → mtDNA → cGAS/STING → Ptges3 as a mechanistic chain: interesting, but currently framed largely by pathway knowledge plus modest expression changes (Supplementary Fig. S5).

      -We have added: "which requires future functional investigation."

      • "MR3 reactivates anti-tumor immunity to reduce tumor burden": the gene set enrichment and CellChat shifts are consistent with immune activation, but immune-mediated tumor control is not directly tested.

      -We have toned down these claims on tumor burden and now conclude only that MR3 may enhance anti-tumor immune responses.

      Replication and statistics

      Mouse snRNA-seq replication is limited after QC (3 control vs 2 MR3-treated animals). With n=2 treated, it is hard to know whether some of the biggest composition and cluster-level changes are robust to animal-to-animal variability.

      -As also explained to Rev 2, we originally planned 3 mice per group. Despite losing one after QC, sample-level pseudobulk PCA analysis (treating each mouse as one replicate) of the mice shows clear separation of treated from untreated groups (new Fig. S2C), supporting technical reproducibility despite a small n. The two MR3-treated samples clustered together and were clearly separated from controls, indicating that the transcriptional effect of MR3 exceeds inter-animal variability (new Fig. S2C). The reduction in tumor cell proportion was also observed in both treated animals (new Fig. S2F). We have added this description to the Results (Page 5, lines 116-118) and included a new figure showing the tumor cell proportion for each animal (new Fig. S2F).

      We acknowledge this limitation. However, as the Reviewer also pointed out, our paper's significance lies in transcriptomically linking Miro1 to well-known immune suppression factors in the glioma TME and in integrating 3 glioma databases, which will help researchers in the field advance their own research. Thus, our methods and resource should still be valid and useful to the community.

      Relatedly, the snRNA-seq differential expression is performed with Seurat FindMarkers (Wilcoxon rank-sum). Per-cell testing can inflate significance if biological replicate structure is not accounted for (pseudoreplication). I suggest the authors clarify exactly how they handled sample-level replication for the key DE results and, where possible, re-run the main DE comparisons using a sample-aware approach (e.g., pseudo-bulk within cell types/subclusters).

      -We thank the reviewer for raising this important point. In the original analysis, differential expression was performed using Seurat's FindMarkers function which performs per-cell testing. We acknowledge that this approach can overestimate significance if biological replicate structure is not explicitly accounted for.

      To address this, we re-ran the key differential expression analyses using a pseudo-bulk approach: counts were aggregated per cell type/subcluster per sample, and DE testing was performed across samples rather than individual cells. The main results and conclusions remain consistent with the original analysis, while this approach ensures that statistical significance properly reflects biological replication (new Fig. S3D-F).
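
      The aggregation step described above can be sketched as follows. This is a minimal illustration in pure Python, not the authors' pipeline (which is Seurat-based); the function name `pseudobulk`, the record layout, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-nucleus counts into one profile per (sample, cell type).

    `cells` is an iterable of (sample_id, cell_type, {gene: count}) tuples.
    Returns {(sample_id, cell_type): {gene: summed_count}}.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for sample_id, cell_type, counts in cells:
        for gene, count in counts.items():
            profiles[(sample_id, cell_type)][gene] += count
    # Convert nested defaultdicts to plain dicts for a stable return type.
    return {key: dict(genes) for key, genes in profiles.items()}


# Hypothetical toy data: nuclei from two mice, one macrophage subcluster.
toy_cells = [
    ("ctrl_1", "MAC1", {"Parp11": 3, "Cd274": 1}),
    ("ctrl_1", "MAC1", {"Parp11": 2}),
    ("mr3_1", "MAC1", {"Parp11": 1, "Cd274": 0}),
]
bulk = pseudobulk(toy_cells)
```

      Each mouse then contributes exactly one profile per subcluster, so a downstream test sees as many observations as there are animals rather than as many as there are nuclei.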

      For the human bulk RNA-seq, the methods indicate 3 patient tissues split across MR3 vs DMSO for 24 h. In DESeq2, a paired design (including patient as a blocking factor) would be important to avoid patient-to-patient variability dominating the treatment signal; the manuscript should confirm whether the design formula accounted for this.

      -In the revised manuscript, we re-ran the DESeq2 analysis using a paired design with patient as a blocking factor and compared DMSO and MR3 within each patient (P1-P3). The results are consistent with our previous analysis. PARP11 remains significantly downregulated (raw p-value Finally, several places in the Methods define significance using p-value cutoffs (e.g., GEPIA3 TCGA/GTEx analysis uses p 1; human DE uses p = 1). Because multiple testing is substantial in all of these analyses, I recommend reporting FDR-adjusted values consistently (and being explicit about whether figures/tables show raw or adjusted p-values).

      -We have now used FDR-adjusted values for the TCGA/GTEx analysis and have updated Fig. 1C (top left), Results, and Methods accordingly. PARP11 remains significant after FDR correction.
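
      For readers unfamiliar with FDR adjustment, the sketch below implements the Benjamini-Hochberg procedure in pure Python. It assumes the adjustment used is Benjamini-Hochberg, which the response does not state; the function name and the example p-values are illustrative only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative raw p-values for four hypothetical genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

      Reporting `adj` rather than `raw` in figures and tables is what the reviewer's request amounts to; a gene is then called significant when its adjusted value falls below the chosen FDR threshold.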

      For the human bulk RNA-seq, very few genes pass an adjusted p 2FC| > 1 across all four differential expression analyses and updated the corresponding description in Methods.

      Do the data support the macrophage-to-CD8 suppression claim?

      The CellChat PD-L1/PD-1 network figures are suggestive (Fig. 3C/E), but ligand-receptor inference is not the same as demonstrating functional T-cell inhibition. At minimum, I would like to see one orthogonal readout (flow or immunostaining) showing that PD-L1 protein on myeloid cells and PD-1 on CD8 T cells change in the expected directions after MR3, and that CD8 T cells show an activation/effector signature at the protein level.

      -We agree this would be clearly the next step in functional studies, but the current manuscript is focused on transcriptomic analysis and method building, so we have toned down any claims at the functional level.

      In addition, we have observed that T cells after MR3 treatment show upregulation of cytotoxicity- and IFN-response-related genes, consistent with enhanced effector function at the transcriptional level. We have added new Fig. S6A and an explanation in the Results.

      PARP11: mediator vs marker

      The cross-species PARP11 result is the most convincing and potentially generalizable finding in the manuscript (Fig. 2B). However, in the specific context of this study, PARP11 is still best supported as a conserved MR3-responsive candidate rather than a demonstrated causal driver of PD-L1-mediated suppression. If the authors want to argue PARP11 is an effector of the pathway (rather than a marker), they should either soften the language or add a minimal functional linkage experiment within the existing scope (see "Optional" experiments below).

      -We have softened the overall language throughout the manuscript to emphasize the correlation and PARP11 as a marker and to reflect the bioinformatic nature of the study. As this paper's main goal is method development and resource building, with already 11 figures, we think functional experiments could be done in another paper.

      Reproducibility and clarity of methods

      I appreciate that the authors provide a code/data portal (MiroScape) and a GitHub link. To make the study as reproducible as possible, I recommend:

      • Deposit raw sequencing reads for both mouse and human datasets (GEO/SRA) and include accession numbers in the manuscript.

      -We have just deposited all raw data. Accession numbers will be provided once it is public.

      • Provide a short, consolidated "computational reproducibility" note with software versions and key parameters (Seurat, CellChat, STAR, DESeq2, etc.).

      -Added.

      • Clarify pseudo-bulk construction (what is aggregated, at what level, and how many biological replicates contribute to each pseudo-bulk comparison).

      -Added.

      • Add a brief summary of MR3 provenance/validation and what "MIRO1-binding" means operationally in the context of these experiments (especially for readers outside the MIRO1 field).

      -We have added this in the Introduction.

      Experiments requested (kept within the existing claims) I am intentionally not suggesting new lines of experimentation. The experiments below are aimed only at supporting the paper's current central claims. I separate them into items I consider essential vs optional, depending on how strongly the authors want to phrase mechanistic conclusions.

      -We thank the Reviewer. We have toned down the claims to reflect the bioinformatic nature of the paper. We will perform suggested experiments below in another paper.

      Essential if the title/abstract continue to use strong causal language

      • Protein-level validation of the PD-L1/PD-1 axis and CD8 activation in the GL261 model. A focused flow cytometry panel (myeloid PD-L1; CD8 PD-1 plus one or two effector markers such as GZMB/IFNG/Ki67) or multiplex IF/IHC on tumor sections would substantially strengthen the central MAC1 → CD8 claim.
      • An orthogonal measure of tumor burden in the same treatment paradigm. The manuscript currently treats the drop in the fraction of nuclei annotated as tumor (Fig. 1E) as a reduction in tumor burden; I recommend including IVIS longitudinal data and/or histologic tumor area/volume at harvest to support this statement.
      • If feasible, modestly increase in vivo biological replication (the snRNA-seq analysis currently has n=2 treated after QC). Even adding one additional treated animal that passes QC would help.

      Feasibility (rough guidance only; core pricing varies widely by institution): a repeat GL261 cohort to harvest tumors for flow and/or histology typically takes ~3-6 weeks end-to-end. A small flow panel plus core time is often on the order of a few thousand USD (antibodies and cytometry), while basic histology/IF quantification might be in the hundreds to low-thousands. If the authors already have stored tissue from the existing cohort, some of this could be faster/cheaper.

      Optional (only if the authors want the MAC4 → PGE2 → Parp11 mechanism to be more than a model)

      • Measure PGE2 (ELISA or targeted lipidomics) in tumor lysates/conditioned media from control vs MR3-treated samples, or provide a closer proxy for PGE2 pathway engagement in the relevant clusters.

      Optional (only if the authors want to argue PARP11 is an effector)

      • A minimal functional linkage experiment (in vitro) testing whether PARP11 perturbation phenocopies the relevant aspect of MR3 in macrophages (e.g., PD-L1 levels and/or the ability to suppress CD8 activation in a co-culture). This could be done with a PARP11 inhibitor or knockdown. I do not think in vivo genetics are required for this manuscript, but some functional tie would prevent overinterpretation.

      Minor comments

      A. Analysis/experimental clarifications that seem straightforward

      • Human DESeq2: please clarify whether the DESeq2 design was paired by patient (i.e., patient as a blocking factor).

      -See above. We re-ran the human differential expression analysis using a paired design with patient as a blocking factor and explained in Methods.
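
      To illustrate why pairing by patient matters, the sketch below computes within-patient log2 fold changes for a single gene. It is a simplified stand-in, not DESeq2's model: in DESeq2 a blocking factor is typically expressed through a design formula such as `~ patient + treatment` (the variable names here are assumptions), whereas this example only shows how a within-patient contrast cancels baseline differences. All values are invented for illustration.

```python
import math


def paired_log2fc(treated, control, pseudocount=1.0):
    """Per-patient log2 fold change (treated vs control) for one gene.

    `treated` and `control` map patient IDs to normalized expression.
    Only patients present in both conditions are compared.
    """
    patients = sorted(set(treated) & set(control))
    return {
        p: math.log2((treated[p] + pseudocount) / (control[p] + pseudocount))
        for p in patients
    }


# Hypothetical expression of one gene: baselines differ 16-fold across
# patients, but the within-patient effect of treatment is identical.
control = {"P1": 7, "P2": 31, "P3": 127}
treated = {"P1": 3, "P2": 15, "P3": 63}
lfc = paired_log2fc(treated, control)
```

      An unpaired comparison of these two groups would be dominated by the spread between patients, while the paired contrast gives the same halving of expression in every patient.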

      • snRNA-seq DE: please clarify whether any sample-aware method was used for the key DE conclusions (especially Parp11/Cd274 changes) rather than per-cell statistics alone.

      -See above. The key DE results are based on sample-level pseudobulk (each mouse as one replicate). The two MR3-treated samples cluster together in pseudobulk PCA (new Fig. S2C), and the tumor reduction is seen in both animals (new Fig. S2F), supporting robustness to animal variability.

      • CellChat: because min.cells filtering is used (min.cells = 20), please note this explicitly in figure legends where subclusters appear only in one condition, so readers understand why certain labels are missing.

      -We have edited the Fig. 3 legend accordingly.

      Figure and text consistency issues

      I noticed several figure/legend/citation issues that look like simple fixes:

      • Fig. 3 legend panel labeling: the legend text refers to the PD-L1/PD-1 chord plot as (C) MR3− and (D) MR3+, but (D) is the heatmap panel; the chord plots are (C) and (E). This should likely read (C) MR3− and (E) MR3+.

      -Yes, and corrected.

      • Fig. 5 panel reference: the Results text refers to the Cross Species module as Fig. 5F, but the Fig. 5 legend defines panels (A-E) and labels (E) as "Cross Species module." Please reconcile (either change the text to Fig. 5E or add a panel F).

      -Changed to "E".

      • Discussion figure citation: the Discussion cites Ptges3/PGE2 evidence as "(Figure 3)," but Ptges3 is shown in Fig. 4A and the model is in Fig. 4B.

      -Added "Figure 4A-B" there.

      • Fig. 1D numbers: the Results text states 509/1,602 (mouse) and 15/106 (human) "rescued" genes (Fig. 1D), but the Fig. 1D pie charts are labeled with different totals (mouse total 3490; human total 104). Please reconcile the denominators and ensure the figure matches the text and analysis choice (bulk vs snRNA vs filtered gene sets).

      -For the cross-species analysis, we only counted genes with human-mouse orthologs so that the two datasets were compared in the same gene space. This avoids inflation from species-specific genes. We have added a clarification in the figure legend.

      • Fig. 2 legend: there is a stray quote in "lymphoid subclusters" (appears as subclusters").

      -Removed.

      Presentation and framing

      • Tone down or carefully qualify statements equating snRNA-seq composition shifts with reduced tumor burden (or add an orthogonal tumor-burden measurement as suggested above).

      -We have removed "tumor burden" throughout the manuscript.

      • Where possible, tie mechanistic language explicitly to the level of evidence ("consistent with," "suggests," "model proposes") so readers do not over-interpret the transcriptomic inference.

      -Done.

      • Consider adding a small schematic in the Results or a short "interpretation" sentence in the figure legends explaining what the CellChat plots do and do not show, since non-specialists can misread these as direct interaction measurements.

      -We have added explanations in the Fig. 3 legend for CellChat and emphasized the transcriptomic nature of the data.

      Prior literature

      The PARP11 immunotherapy literature is cited appropriately. For the PGE2 angle, it may help readers if the authors add one or two glioma-focused references on PGE2-mediated myeloid/T-cell suppression (if not already in the full reference list).

      -We have added two more papers showing that PGE2 may induce MDSCs and immunosuppression in glioma (3, 4).

      Significance

      Nature and significance of the advance

      The advance here is primarily conceptual and resource-oriented. Conceptually, the work connects a mitochondrial regulator (MIRO1) to a specific, testable immunosuppressive circuit in the glioma TME. Technically, the cross-species perturbation framework and the accompanying MiroScape portal should be useful to groups looking for conserved, drug-responsive immune programs.

      Context within the existing literature

      Immunosuppression in glioma and the importance of tumor-associated myeloid populations are well established, as is the limited success of checkpoint blockade in GBM. The manuscript's proposed MAC4/MAC1 paracrine model and its emphasis on PD-L1/PD-1 signaling add a focused, hypothesis-generating view of how particular macrophage states might sustain CD8 dysfunction. The identification of PARP11 as a conserved MR3-responsive gene also fits with emerging work implicating PARP11 in immunoregulatory programs and response to immunotherapy.

      Audience

      • Neuro-oncology and glioma TME researchers (myeloid heterogeneity, immune suppression).
      • Tumor immunology groups interested in myeloid-driven checkpoint resistance.
      • Researchers working on mitochondrial stress signaling and immunometabolism.
      • Computational biologists building cross-species or multi-modal integration frameworks.

      Reviewer expertise and limitations

      Keywords: glioma microenvironment; macrophage/microglia biology; tumor immunology; single-cell/nucleus transcriptomics; computational ligand-receptor inference.

      Limitations: I am not a medicinal chemist, so I cannot deeply evaluate MR3 chemistry, PK/PD, or specificity beyond what is presented. I also did not evaluate the full web-portal implementation beyond the manuscript description.

      Reviewer #2

      Evidence, reproducibility and clarity

      The authors study responses to MIRO1 inhibition in a mouse model of GL261 GBM and in human tissue pieces treated ex vivo. They provide an interesting link between mitochondrial function and potential therapeutic outcomes in a tumor type that is typically challenging to treat. The manuscript is written clearly, in correct English, and the figures are well structured and easy to interpret.

      -We thank the Reviewer for the positive comments. We want to clarify that the compound binds to Miro1 and doesn't inhibit Miro1's GTPase activity (1). We have now added an explanation in the Introduction.

      Major critique:

      1. However, I need to stress that the study is based on a few experiments with low robustness. The predominant experiment is single-nuclei RNA-seq analysis of GL261 tumors implanted into mice, comprising 3 CTRL and 2 treated mice, due to removal of the 3rd animal following sequencing (low recovery of high-quality nuclei). Therefore, the sample group is small. This is understandable for an snRNA-seq experiment (although 3 animals in the treated group is somewhat necessary), but the efficiency of treatment with MR3 should be better documented in a larger cohort of animals. Crucial changes in distribution of cell types or polarisation of myeloid cells should be confirmed with flow cytometry, which is more feasible on a larger cohort.

      -We agree. As explained to Rev 3, the current paper is focused on conceptual and methodical advances and providing a resource to the community, which is already big with 11 figures. As Rev 1 mentioned, our paper's significance is to transcriptomically link Miro1 to well-known immune suppression factors in glioma TME and integrate 3 glioma databases which will facilitate researchers in the field to advance their own research. Importantly, PCA analysis of the mice at the animal level showed clear separation of treated from untreated groups and the reduction in tumor cell proportion was also observed in both treated animals (new Fig. S2C, F), supporting technical reproducibility despite a small n. Thus, our methods and resource should be still valid and useful to the community. Exploring the tumor-reducing efficacy of MR3 or combined treatments (e.g. with anti-PD-L1 or PARP11 inhibitor) in larger cohorts is an exciting next step.

      The human model does not seem robust (also, only 3 patients). Very few genes are affected by treatment (far fewer than in mice), which raises the question of whether the model is sufficient to study the effect of the treatment. This should at least be discussed, and arguments should be stated for why such a model is suitable.

      -We agree. The observed variability in treatment response is expected and consistent with the well-established molecular and phenotypic heterogeneity of human glioma. Importantly, despite this diversity, we identified one gene (PARP11) consistently altered across all patients' samples and the mouse model. This cross-species reproducibility supports the biological and translational relevance of the PARP11 finding. We have now added this to the Discussion.

      In addition, we reanalyzed the human bulk RNA-seq using a paired design with patient as a blocking factor as suggested by another reviewer, which increased the number of DE genes (new Fig. 1C).
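
      As a purely illustrative sketch of why blocking on patient can recover more signal, the snippet below compares an unpaired and a paired t-statistic for a single gene. It is not the DESeq2 analysis used in the manuscript (where a paired design is usually written with a formula such as `~ patient + treatment`, which is an assumption about the exact formula here). The expression values and function names are hypothetical.

```python
import math
import statistics


def unpaired_t(control, treated):
    """Welch t-statistic: ignores which patient each sample came from."""
    mean_c, mean_t = statistics.mean(control), statistics.mean(treated)
    var_c, var_t = statistics.variance(control), statistics.variance(treated)
    se = math.sqrt(var_c / len(control) + var_t / len(treated))
    return (mean_t - mean_c) / se


def paired_t(control, treated):
    """Paired t-statistic: computed on within-patient differences only."""
    diffs = [t - c for c, t in zip(control, treated)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical log2 expression of one gene in three patients.
# Patients differ a lot at baseline, but each shows a similar drop with MR3.
dmso = [5.0, 8.0, 11.0]
mr3 = [4.0, 7.2, 9.9]

t_unpaired = unpaired_t(dmso, mr3)
t_paired = paired_t(dmso, mr3)

print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

      In this made-up example the baseline differences between patients swamp the treatment effect in the unpaired comparison, whereas the paired statistic, which only looks at within-patient changes, is much larger in magnitude.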

      __Fig. S1E shows that actually few genes are commonly affected between human and mouse experiments. So conclusion about "conserved" modulation by MR3 seem an overstatement.__

      -We meant that "Parp11" is conserved. To avoid confusion, we have deleted "conserved" throughout the manuscript wherever it did not refer specifically to Parp11.

      __Mechanistic conclusions about PARP11, PGE, PD-L1 etc are not documented by any wet lab experiments, just by bioinformatic modelling.__

      -We have scrutinized the Main Text to emphasize this.

      Minor: 1. Authors should discuss choice of GL261 model. It is immunogenic and does not resemble human GBM ideally, so the choice should be explained.

      -Although the GL261 model is more immunogenic than human GBM, this feature enables evaluation of immune-modulating therapies and mechanisms in an immune-competent setting. The model still preserves critical aspects of glioma biology, including an immunosuppressive TME, invasive behavior, and intracranial growth (5). Thus, it provides a suitable platform for our mechanistic investigation of immune cells in the TME. We have now added this to the Methods.

      __In clustering of mouse snRNAseq data, T cells seem underclustered, e.g. Treg cluster clearly constitutes half of Il2ra-positive and negative cells, the latter probably being conventional CD4+ T cells (usually CD4+ T cells in GL261 are 50:50 Treg and conventional). This can affect further conclusions on cell:cell interactions.__

      -We thank the reviewer for this important observation. We agree that in the former annotation, it was improper to annotate all the CD4+ T cells as Treg cells, given the limited expression of Foxp3, Il2ra and other Treg marker genes. Consequently, the previously annotated "Treg cluster" likely includes both regulatory-like and conventional CD4+ T cells.

      We further clustered the CD4+ T cell population and found that dividing it into conventional CD4+ T cells and Treg cells yielded too few Treg cells (~50) for downstream analysis. This would compromise the robustness and reliability of our subsequent analyses (CellChat/DEA/etc).

      To address this, we have revised our annotation and now refer to this population more conservatively as "regulatory-like CD4+ T cells" rather than bona fide Tregs. Importantly, this subset still exhibits elevated expression of immunoregulatory molecules and is associated with CD8+ T cell dysfunction, preserving the main conclusions regarding immune suppression within the tumor microenvironment. We have updated the Results, Figures, and Discussion accordingly to clarify this revised annotation and its implications for cell-cell interactions.

      Please refer to the following new figures for the updated annotation and associated results:

      Fig. 2G-H, Fig. 3A-G, Fig. S4C-D,G, Fig. S5B-G, Fig. S6A.

      Significance

      The study provides an interesting conclusion and potentially relevant discovery. However, in opinion of this reviewer, the performed experiments do not strengthen this sufficiently, especially in terms of mechanical insights and weak data on human samples. In the line of general literature on new treatments of GBM and testing thereof in mouse model, this study lacks mechanistic insights and solid data on therapeutic efficiency.

      -As mentioned above, the goal of this paper is to provide novel methods for integrating datasets, to build a resource, and to identify markers in the glioma TME. It will serve as a useful resource for the community and form the foundation for future therapeutic validation in larger cohorts. We have acknowledged the limitations in the revised manuscript.

      __Reviewer #3 (Evidence, reproducibility and clarity (Required)):__

      __The authors of "Cross-Species Transcriptomic Integration Reveals a Conserved, MIRO1-Mediated Macrophage-to-T-Cell Signaling Axis Driving Immunosuppression in Glioma" present transcriptomic, both bulk RNA Seq and single nucleus RNA Seq, from GL261 murine gliomas treated with the Miro1 targeting compound MR3. RNA Seq data from human tumor explants treated with MR3 is also presented. The authors compared DEGs from their treated tissues with publicly available RNA Seq data sets comparing DEGs from normal tissue and Glioma tumors. The goal being to identify genes modulated by MR3 that may be underlying glioma growth, TME changes, and immunosuppression. There is a significant amount of data presented, with in-depth analysis conducted on the sequencing data sets. The manuscript is lacking in mechanistic depth and this reviewer feels that the results are over-interpreted, especially without any additional conformational assays run to confirm the interpretation of the sequencing data. There were many bold statements made (lines 109-110, 117, 130-131, 142-144, 163-165) that I felt did not have enough evidence to back up their claims.__

      -We have toned down the statements at the lines mentioned above:

      Line 109-110: Deleted now

      Line 117: Deleted now

      Line 130-131: Deleted now

      Line 142-144: Deleted: "highly differentially expressed", the rest of the sentence is supported by our data.

      Line 163-165: Deleted now

      As explained later, our paper is focused on bioinformatic analysis and resource and method building. In-depth functional studies will be performed in another paper.

      __A significant concern is the lack of conformation that MR3 is targeting Miro1 in these models.__

      -We have done this in another manuscript where we show that in cellular glioma models, Miro1 is the target of MR3 and MR3 exerts its functions via directly binding to Miro1.

      __Previous publications from the authors have shown evidence that MR3 reduces Miro1 expression in cell and fly models. Sometimes this requires the co application of FCCP or antimycin A. Thus, the results attributed within cannot be attributed to Miro1 changes but rather any on or off-target effect of MR3.__

      -We originally discovered MR3 by ligand-based in silico modeling and a thermal shift direct binding assay (1, 2). Thus, MR3 is a Miro1 binder (as stated in the Abstract and Introduction; we have now added more background to the Introduction). Indeed, we sometimes saw MR3 reduce Miro1 protein levels under certain conditions, for example, in vivo in flies after days of feeding (1, 2), or in PD cells upon Antimycin A or CCCP treatment (1, 2, 6, 7). MR3 most likely exerts its function by altering Miro1 protein-protein interactions (8), and Miro1 protein is subsequently degraded in proteasomes following complex dissociation or after posttranslational modifications (1, 2, 8). We have stated this hypothesis in the Results section (page 10, possible model).

      In our other papers we have excluded off-target effects of MR3 by examining other mitochondrial GTPases (1, 2), including Miro2, and by showing that Miro1 KD glioma cells phenocopied the effects of MR3 and that a drug-resistant Miro1 mutant rendered glioma cells insensitive to MR3. These data show that Miro1 is the main target of MR3.

      We have added more explanations to the Introduction.

      __Understanding that mouse studies are expensive and time-consuming, and the acquisition of human tissue is not trivial, the sample sets are still small. Further confirmation of findings in cell models, organoids etc. would strengthen the findings and justify the smaller sample size of mice and human tissue.__

      -We agree and we have another in-depth study. However, the current paper focuses on conceptual and methodological advances and on providing a resource to the community; the manuscript is already large, with 11 figures. As Rev 1 mentioned, our paper's significance lies in transcriptomically linking Miro1 to well-known immune suppression factors in the glioma TME and in integrating 3 glioma databases, which will help researchers in the field advance their own research. Thoroughly understanding Miro1's role in the glioma TME is our next goal, as stated in the Discussion, and is beyond the scope of the current study.

      __The website MiroScape will be a very useful tool in the proper hands.__

      __1. Confirm activity of MR3 on Miro1 in relevant samples. Direct downregulation? Modulation of other targets known to be altered by MR3?__

      -As mentioned above, we have shown that in tumor cells MR3 disrupts pathogenic Miro1-protein interactions without the need to reduce Miro1 protein. No other target is currently known to be altered by MR3, not even Miro2, as demonstrated previously (1, 2). We have added more explanations to the Main Text.

      __Conduct further mechanistic work to validate claims inferred by differentially expressed genes.__

      -As mentioned above, our current paper is focused on bioinformatic methods and resource building. Further mechanistic work will be performed in another paper.

      __Significantly temper claims related cell targeting, direct communication between cells and overarching responses inferred from Sequencing data.__

      -Done. See above and Main Text.

      __Reviewer #3 (Significance (Required)):__

      __My laboratories expertise lies in signaling related to mitochondrial structure and function. We have investigated the Miro1 protein and effects on cellular responses related to Miro1 expression. We have tested the MR3 compound in our own systems with limited success. Therefore my major concerns lie in validating the on-target activity of the compound in their models.__

      -As explained above, in our other papers we have thoroughly examined the on-target activity of MR3 by counter-screening other Miro1-related/similar proteins (1, 2, 6, 7) and by using Miro1 KD cells. We have now added more explanations to the Main Text.

      __With additional mechanistic validation this could be a very significant study. Using advanced model systems as the authors do allows for a comprehensive understanding of tissue responses. This is far advanced from simple single cell line culture studies but also adds significant complexity to the interpretation of the data. I am a strong believer that Sequecing data must be validated with functional assays.__

      -We agree and are actively conducting those studies. However, bioinformatic analysis and method and resource building are sometimes too comprehensive to combine with functional data, which may take years to obtain. We think our paper's methods, the markers identified in the TME, and the resources will be very useful to the community.

      References

      1. Hsieh CH, Li L, Vanhauwaert R, Nguyen KT, Davis MD, Bu G, Wszolek ZK, Wang X. Miro1 Marks Parkinson's Disease Subset and Miro1 Reducer Rescues Neuron Loss in Parkinson's Models. Cell metabolism. 2019;30(6):1131-40 e7. Epub 2019/10/01. doi: 10.1016/j.cmet.2019.08.023. PubMed PMID: 31564441; PMCID: PMC6893131.
      2. Li L, Conradson DM, Bharat V, Kim MJ, Hsieh CH, Minhas PS, Papakyrikos AM, Durairaj AS, Ludlam A, Andreasson KI, Partridge L, Cianfrocco MA, Wang X. A mitochondrial membrane-bridging machinery mediates signal transduction of intramitochondrial oxidation. Nat Metab. 2021. Epub 2021/09/11. doi: 10.1038/s42255-021-00443-2. PubMed PMID: 34504353.
      3. Mi Y, Guo N, Luan J, Cheng J, Hu Z, Jiang P, Jin W, Gao X. The Emerging Role of Myeloid-Derived Suppressor Cells in the Glioma Immune Suppressive Microenvironment. Front Immunol. 2020;11:737. Epub 2020/05/12. doi: 10.3389/fimmu.2020.00737. PubMed PMID: 32391020; PMCID: PMC7193311.
      4. Dean PT, Hooks SB. Pleiotropic effects of the COX-2/PGE2 axis in the glioblastoma tumor microenvironment. Front Oncol. 2022;12:1116014. Epub 2023/01/26. doi: 10.3389/fonc.2022.1116014. PubMed PMID: 36776369; PMCID: PMC9909545.
      5. Mathios D, Kim JE, Mangraviti A, Phallen J, Park CK, Jackson CM, Garzon-Muvdi T, Kim E, Theodros D, Polanczyk M, Martin AM, Suk I, Ye X, Tyler B, Bettegowda C, Brem H, Pardoll DM, Lim M. Anti-PD-1 antitumor immunity is enhanced by local and abrogated by systemic chemotherapy in GBM. Science translational medicine. 2016;8(370):370ra180. Epub 2016/12/23. doi: 10.1126/scitranslmed.aag2942. PubMed PMID: 28003545; PMCID: PMC5724383.
      6. Bharat V, Durairaj AS, Vanhauwaert R, Li L, Muir CM, Chandra S, Kwak CS, Le Guen Y, Nandakishore P, Hsieh CH, Rensi SE, Altman RB, Greicius MD, Feng L, Wang X. A mitochondrial inside-out iron-calcium signal reveals drug targets for Parkinson's disease. Cell Rep. 2023;42(12):113544. Epub 2023/12/07. doi: 10.1016/j.celrep.2023.113544. PubMed PMID: 38060381.
      7. Bharat V, Hsieh CH, Wang X. Mitochondrial Defects in Fibroblasts of Pathogenic MAPT Patients. Front Cell Dev Biol. 2021;9:765408. Epub 2021/11/23. doi: 10.3389/fcell.2021.765408. PubMed PMID: 34805172; PMCID: PMC8595217.
      8. Kwak CS, Du Z, Creery JS, Wilkerson EM, Major MB, Elias JE, Wang X. Optogenetic Proximity Labeling Maps Spatially Resolved Mitochondrial Surface Proteomes and a Locally Regulated Ribosome Pool. bioRxiv. 2025. Epub 2026/01/07. doi: 10.64898/2025.12.21.693523. PubMed PMID: 41497653; PMCID: PMC12767525.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The authors study responses to MIRO1 inhibition in a mouse model of GL261 GBM and in human tissue pieces treated ex vivo. They provide an interesting link between mitochondrial function and potential therapeutic outcomes in a tumor type that is typically challenging to treat. The manuscript is written clearly, in correct English language and figures are well structured and easy to interpret.

      Major critique:

      1. However, I need to stress that study is based of few experiments with low robustness. The predominant experiment is single-nuclei RNAseq analysis of GL261 tumors implanted into mice, constituting 3 CTRL and 2 treated mice, due to removal of 3rd animal following sequencing (low recovery of high quality nuclei). Therefore, the sample group is small. This is understandable for snRNA-seq experiment (although 3 animals in treated group is somewhat necessary), but the efficiency of treatment with MR3 should be better documented in a larger cohort of animals. Crucial changes in distribution of cell types or polarisation of myeloid cells should be confirmed with flow cytometry, which is more feasible on a larger cohort.
      2. Human model does not seem robust (also, only 3 patients). Very few genes are affected by treatment (incomparably less than in mice), which poses a question if the model is sufficient to study the effect of the treatment. This should be at least discussed and arguments should be stated why such model is suitable.
      3. Fig. S1E shows that actually few genes are commonly affected between human and mouse experiments. So conclusion about "conserved" modulation by MR3 seem an overstatement.
      4. Mechanistic conclusions about PARP11, PGE, PD-L1 etc are not documented by any wet lab experiments, just by bioinformatic modelling.

      Minor:

      1. Authors should discuss choice of GL261 model. It is immunogenic and does not resemble human GBM ideally, so the choice should be explained.
      2. In clustering of mouse snRNAseq data, T cells seem underclustered, e.g. Treg cluster clearly constitutes half of Il2ra-positive and negative cells, the latter probably being conventional CD4+ T cells (usually CD4+ T cells in GL261 are 50:50 Treg and conventional). This can affect further conclusions on cell:cell interactions.

      Significance

      The study provides an interesting conclusion and potentially relevant discovery. However, in opinion of this reviewer, the performed experiments do not strengthen this sufficiently, especially in terms of mechanical insights and weak data on human samples. In the line of general literature on new treatments of GBM and testing thereof in mouse model, this study lacks mechanistic insights and solid data on therapeutic efficiency.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary of findings and key conclusions

      This manuscript asks how pharmacologic targeting of the outer mitochondrial membrane protein MIRO1 (RHOT1) with a MIRO1-binding compound (MR3) reshapes immunosuppressive programs in the glioma tumor microenvironment (TME). The core of the paper is a cross-species transcriptomic comparison that combines an in vivo mouse dataset with an ex vivo human perturbation dataset.

      Model systems and approach (as described):

      • Mouse in vivo: GL261-Luc intracranial glioma in C57BL/6J mice; MR3 is administered intracranially at the implantation site (10 µM in 5 µL DMSO) on days 11 and 18, and tumors are harvested on day 22 for single-nucleus RNA-seq (snRNA-seq).
      • Mouse snRNA-seq: NeuN-based nuclei sorting, 10x Genomics v3.1; alignment to mm10; Seurat-based integration and annotation. Tumor-cell calling is supported by CNV inference (SCEVAN/CopyKAT). One MR3-treated sample is excluded after QC, leaving 3 control vs 2 MR3-treated samples (11,940 NeuN− nuclei).
      • Human ex vivo: freshly resected glioma cores from 3 patients are cultured with 10 µM MR3 or DMSO for 24 h, followed by bulk RNA-seq (STAR alignment to hg19; DESeq2 for differential expression).
      • Cross-species integration: the analysis is restricted to 1:1 orthologs and protein-coding genes shared across datasets; inferred cell-cell signaling is explored with CellChat.

      Main findings (as presented):

      • MR3 shifts expression of a subset of glioma-associated genes toward a non-tumor-like direction ("rescued genes") and is associated with large changes in inferred cell-type composition in the mouse snRNA-seq dataset (including a marked drop in the fraction of nuclei annotated as tumor: 44.5% to 4.3%; Fig. 1E).
      • Across TCGA-vs-GTEx (glioma-upregulated genes) and three MR3 response analyses (mouse snRNA-seq, mouse pseudo-bulk, and human bulk RNA-seq), PARP11/Parp11 is reported as the only gene that is consistently upregulated in glioma and consistently downregulated by MR3 (Fig. 2B).
      • Within the mouse myeloid compartment, Parp11 is most enriched in MAC4 and MAC1, while MAC1 shows high Cd274 (Pdl1/PD-L1). MR3 reduces Parp11 in MAC4/MAC1 and reduces Cd274 in MAC1 (Fig. 2H).
      • CellChat analysis suggests that in controls MAC1 is the dominant sender of PD-L1/PD-1 signaling to CD8+ T cells (Fig. 3C), and that this PD-L1/PD-1 interaction is strongly diminished after MR3 (Fig. 3E).
      • The authors propose a paracrine model in which MAC4-derived PGE2 (via Ptges3) sustains Parp11 expression in MAC1 through cAMP/PKA/CREB, promoting PD-L1-mediated T-cell suppression; MR3 disrupts this circuitry (Fig. 4).

      Major comments

      1. Strength of the conclusions: Two parts of the story felt well supported by the data as shown. First, the cross-species convergence on PARP11/Parp11 is a clear and potentially useful result (Fig. 2B). Second, the myeloid subclustering plus CellChat analysis makes a coherent case that PD-L1/PD-1 signaling in this model is dominated by a specific macrophage subset (MAC1) and changes after MR3 (Fig. 2H, Fig. 3). Where I was less convinced is when the manuscript moves from "transcriptomic and modeling evidence" to causal statements such as "MIRO1-mediated axis driving immunosuppression" and "MR3 reduces tumor burden by reactivating immunity." At the moment, several central inferences remain indirect:
        • Causality is inferred primarily from transcriptomic shifts and ligand-receptor inference rather than functional immune readouts.
        • On-target attribution to MIRO1 hinges on MR3 being a MIRO1 binder; the study does not include a genetic MIRO1 perturbation or a target-engagement/epistasis test in the relevant immune compartments (and the authors acknowledge this limitation in the Discussion).
        • The very large reduction in "tumor cell proportion" (Fig. 1E) is striking but is still a composition measure of recovered nuclei; it is not, on its own, a direct measurement of tumor size/burden and could be sensitive to differential nuclei recovery or cell loss during processing.

         I think the paper can go forward in its current scope, but the strength of the claims should match the level of evidence. If the authors want to keep strong, causal language in the title/abstract ("driving immunosuppression," "reduces tumor burden"), then I consider one or two targeted validation experiments essential (see below). Alternatively, the authors can temper the language and position the mechanistic model more explicitly as a hypothesis generated from the transcriptomic analysis.
      2. Statements that should be labeled as preliminary/speculative (unless additional validation is added)
        • MAC4-derived PGE2 as the upstream driver of MAC1 Parp11/PD-L1: plausible and nicely consistent with Ptges3 being MAC4-high in controls and reduced with MR3 (Fig. 4A), but not demonstrated.
        • MIRO1 → mtDNA → cGAS/STING → Ptges3 as a mechanistic chain: interesting, but currently framed largely by pathway knowledge plus modest expression changes (Supplementary Fig. S5).
        • "MR3 reactivates anti-tumor immunity to reduce tumor burden": the gene set enrichment and CellChat shifts are consistent with immune activation, but immune-mediated tumor control is not directly tested.
      3. Replication and statistics: Mouse snRNA-seq replication is limited after QC (3 control vs 2 MR3-treated animals). With n=2 treated, it is hard to know whether some of the biggest composition and cluster-level changes are robust to animal-to-animal variability. Relatedly, the snRNA-seq differential expression is performed with Seurat FindMarkers (Wilcoxon rank-sum). Per-cell testing can inflate significance if biological replicate structure is not accounted for (pseudoreplication). I suggest the authors clarify exactly how they handled sample-level replication for the key DE results and, where possible, re-run the main DE comparisons using a sample-aware approach (e.g., pseudo-bulk within cell types/subclusters). For the human bulk RNA-seq, the methods indicate 3 patient tissues split across MR3 vs DMSO for 24 h. In DESeq2, a paired design (including patient as a blocking factor) would be important to avoid patient-to-patient variability dominating the treatment signal; the manuscript should confirm whether the design formula accounted for this. Finally, several places in the Methods define significance using p-value cutoffs (e.g., GEPIA3 TCGA/GTEx analysis uses p < 0.05 and |log2FC| > 1; human DE uses p < 0.05 and log2FC >= 1). Because multiple testing is substantial in all of these analyses, I recommend reporting FDR-adjusted values consistently (and being explicit about whether figures/tables show raw or adjusted p-values).
      4. Do the data support the macrophage-to-CD8 suppression claim? The CellChat PD-L1/PD-1 network figures are suggestive (Fig. 3C/E), but ligand-receptor inference is not the same as demonstrating functional T-cell inhibition. At minimum, I would like to see one orthogonal readout (flow or immunostaining) showing that PD-L1 protein on myeloid cells and PD-1 on CD8 T cells change in the expected directions after MR3, and that CD8 T cells show an activation/effector signature at the protein level.
      5. PARP11, mediator vs marker: The cross-species PARP11 result is the most convincing and potentially generalizable finding in the manuscript (Fig. 2B). However, in the specific context of this study, PARP11 is still best supported as a conserved MR3-responsive candidate rather than a demonstrated causal driver of PD-L1-mediated suppression. If the authors want to argue PARP11 is an effector of the pathway (rather than a marker), they should either soften the language or add a minimal functional linkage experiment within the existing scope (see "Optional" experiments below).
      6. Reproducibility and clarity of methods: I appreciate that the authors provide a code/data portal (MiroScape) and a GitHub link. To make the study as reproducible as possible, I recommend:
        • Deposit raw sequencing reads for both mouse and human datasets (GEO/SRA) and include accession numbers in the manuscript.
        • Provide a short, consolidated "computational reproducibility" note with software versions and key parameters (Seurat, CellChat, STAR, DESeq2, etc.).
        • Clarify pseudo-bulk construction (what is aggregated, at what level, and how many biological replicates contribute to each pseudo-bulk comparison).
        • Add a brief summary of MR3 provenance/validation and what "MIRO1-binding" means operationally in the context of these experiments (especially for readers outside the MIRO1 field).

      Experiments requested (kept within the existing claims)

      I am intentionally not suggesting new lines of experimentation. The experiments below are aimed only at supporting the paper's current central claims. I separate them into items I consider essential vs optional, depending on how strongly the authors want to phrase mechanistic conclusions.

      Essential if the title/abstract continue to use strong causal language

        • Protein-level validation of the PD-L1/PD-1 axis and CD8 activation in the GL261 model. A focused flow cytometry panel (myeloid PD-L1; CD8 PD-1 plus one or two effector markers such as GZMB/IFNG/Ki67) or multiplex IF/IHC on tumor sections would substantially strengthen the central MAC1 → CD8 claim.
        • An orthogonal measure of tumor burden in the same treatment paradigm. The manuscript currently treats the drop in the fraction of nuclei annotated as tumor (Fig. 1E) as a reduction in tumor burden; I recommend including IVIS longitudinal data and/or histologic tumor area/volume at harvest to support this statement.
        • If feasible, modestly increase in vivo biological replication (the snRNA-seq analysis currently has n=2 treated after QC). Even adding one additional treated animal that passes QC would help.

      Feasibility (rough guidance only; core pricing varies widely by institution): a repeat GL261 cohort to harvest tumors for flow and/or histology typically takes ~3-6 weeks end-to-end. A small flow panel plus core time is often on the order of a few thousand USD (antibodies and cytometry), while basic histology/IF quantification might be in the hundreds to low-thousands. If the authors already have stored tissue from the existing cohort, some of this could be faster/cheaper.

      Optional (only if the authors want the MAC4 → PGE2 → Parp11 mechanism to be more than a model)

        • Measure PGE2 (ELISA or targeted lipidomics) in tumor lysates/conditioned media from control vs MR3-treated samples, or provide a closer proxy for PGE2 pathway engagement in the relevant clusters.

      Optional (only if the authors want to argue PARP11 is an effector)

        • A minimal functional linkage experiment (in vitro) testing whether PARP11 perturbation phenocopies the relevant aspect of MR3 in macrophages (e.g., PD-L1 levels and/or the ability to suppress CD8 activation in a co-culture). This could be done with a PARP11 inhibitor or knockdown. I do not think in vivo genetics are required for this manuscript, but some functional tie would prevent overinterpretation.
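
      The FDR adjustment recommended under "Replication and statistics" can be sketched with the Benjamini-Hochberg procedure. This is a minimal illustration, not the authors' pipeline; the p-values and the function name are hypothetical, and in practice the adjusted values reported by the DE tool itself would normally be used.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)

n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(q < 0.05 for q in adj)
print(f"significant at 0.05: raw = {n_raw}, BH-adjusted = {n_adj}")
```

      In this toy case three genes pass a raw p < 0.05 cutoff but only one passes after adjustment, which is why it matters whether figures and tables show raw or adjusted values.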

      Minor comments

      A. Analysis/experimental clarifications that seem straightforward

      • Human DESeq2: please clarify whether the DESeq2 design was paired by patient (i.e., patient as a blocking factor).
      • snRNA-seq DE: please clarify whether any sample-aware method was used for the key DE conclusions (especially Parp11/Cd274 changes) rather than per-cell statistics alone.
      • CellChat: because min.cells filtering is used (min.cells = 20), please note this explicitly in figure legends where subclusters appear only in one condition, so readers understand why certain labels are missing.
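
      The sample-aware (pseudo-bulk) approach mentioned above can be sketched as follows: counts are summed per animal within each cell type, so that each animal, rather than each nucleus, contributes one observation to the downstream test. This is a minimal sketch with made-up sample IDs, cluster labels, and counts; the data layout and function name are assumptions, not the manuscript's implementation.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum counts per (sample, cell_type, gene) so each animal is one replicate."""
    totals = defaultdict(int)
    for cell in cells:
        for gene, count in cell["counts"].items():
            totals[(cell["sample"], cell["cell_type"], gene)] += count
    return dict(totals)


# Hypothetical nuclei; real input would come from the annotated count matrix.
cells = [
    {"sample": "ctrl_1", "cell_type": "MAC1", "counts": {"Parp11": 3, "Cd274": 5}},
    {"sample": "ctrl_1", "cell_type": "MAC1", "counts": {"Parp11": 2, "Cd274": 4}},
    {"sample": "mr3_1", "cell_type": "MAC1", "counts": {"Parp11": 1, "Cd274": 1}},
    {"sample": "mr3_1", "cell_type": "MAC4", "counts": {"Parp11": 2}},
]

bulk = pseudobulk(cells)
for key in sorted(bulk):
    print(key, bulk[key])
```

      The resulting per-animal profiles can then be passed to a bulk DE method, so the number of replicates in the test equals the number of animals contributing to that cell type.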

      B. Figure and text consistency issues

      I noticed several figure/legend/citation issues that look like simple fixes:

      • Fig. 3 legend panel labeling: the legend text refers to the PD-L1/PD-1 chord plot as (C) MR3− and (D) MR3+, but (D) is the heatmap panel; the chord plots are (C) and (E). This should likely read (C) MR3− and (E) MR3+.
      • Fig. 5 panel reference: the Results text refers to the Cross Species module as Fig. 5F, but the Fig. 5 legend defines panels (A-E) and labels (E) as "Cross Species module." Please reconcile (either change the text to Fig. 5E or add a panel F).
      • Discussion figure citation: the Discussion cites Ptges3/PGE2 evidence as "(Figure 3)," but Ptges3 is shown in Fig. 4A and the model is in Fig. 4B.
      • Fig. 1D numbers: the Results text states 509/1,602 (mouse) and 15/106 (human) "rescued" genes (Fig. 1D), but the Fig. 1D pie charts are labeled with different totals (mouse total 3490; human total 104). Please reconcile the denominators and ensure the figure matches the text and analysis choice (bulk vs snRNA vs filtered gene sets).
      • Fig. 2 legend: there is a stray quote in "lymphoid subclusters" (appears as subclusters").

      C. Presentation and framing

      • Tone down or carefully qualify statements equating snRNA-seq composition shifts with reduced tumor burden (or add an orthogonal tumor-burden measurement as suggested above).
      • Where possible, tie mechanistic language explicitly to the level of evidence ("consistent with," "suggests," "model proposes") so readers do not over-interpret the transcriptomic inference.
      • Consider adding a small schematic in the Results or a short "interpretation" sentence in the figure legends explaining what the CellChat plots do and do not show, since non-specialists can misread these as direct interaction measurements.

      D. Prior literature

      The PARP11 immunotherapy literature is cited appropriately. For the PGE2 angle, it may help readers if the authors add one or two glioma-focused references on PGE2-mediated myeloid/T-cell suppression (if not already in the full reference list).

      Significance

      Nature and significance of the advance

      The advance here is primarily conceptual and resource-oriented. Conceptually, the work connects a mitochondrial regulator (MIRO1) to a specific, testable immunosuppressive circuit in the glioma TME. Technically, the cross-species perturbation framework and the accompanying MiroScape portal should be useful to groups looking for conserved, drug-responsive immune programs.

      Context within the existing literature

      Immunosuppression in glioma and the importance of tumor-associated myeloid populations are well established, as is the limited success of checkpoint blockade in GBM. The manuscript's proposed MAC4/MAC1 paracrine model and its emphasis on PD-L1/PD-1 signaling adds a focused, hypothesis-generating view of how particular macrophage states might sustain CD8 dysfunction. The identification of PARP11 as a conserved MR3-responsive gene also fits with emerging work implicating PARP11 in immunoregulatory programs and response to immunotherapy.

      Audience

      • Neuro-oncology and glioma TME researchers (myeloid heterogeneity, immune suppression).
      • Tumor immunology groups interested in myeloid-driven checkpoint resistance.
      • Researchers working on mitochondrial stress signaling and immunometabolism.
      • Computational biologists building cross-species or multi-modal integration frameworks.

      Reviewer expertise and limitations

      Keywords: glioma microenvironment; macrophage/microglia biology; tumor immunology; single-cell/nucleus transcriptomics; computational ligand-receptor inference. Limitations: I am not a medicinal chemist, so I cannot deeply evaluate MR3 chemistry, PK/PD, or specificity beyond what is presented. I also did not evaluate the full web-portal implementation beyond the manuscript description.

    1. Reviewer #2 (Public review):

      I have reviewed both the original and revised version of this manuscript and while no additional experiments were added, I find the interpretations and discussion of the limitations of the study have improved. This is appreciated.

      My original concern regarding the mixture treatments largely remains. Figure 4 nicely shows that the mixtures are more potent than the average of all single compounds. However, Fig S3 shows that the effects of mixtures are not significantly different from the effects of at least one single N,S compound (voruscharin or uscharin) across all measured growth/sequestration responses. Essentially, the effects of single N,S compounds are similar to those of mixtures (which also contain N,S compounds).

      While the current results are certainly interesting as presented, in my view the main takeaway of the manuscript would be more compelling if it could be demonstrated that it isn't simply the presence of N,S compounds in the mixtures driving the observations. For example, does a mixture of all compounds except voruscharin or uscharin still have stronger growth/sequestration effects compared to single non-N,S compounds?

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the ecological interactions between wild plants and specialized herbivorous insects, structural innovation-based diversification of secondary metabolites often occurs. In this study, Agrawal et al. utilized two milkweed species (Asclepias curassavica and Asclepias incarnata) and the specialist Monarch butterfly (Danaus plexippus) as a model system to investigate the effects of two N,S-cardenolides, formed through structural diversification and innovation in A. curassavica, on the growth, feeding, and chemical sequestration of D. plexippus, compared to other conventional cardenolides. Additionally, the study examined how cardenolide diversification resulting from the formation of N,S-cardenolides influences the growth and sequestration of D. plexippus. On this basis, the research elucidates the ecophysiological impact of toxin diversity in wild plants on the detoxification and transport mechanisms of highly adapted herbivores.

      Strengths:

      The study is characterized by the use of milkweed plants and the specialist Monarch butterfly, which represent a well-established model in chemical ecology research. On one hand, these two organisms have undergone extensive co-evolutionary interactions; on the other hand, the butterfly has developed a remarkable capacity for toxin sequestration. The authors, building upon their substantial prior research in this field and earlier observations of structural evolutionary innovation in cardenolides in A. curassavica, proposed two novel ecological hypotheses. While experimentally validating these hypotheses, they introduced the intriguing concept of a "non-additive diversity effect" of trace plant secondary metabolites when mixed, contrasting with traditional synergistic perspectives, in their impact on herbivores.

      Weaknesses:

      The manuscript has two main weaknesses. First, as a study reliant on the control of compound concentrations, the authors did not provide sufficient or persuasive justification for their selection of the natural proportions (and concentrations) of cardenolides. The ratios of these compounds likely vary significantly across different environmental conditions, developmental stages, pre- and post-herbivory, and different plant tissues. The ecological relevance of the "natural proportions" emphasized by the authors remains questionable. Furthermore, the same compound may even exert different effects on herbivorous insects at different concentrations. The authors should address this issue in detail within the Introduction, Methods, or Discussion sections.

      Second, the study was conducted using leaf discs in an in vitro setting, which may not accurately reflect the responses of Monarch butterflies on living plants. This limitation undermines the foundation for the novel ecological theory proposed by the authors. If the observed phenomena could be validated using specifically engineered plant lines (such as those created through gene editing, knockdown, or overexpression of key enzymes involved in the synthesis of specific N,S-cardenolides), the findings would be substantially more compelling.

      Reviewer #2 (Public review):

      This study examined the effects of several cardenolides, including N,S-ring containing variants, on sequestration and performance metrics in monarch larvae. The authors confirm that some cardenolides, which are toxic to non-adapted herbivores, are sequestered by monarchs and enhance performance. Interestingly, N,S-ring-containing cardenolides did not have the same effects and were poorly sequestered, with minimal recovery in frass, suggesting an alternate detoxification or metabolic strategy. These N,S-containing compounds are also known to be less potent defences against non-adapted herbivores. The authors further report that mixtures of cardenolides reduce herbivore performance and sequestration compared to single compounds, highlighting the important role of phytochemical diversity in shaping plant-herbivore interactions.

      Overall, this study is clearly written, well-conducted and has the potential to make a valuable contribution to the field. However, I have one major concern regarding the interpretations of the mixture results. From what I understand of the methods, all tested mixtures contain all five compounds. As such, it is not possible to determine whether reduced performance and sequestration result from the complete mixture or from the presence of a single compound, such as voruscharin for performance and uscharin for sequestration. For instance, if all compounds except voruscharin (or uscharin) were combined, would the same pattern emerge? I suspect not, since the effects of the individual N,S-containing compounds alone are generally similar to those of the full mixture (Figure S3). By taking the average of all single compounds, the individual effects of the N,S-containing ones are being inflated by the non-N,S-containing ones (in the main text, Figure 4). In the mix, of course, they are not being 'diluted', as they are always present. This interpretation is further supported by the fact that in the equimolar mix, the relative proportion of voruscharin decreases (from 50% in the 'real mix'), and the target measurements of performance and sequestration tend to increase in the equimolar mix compared to the real mix.
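      The averaging argument in the paragraph above can be illustrated with a small numerical sketch. The values below are hypothetical (arbitrary growth units) and are not taken from the manuscript or from Figure S3; the placeholder compound names for the three non-N,S cardenolides are likewise assumptions. The sketch only shows that a mixture can appear more potent than the mean of the single compounds while being no more potent than its strongest single component.

```python
from statistics import mean

# Hypothetical larval growth values (arbitrary units); NOT data from the study.
# Lower growth = stronger negative effect on the herbivore.
single_effects = {
    "non_NS_1": 10.0,     # placeholder name for a non-N,S cardenolide
    "non_NS_2": 10.0,     # placeholder name
    "non_NS_3": 10.0,     # placeholder name
    "voruscharin": 6.0,   # N,S cardenolide, assumed most potent here
    "uscharin": 7.0,      # N,S cardenolide
}

# Assumption: the mixture performs exactly like its most potent single compound.
mixture_effect = 6.0

mean_of_singles = mean(single_effects.values())
strongest_single = min(single_effects.values())

# Comparison 1: mixture vs. the average of all single compounds.
gap_vs_mean = mean_of_singles - mixture_effect

# Comparison 2: mixture vs. the most potent single compound.
gap_vs_strongest = strongest_single - mixture_effect

print(f"mean of singles:     {mean_of_singles:.2f}")
print(f"mixture:             {mixture_effect:.2f}")
print(f"gap vs. mean:        {gap_vs_mean:.2f}")
print(f"gap vs. strongest:   {gap_vs_strongest:.2f}")
```

      Under these assumed numbers, the first comparison suggests a mixture effect while the second shows none, which is why a mixture lacking voruscharin or uscharin would be the more direct test.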

      Despite this issue, the discussion of mixtures in the context of plant defence against both adapted and non-adapted herbivores is fascinating and convincing. The rationale that mixtures may serve as a chemical tool-kit that targets different sets of herbivores is compelling. The non-N,S cardenolides are effective against non-adapted herbivores and the N,S-containing cardenolides are effective against adapted herbivores. However, the current experiments focus exclusively on an adapted species. It would be especially interesting to test whether such mixtures reduce overall herbivory when both adapted and non-adapted species are present.

      It remains possible that mixtures, even in the absence of voruscharin or uscharin, genuinely reduce sequestration or performance; however, this would need to be tested directly to address the abovementioned concern.

      Thanks for these insightful reviews and your summary assessment. We certainly agree that ours was a laboratory study with a single specialized insect, and that both mixture types contained all five compounds (controlling for total toxin concentration). Thus, our conclusion that naturally occurring toxins (within the cardenolide class) have non-additive combined effects on the specialized, sequestering monarch is constrained by our experimental conditions. In our assay we used two mixture types, equimolar and “natural” proportions. We acknowledge that the natural proportions will vary with plant age, damage history, etc. of the host plant, Asclepias curassavica. Our proportions were based on growing the plants a few different times under variable conditions. Although we did not conduct these experiments on non-adapted insects, we discuss a related experiment that was conducted with wild-type and genetically engineered Drosophila (Lopez-Goldar et al. 2024, PNAS). In sum, we appreciate the reviewers’ comments.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (i) More convincingly justify the choice and ecological relevance of the "natural" cardenolide ratios, (ii) Clarify the interpretation of mixture effects, and (iii) more explicitly discuss the limitations of leaf-disc assays and the absence of non-adapted herbivores in light of the broader coevolutionary claims.

      Thank you for these suggestions. We have added several sentences of text to the Discussion section to make these points.

      Reviewer #1 (Recommendations for the authors):

      (1) Statistical analysis is missing from Figure 3 and Figure S3, making it difficult to assess the significance of the data.

      Much of the data in Fig. 3 is meant for descriptive presentation, with the main statistical analysis (the contrast between N,S and non-N,S cardenolides) given in the main text of the Results. We have added treatment differences between the sequestration efficiencies to the figure as well.

      (2) To help readers intuitively understand how certain results (such as ECD and sequestration efficiency) were calculated, the authors can provide the equations used for these computations.

      Thank you, this was given in the Methods, and we have added it to the Results on first mention as well.

      (3) For Figure 4, we suggest presenting the results of the equal mixture treatment and the realistic mixture treatment separately, rather than averaging the results from these two types of treatments.

      We understand and appreciate this comment – all of the treatment means are given in Fig. S3. For this particular figure we have opted to stick with the binary comparison (singles vs. mixed) to maximize replication for statistical tests (typically n = 25 vs. 10).

      Reviewer #2 (Recommendations for the authors):

      Given the interpretations and discussion generally, I feel the manuscript would benefit from either additional experiments (mixtures w/o N-S compounds), inclusion of non-adapted herbivore performance, or reframing of the explicit interpretations from your findings.

      We have added some caveats to the text but not added any additional experiments.

      Also, for all treatments/mixtures are concentrations above the IC50? Perhaps this could be calculated from the information presented, but it may be best to explicitly mention this.

      This is an interesting question. IC50’s are estimated from in vitro assays (with the enzyme and toxins in microplate wells) and so are not translatable to foliar concentrations. As indicated in the text, we chose cardenolide levels based on foliar concentrations to match A. curassavica.

      Some minor points:

      (1) Although the intact N,S-ring-containing compounds are recovered in low amounts in frass (and not sequestered), is there evidence of N,S-ring components being otherwise traceable in the frass? For example, can excess S or N be detected in frass? This could provide insight into differential detoxification or reincorporation of these elements, potentially explaining variation between voruscharin and uscharin.

      Great question! We have not been able to detect breakdown products. In other experiments we have conducted mass spectrometric analysis of bodies and frass, but have not been able to find the features representing breakdown products. Nonetheless, as mentioned below, the main conversion products are evident and measurable, as in this study.

      (2) As a point of curiosity, is there evidence of interconversion between such compounds? For instance, if monarchs are fed only voruscharin, can other cardenolides be detected in their tissues?

      Yes, we have tried to make this more clear in the text. Both uscharin and voruscharin are converted to calotropin and calactin.

    1. Table 3

      I would not so much want to start here from a CIM or LGM element, but rather from the explicit ontological connections that are made, and the consequences they have.

      I see the following ontological connections in our model: 1. Distinction between instance and type; 2. Distinction between individuals and categories; 3. Distinction between domain object and datum; 4. Distinction between domain object and attribute of an object; 5. Distinction between domain object and relationship between domain objects; 6. Distinction between data object and datum.

      A connection that we deliberately do NOT make is: 1. No distinction between brute facts (a car) and institutional facts (a marriage, an ownership, etc.).

    1. Reviewer #1 (Public review):

      Summary:

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments:

      (1) Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      - Move methodological and descriptive details (e.g., especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.
      - Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).
      - Figure 4I: This panel is currently unclear and should be drastically simplified.

      I recommend reorganizing the figures as follows:

      - Figure 1: Keep as a single figure but simplify. Figures 1D and 1E could be combined; move the unnormalized SCV to Supplementary Materials. The same goes for 1F.
      - New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules, which increases the speed of congression initiation.
      - New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and the lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.
      - New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward, which is important for efficient chromosome congression in diverse cellular contexts.

      (2) Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      - Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one used by the authors arise from differences in cell type.
      - Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for the effects on congression.

      (3) Expansion of PANEM functional analysis

      To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact):

      - Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.
      - Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.
      - Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.
      - Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).
      - Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.
      - Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.

      (4) Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      - Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.
      - Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).
      - Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.

      Significance:

      Advance:

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression in this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature.

      Audience:

      The primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise:

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

    2. Reviewer #3 (Public review):

      Summary:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field, as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression; in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression which precedes the segregation process.

      Major Comments:

      (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (e.g. Phase 3-like). It is unclear whether all kinetochores go through phases 1, 2, 3 and 4 sequentially or whether a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence and ask how they recover in the presence and absence of azBB.

      (2) Would peripheral kinetochores close to the poles behave differently compared to peripheral kinetochores close to the midplane (Figure S4)? In Figure 3D, are they separated? If not, would it look different?

      (3) Uncongressed polar chromosomes (e.g., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.

      (4) The work includes high-quality manual tracking of objects in early mitosis. If this were made available to the field, it could help build AI models for tracking. The authors could consider depositing the tracking data to increase the impact of their work.

      Significance:

      The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.

    3. Author response:

      General Statements

      Our study provides important mechanistic insights into how the perinuclear actomyosin network PANEM facilitates the interaction of unfavorably positioned chromosomes, i.e. peripheral and polar chromosomes, with the mitotic spindle in early mitosis to ensure their correct segregation in subsequent anaphase. All reviewers agree that our study makes an important contribution to the field of mitosis and chromosome segregation. They make positive comments on our manuscript, for example, ‘The work highlights the PANEM as a key spatial and temporal element of chromosome congression’, ‘The work is an excellent addition to the field’, and ‘the concept of PANEM could be integrated into textbooks and models of chromosome congression’. All three reviewers also acknowledge the high quality of the data, rigorous and accurate analyses, and convincing quantification in our study. Reviewers 1 and 3 give several comments and suggestions for revision of our manuscript. Please find our point-by-point revision plan of the manuscript from page 3.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Figure 4I: This panel is currently unclear and should be drastically simplified.

      We will follow this suggestion and simplify this figure. For example, we plan to remove the column of “Start” because it is obvious and does not provide much new information.

      I recommend reorganizing the figures as follows:

      Figure 1: Keep as a single figure but simplify. Figures 1D and 1E could be combined; move the unnormalized SCV to Supplementary Materials. The same goes for 1F.

      We will follow this suggestion and reorganize Figure 1 accordingly.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      C. Expansion of PANEM functional analysis

      To strengthen the conclusions and broaden the study beyond the group's previous work, PANEM function should be tested in additional contexts (some may be considered optional but important for broader impact): [underlined by authors]

      Test PANEM function in at least one additional cell line that displays PANEM to rule out cell-line-specific effects.

      As suggested, we will study the effect of PANEM contraction in one or two additional cell lines that form PANEM during prophase. For example, we plan to inhibit the PANEM contraction and study the outcome, focusing on the generation of polar chromosomes, which is the major defect after the inhibition of PANEM contraction in U2OS cells.

      Evaluate PANEM contraction role in unsynchronized U2OS cells, where centrosome separation can occur before NEBD in a subset of cells (Koprivec et al., 2025), and in other cell types with variable spindle elongation timing.

      As suggested, we will investigate the outcome (e.g. generation of polar chromosomes) of reduced PANEM contraction in unsynchronized U2OS cells, and address whether the two subsets of cells, in which centrosome separation occurs before or after NEBD, show any difference in the outcome.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Minor Comments

      Sixth subheading (currently in Discussion): Move the final paragraph of the Discussion into the Results and expand it with preliminary analyses linking PANEM contraction to congression efficiency across untreated cell types or under mild nocodazole treatment.

      As suggested, we will move the final paragraph of the Discussion to make a new final section in the Results. Moreover, as suggested, we will study the outcome of inhibiting PANEM contraction in cell lines other than U2OS, and add the results to the new final section in the Results.

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression at this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      The primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is. So, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression - in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, which precedes the segregation process.

      Major points

      (4) The work has high quality manual tracking of objects in early mitosis- if this would be made available to the field, it can help build AI models for tracking. The authors could consider depositing the tracking data and increasing the impact of their work.

      As suggested, we will include kinetochore tracking data as supplemental data in the revised manuscript.

      Minor points

      (2) Discussion point: If cells had not separated their centrosomes before NEBD, would PANEM still be effective? Perhaps the cancer cell lines or examples as shown in Figure 6A have some clues here.

      The same question has been raised in one of Reviewer #1’s major points. We will perform new experiments to directly address this question in a revised manuscript. If we do not obtain interpretable results, we will discuss this issue further in the Discussion, as suggested.

      (3) Figure 7 cartoon shows misalignment leading to missegregation. It may be useful to consider this in the context of the centrosome directed kinetochore movements via pivoting microtubules. Is this process blocked in azBB-treated cells?

      This issue is closely related to point 2 above. As discussed above, we will first address this issue experimentally. If we do not obtain interpretable results, we will discuss this issue further in the Discussion.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Remove repetitive statements that simply restate that later phenotypes arise as consequences of delayed Phase 1 (applicable to subheadings 3 onward).

      As suggested, we have removed the statement for the delayed start of Phase 2 for peripheral kinetochores in azBB-treated cells (Page 9, second paragraph). We have also simplified the statement for the delayed start of Phase 3 and Phase 4 to avoid repetition (Page 9, third paragraph; Page 10, second paragraph).

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Apply global actin inhibitors (e.g., cytochalasin D, latrunculin A) to disrupt the entire actin cytoskeleton. These perturbations strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as reported previously (Lancaster et al., 2013; Dewey et al., 2017; Koprivec et al., 2025). The minimal effect of global inhibition must be addressed when proposing a localized actomyosin mechanism. Comment on whether the apparent differences between this approach and the one used by the authors arise from the use of different cell types.

      We did experiments along this line, using a dominant-negative LINC construct, in our previous study (Booth et al eLife 2019). LINC-DN should more specifically remove/reduce PANEM than the global actin inhibitors mentioned above. LINC-DN attenuated the reduction of CSV soon after NEBD and increased the number of polar chromosomes (Booth et al eLife 2019); i.e. in this regard, the outcome was similar to azBB treatment in the current study. One can expect that global actin inhibitors would also inhibit the PANEM formation and show effects similar to LINC-DN. By contrast, the indicated references reported that global actin inhibitors strongly affect mitotic rounding and cytokinesis but only modestly influence early chromosome movements, as pointed out by the reviewer. Such a difference may have arisen due to different cell types (e.g. some cells form the PANEM and others do not: Figure S7), a different extent in the inhibition of PANEM formation, and/or the inhibition of cell rounding and cytokinesis (e.g. if cytokinesis is more sensitive to inhibitors than is the PANEM formation, we may not observe the possible effects on early chromosome movements due to PANEM inhibition while cytokinesis is still affected). As suggested, we discussed this topic in the Discussion (page 15, second paragraph). 

      Clarify why spindle-associated actin, especially near centrosomes, as reported in prior studies using human cultured cells (Kita et al., 2019; Plessner et al., 2019; Aquino-Perez et al., 2024), was not observed in this study. Myosin-10 and actin were also observed close to centrosomes during mitosis in X. laevis mitotic spindles (Woolner et al., 2008). Possible explanations include differences in fixation, probe selection, imaging methods, or cell type. Note that some actin probes (e.g., phalloidin) poorly penetrate internal actin, and certain antibodies require harsh extraction protocols. Comment on the possibility that interference with a pool of Myo10 at the centrosomes is important for the effects on congression.

      As the reviewer implies, we cannot rule out that we failed to detect actin associated with the spindle or centrosomes because of differences in methods or cell lines between the current study and the literature mentioned by the reviewer. We have therefore moderated our claim in the Discussion that ‘we did not detect any actin network inside the nucleus, on the spindle or between chromosomes’ by adding ‘at least, using the method and the cell line in the current study’ to this statement (Page 13, second paragraph). We have also cited the three references mentioned by the reviewer in the Discussion (Page 13, second paragraph). Regarding Myosin-10, azBB (a blebbistatin variant) should have negligible effects on class-X myosin, including Myosin-10 (Limouze et al 2004 [PMID 15548862]). It is therefore unlikely that the effects of azBB that we observed in the current study are due to the inhibition of Myosin-10. We have cited Woolner et al 2008 and another paper and discussed this topic in the Discussion (Page 13, second paragraph).

      C. Expansion of PANEM functional analysis

      Quantify not only the percentage of affected cells after azBB but also the number of chromosomes per cell with congression defects in the current and future experiments.

      It is tricky to count the number of chromosomes because they frequently overlap. Counting kinetochores is more feasible, but kinetochore signals show some non-specific background (e.g. those outside of the nucleus in prophase). We therefore quantified the chromosome volume at polar regions in azBB-treated cells (Figure 6C).

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Clearly state in the Introduction and elaborate in the Discussion that initiation of congression is coupled to biorientation (Vukušić & Tolić, 2025). This provides essential context for how PANEM-mediated nuclear volume reduction supports efficient congression of polar chromosomes.

      It has been a widely accepted view in the field that chromosome congression precedes biorientation, since its publication in 2006 (Kapoor et al Science 2006). Very recently, this view has been challenged by a new publication (Vukušić & Tolić, Nat Commun 2025), as indicated by this reviewer. We have mentioned this new model and discussed the new interpretation of our results based on it in the Discussion (page 14; ‘It has been a widely accepted view…’).

      To explain the new interpretation of our results more clearly, we plan to add a new diagram to a supplemental figure in the revised manuscript.

      Explain that PANEM is most critical for polar chromosomes because their peripheral positions are unfavorable for rapid biorientation (Barišić et al., 2014; Vukušić & Tolić, 2025).

      We have included such a statement in the Discussion, as a part of the new interpretation of our results based on the new model that chromosome biorientation precedes congression (see above). We have also cited the indicated two papers.

      Discuss how cell lines lacking PANEM (e.g., HeLa and others) nonetheless achieve efficient congression, and what alternative mechanisms compensate in the absence of PANEM. For example, it is well established that cells congress chromosomes after monastrol or nocodazole washout, which essentially bypasses the contribution of PANEM contraction.

      Following this suggestion, we discussed three possible mechanisms that could compensate for a lack of PANEM and facilitate kinetochore-MT interaction and chromosome congression, based on previous literature (Page 16): 1) the enhanced assembly rate of spindle MTs may facilitate kinetochore-MT interactions in N-CIN+ cancer cells, 2) chromosome biorientation may precede congression more frequently to promote congression towards the spindle midplane, and 3) the balance among the activities of CENP-E, Dynein and chromokinesins may shift towards greater chromosome-arm ejection forces towards the spindle midplane.

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Introduction

      Remove the reference to Figure 1A in the Introduction. The portion of Figure 1 and related text that recapitulates the authors' previous work should be incorporated into the Introduction, not the Results.

      As suggested in the second sentence of this comment, we have moved most of the second paragraph of the first section of Results to Introduction (Page 4) and cited Figure 1A and 1B in Introduction. We would like to keep the reference to Figure 1A in the Introduction, because showing the PANEM images at the beginning of the manuscript would help readers’ understanding of our study. In addition, citing Figure 1A in the Introduction is more consistent with the suggestion in the second sentence of this comment.

      Results (by subheading)

      First subheading: When introducing the ~8-minute early mitotic interval, cite additional studies that have characterized this period: Magidson et al., 2011 (Cell); Renda et al., 2022 (Cell Reports); Koprivec et al., 2025 (bioRxiv); Vukušić & Tolić, 2025 (Nat Commun); Barišić et al., 2013 (Nat Cell Biol).

      As suggested, we cited these references at the indicated part of the first section of the Results (page 5).

      Second subheading: Cite key reviews and foundational research on kinetochore architecture and sequential chromosome movement during early mitosis: Musacchio & Desai, 2017 (Biology); Itoh et al., 2018 (Sci Rep); Magidson et al., 2011 (Cell); Vukušić & Tolić, 2025 (Nat Commun); Koprivec et al., 2025 (bioRxiv); Rieder & Alexander, 1990 (J Cell Biol); Skibbens et al., 1993 (J Cell Biol); Kapoor et al., 2006 (Science); Armond et al., 2015 (PLoS Comput Biol); Jaqaman et al., 2010 (J Cell Biol).

      Rieder & Alexander, 1990 (J Cell Biol) and Kapoor et al., 2006 (Science) were already cited in the second section of the Results in the original manuscript. We agree that all other references should be cited in this manuscript, and they are now cited in the Introduction and/or Discussion where they fit best (e.g. Musacchio & Desai 2017 reviews the kinetochore in general and is therefore best cited in the Introduction).

      Third subheading: Clarify why some kinetochores on Figure 3A appear outside the white boundaries if these boundaries are intended to represent the nuclear envelope.

      We interpret that these are background signals in the cytoplasm, which do not come from kinetochores, because 1) before NEBD, they were outside of the nucleus, and 2) after NEBD, they did not show any characteristic kinetochore motions such as those towards a spindle pole (Phase 2) and the spindle mid-plane (Phase 4). We have commented on these background signals in the legend for Figure 3A.

      Fifth subheading: Cite studies on polar chromosome movements: Klaasen et al., 2022 (Nature); Koprivec et al., 2025 (bioRxiv). Clarify that Figure 5F displays only those kinetochores that initiated directed congression movements.

      These two references have already been cited and discussed in this Result section of our original manuscript. However, considering this suggestion, we have discussed more about polar chromosome movements reported by Koprivec et al (page 11). Meanwhile, the reviewer is correct about Figure 5F, and we have clarified this point in the Figure 5F legend.

      Discussion

      When discussing cortical actin, cite key reviews on its presence and function during mitosis:

      Kunda & Baum, 2009 (Trends Cell Biol); Pollard & O'Shaughnessy, 2019 (Annu Rev Biochem); Di Pietro et al., 2016 (EMBO Rep).

      As suggested, we have cited all these review papers in the Discussion (page 15), and mentioned the role of the cortical actin on the spindle orientation and positioning (Kunda & Baum, 2009; Di Pietro et al., 2016), as well as the function of the actomyosin ring on cytokinesis (Pollard & O'Shaughnessy, 2019).

      Significance

      Advance

      This study's main strength is its novel and potentially important demonstration that contraction of PANEM, a peripheral actomyosin network that contracts in early mitosis, contributes to the timely initiation of chromosome congression, especially for polar chromosomes. While PANEM itself was previously described by this group, this manuscript provides new mechanistic evidence, improved perturbations, and detailed chromosome tracking. To my knowledge, no prior studies have mechanistically connected this contraction to polar chromosome congression at this level of detail. The work complements dominant microtubule-centric models of chromosome congression and introduces actomyosin-based forces as a cooperating system during very early mitosis. However, the impact of the study is currently limited by major organizational issues, insufficient controls, and incomplete contextualization within existing literature. Addressing these issues will substantially improve clarity and credibility. [underlined by authors]

      We have addressed or will address the underlined criticisms as detailed above.

      Audience

      The primary audience of this study will be researchers working in cell division, mitosis, cytoskeleton dynamics, and motor proteins. The findings may also interest the wider cell biology community, particularly those studying chromosome segregation fidelity, spindle mechanics, and cytoskeletal crosstalk. If validated and clarified, the concept of PANEM could be integrated into textbooks and models of chromosome congression and could inform studies on mitotic errors and cancer cell mechanics.

      Expertise

      My expertise lies in kinetochore-microtubule interactions, spindle mechanics, chromosome congression, and mitotic signaling pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this manuscript, Sheidaei et al. reported on their study of chromosome congression during the early stages of mitotic spindle assembly. Building on their previous study (ref. #15, Booth et al., Elife, 2019), they focused on the exact role of the actin-myosin-based contraction of the nuclear envelope. First, they addressed a technical issue from their previous study, finding a way to specifically impair the actomyosin contraction of the nuclear membrane without affecting the contraction of the plasma membrane. This allowed them to study the former more specifically. They then tracked individual kinetochores to reveal which were affected by nuclear membrane contraction and at what stage of displacement towards the metaphase plate. The investigation is rigorous, with all the necessary controls performed. The images are of high quality. The analyses are accurate and supported by convincing quantifications. In summary, they found that peripheral chromosomes, which are close to the nuclear membrane, are more influenced by nuclear membrane contraction than internal chromosomes. They discovered that nuclear membrane contraction primarily contributes to the initial displacement of peripheral chromosomes by moving them towards the microtubules. The microtubules then become the sole contributors to their motion towards the pole and subsequently the midplane. This step is particularly critical for the outermost chromosomes, which are located behind the spindle pole and are most likely to be missegregated.

      Significance

      While the conclusions are somewhat intuitive and could be considered incremental with regard to previous works, they are solid and improve our understanding of mitotic fidelity. The authors had already reported the overall role of nuclear membrane contraction in reducing chromosome missegregation in their previous study, as mentioned fairly and transparently in the text. However, the reason for this is now described in more detail with solid quantification. Overall, this is good-quality work which does not drastically change our understanding of chromosome congression, but contributes to improving it. Personally, I am surprised by the impact of such a small contraction (of around one micron) on the proper capture of chromosomes and wonder whether the signalling associated with the contraction has a local impact on microtubule dynamics. However, investigating this point is clearly beyond the scope of this study, which can be published as it is. [underlined by authors]

      The suggested topic (underlined) is intriguing. However, we agree with the reviewer that it is beyond the scope of this paper. The reviewer recommends publication of our manuscript as it is. So, we do not plan a revision based on this reviewer’s comments.

      Reviewer #3:

      Sheidaei et al. report how chromosomes are brought to positions that facilitate kinetochore-microtubule interactions during mitosis. The study focusses on an important early step of the highly orchestrated chromosome segregation process. Studying kinetochore capture during early prophase is extremely difficult due to kinetochore crowding, but the team has taken up the challenge by classifying the types of kinetochore movements, carefully marking kinetochore positions in early mitosis and linking these to map their fate/next-positions over time. The work is an excellent addition to the field as most of the literature has thus far focussed on tracking kinetochores in slightly later stages of mitosis. The authors show that the PANEM facilitates chromosome positioning towards the interior of the newly forming spindle, which in turn facilitates chromosome congression - in the absence of PANEM, chromosomes end up in unfavourable locations and fail to form proper kinetochore-microtubule interactions. The work highlights the perinuclear actomyosin network in early mitosis (PANEM) as a key spatial and temporal element of chromosome congression, which precedes the segregation process.

      Major points

      (1) The complexity of tracking has been managed by classifying kinetochore movements into 4 categories, considering motions towards or away from the spindle mid-plane. While this is a very creative solution in most cases, there may be some difficult phases that involve movement in both directions or no dominant direction (e.g. Phase 3-like). It is unclear whether all kinetochores go through Phases 1, 2, 3 and 4 sequentially or whether a few deviate from this pattern. A comment on this would be helpful. Also, it may be interesting to compare those that deviate from the sequence, and ask how they recover in the presence and absence of azBB.

      To respond to this comment, we would like to first clarify how we selected kinetochores for our analysis. We selected kinetochores that can be individually tracked. If kinetochore tracking was difficult (before the start of Phase 4 in control and azBB-treated cells or before observing the extended Phase 3 in azBB-treated cells) because of kinetochore crowding, we did not choose such kinetochores. We also did not include kinetochores close to spindle poles (within 4 µm) at NEBD in our analysis for the following two reasons: First, these kinetochores often did not show clear and rapid movements towards a spindle pole, which we used to define Phase 2. Second, although we referred to kinetochore co-localization with a microtubule signal for the start of Phase 2, this was difficult for kinetochores close to spindle poles because of a high density of microtubules. As requested, we have added this comment to the Method section (page 23).

      With the above selection, all selected kinetochores without azBB treatment (control) showed the poleward motion (Phase 2) and congression (Phase 4) in this order, though the extent of each varied among kinetochores. All selected kinetochores with azBB treatment also showed the poleward motion (Phase 2), and some of them showed congression (Phase 4) after Phase 2. Phase 1 and Phase 3 were then defined as the intervals between NEBD and Phase 2 and between Phase 2 and Phase 4, respectively. If no Phase 4 was observed with azBB, we judged that Phase 3 continued until the end of tracking. We have added this comment to the Method section (page 23-24).
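
      The selection rule and phase definitions described in this response can be written down as a short sketch. This is only an illustration under stated assumptions, not the authors' analysis code: the record type `KinetochoreTrack`, its field names, and the functions `is_selected` and `phase_intervals` are hypothetical, times are taken to be minutes after NEBD, and the handling of a kinetochore exactly 4 µm from a pole is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# From the response: kinetochores within 4 µm of a spindle pole at NEBD were excluded.
POLE_EXCLUSION_UM = 4.0


@dataclass
class KinetochoreTrack:
    """Hypothetical per-kinetochore record; all times are minutes after NEBD."""
    dist_to_pole_at_nebd_um: float
    phase2_start: float            # start of poleward motion
    phase2_end: float              # end of poleward motion
    phase4_start: Optional[float]  # start of congression, None if never observed
    tracking_end: float            # last time point at which the kinetochore was tracked


def is_selected(track: KinetochoreTrack) -> bool:
    """Keep only kinetochores that start farther than the exclusion distance from a pole."""
    return track.dist_to_pole_at_nebd_um > POLE_EXCLUSION_UM


def phase_intervals(track: KinetochoreTrack) -> Dict[str, Optional[Tuple[float, float]]]:
    """Derive Phase 1 and Phase 3 as the gaps around the directly observed Phase 2 and Phase 4."""
    # If congression never starts, Phase 3 is taken to last until the end of tracking.
    if track.phase4_start is not None:
        phase3_end = track.phase4_start
    else:
        phase3_end = track.tracking_end
    return {
        "phase1": (0.0, track.phase2_start),           # NEBD to start of Phase 2
        "phase2": (track.phase2_start, track.phase2_end),
        "phase3": (track.phase2_end, phase3_end),      # end of Phase 2 to start of Phase 4
    }
```

      In this sketch only Phase 2 and the start of Phase 4 are observed events; Phase 1 and Phase 3 are derived, which mirrors the definition given above.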

      (2) Would peripheral kinetochores close to poles behave differently compared to peripheral kinetochores close to the midplane (Figure S4)? In Figure 3D, are they separated? If not, would it look different?

      Since we did not include kinetochores close to spindle poles (at NEBD), for which it was difficult to define Phase 2 (see our response to the above major point 1), in our analysis, the suggested comparison is not feasible.

      (3) Uncongressed polar chromosomes (eg., CENPE inhibited cells) are known to promote tumbling of the spindle. In figure 5B with polar chromosomes, it will be helpful to indicate how the authors decouple spindle pole movements from individual kinetochore movements.

      In contrast to CENPE-inhibited cells, azBB-treated cells did not show much tumbling of the spindle, though both cells showed uncongressed polar chromosomes. The reason for this difference may be fewer uncongressed polar chromosomes in azBB-treated cells. There were still modest spindle motions in azBB-treated cells. However, because kinetochore motions were assessed relative to a spindle pole (and other reference points on the spindle) in our study (Figure 2A, C), the modest spindle motions were offset in our analyses of kinetochore motions. We have clarified the underlined part in the Method section (page 22).
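
      The point that spindle motion is offset when kinetochore positions are expressed relative to the spindle can be illustrated with a small sketch. The actual reference points are those of Figure 2A, C, which are not reproduced here, so the frame below (origin at the midpoint between the two poles, axis along the pole-to-pole line) is an assumption, and `spindle_frame_coords` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple


def spindle_frame_coords(
    kinetochore: Sequence[float],
    pole_a: Sequence[float],
    pole_b: Sequence[float],
) -> Tuple[float, float]:
    """Return (axial, radial) position of a kinetochore in a spindle-based frame.

    axial:  signed distance from the spindle mid-plane along the pole_a -> pole_b axis
    radial: distance from the pole-to-pole axis
    Both values are unchanged if the whole spindle translates or rotates.
    """
    midpoint = [(a + b) / 2.0 for a, b in zip(pole_a, pole_b)]
    axis = [b - a for a, b in zip(pole_a, pole_b)]
    length = math.sqrt(sum(c * c for c in axis))
    if length == 0.0:
        raise ValueError("spindle poles coincide; the spindle axis is undefined")
    unit = [c / length for c in axis]

    rel = [k - m for k, m in zip(kinetochore, midpoint)]
    axial = sum(r * u for r, u in zip(rel, unit))
    # Clamp at zero to guard against tiny negative values from rounding.
    radial = math.sqrt(max(sum(r * r for r in rel) - axial * axial, 0.0))
    return axial, radial
```

      Because both coordinates are measured from the poles themselves, moving the kinetochore and both poles together leaves them unchanged, which is the sense in which modest spindle motions cancel out.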

      Minor points

      (1) It will be helpful for readers to see how many kinetochores/cell were considered in the tracking studies. Figure legends show kinetochore numbers but not cell numbers.

      As suggested, we have now mentioned the number of cells, where the kinetochore motions were analyzed, in the legends for Figures 3, 4, 5, S4 and S5.

      (4) Are all the N-CIN- lines with PANEM highly sensitive to azBB? In other words, is PANEM essential for normal congression in some of these lines?

      We checked the sensitivity of cell lines in Figure S7B to blebbistatin (the original form of azBB) on DepMap. There was no plausible difference between PANEM+ and PANEM- cell lines, although the blebbistatin sensitivity data were available only for 4 cell lines (HCT116, MCF7, U2OS and HT29) in Figure S7B. Nonetheless, because blebbistatin could kill cells by inhibiting cytokinesis, the blebbistatin sensitivity may not necessarily reflect how essential the PANEM contraction is for chromosome congression.

      (5) Are congression times delayed in lines that naturally lack PANEM?

      For example, it takes 10-20 min for HeLa cells (lacking PANEM) to complete chromosome congression after NEBD (Bancroft et al 2015: https://doi.org/10.1242/jcs.163659). This is not significantly different from the time (8-18 min) for chromosome congression we observed in U2OS cells (forming PANEM). We assume that cells lacking PANEM have developed a compensatory mechanism for efficient chromosome congression – we have newly discussed possible compensatory mechanisms in the last paragraph of the Discussion (page 16).

      (6) Page 23 "we first identified the end of congression" how does this relate to kinetochore oscillations that move kinetochores away from the metaphase plate?

      The start of kinetochore oscillation was defined as the end of Phase 4 if we could track the kinetochore until that point. In some cases where the kinetochore came close to the spindle mid-plane (< 2.5 µm), it could not be tracked further because of kinetochore crowding around the mid-plane; in such cases, the end of tracking was assigned as the end of Phase 4. The original manuscript did not make clear that the end of Phase 4 was defined in the same way for both non-polar and polar kinetochores, while the start of Phase 4 was defined differently for the two groups. We have now clarified these points in the Method section (page 23).
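      The rule for assigning the end of Phase 4 can be written as a small decision function. This is a hedged sketch rather than the authors' code: the function and parameter names are hypothetical, and returning `None` for tracks that end far from the mid-plane without reaching oscillation is an assumption, since that case is not described above.

```python
from typing import Optional

MIDPLANE_CUTOFF_UM = 2.5  # distance threshold quoted in the response

def phase4_end(oscillation_start: Optional[float],
               tracking_end: float,
               final_midplane_distance_um: float) -> Optional[float]:
    """Return the time assigned as the end of Phase 4 for one kinetochore.

    oscillation_start          : time at which oscillation began, or None if
                                 the track ended before oscillation was seen
    tracking_end               : time of the last tracked position
    final_midplane_distance_um : distance to the spindle mid-plane at the
                                 last tracked position
    """
    # Case 1: the kinetochore was tracked until oscillation started.
    if oscillation_start is not None and oscillation_start <= tracking_end:
        return oscillation_start
    # Case 2: tracking was lost close to the crowded mid-plane.
    if final_midplane_distance_um < MIDPLANE_CUTOFF_UM:
        return tracking_end
    # Assumption: any other track has no defined Phase 4 end.
    return None
```

      Because the same function is applied to every track, the end of Phase 4 is defined identically for polar and non-polar kinetochores; only the start of Phase 4 would need group-specific logic.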

      (7) Are spindle pole distances (spindle sizes) different in early and late mitotic cells (4min vs 6min after NEBD) in control vs azBB-treated cells? Please comment on Figure S2E (mean distance) in the context of when phase 4 is completed. Does spindle size return to normal after congression?

      In Figure S2E, we did not observe a significant difference in the spindle-pole distance (the spindle size) between control and azBB-treated cells at any individual time point. The smallest p-value was 0.094 at 6.0 min. As suggested, we have explained this in the legend for Figure S2E.

      Significance:

      The current work builds upon their previous work, in which the authors demonstrated that an actomyosin network forms on the cytoplasmic side of the nuclear envelope during prophase. This work explains how the network facilitates chromosome capture and congression by tracking motions of individual kinetochores during early mitosis. The findings can be broadly useful for cell division and the cytoskeletal fields.

      Description of analyses that authors prefer not to carry out

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      Sheidaei and colleagues report a novel and potentially important role for an early mitotic actomyosin-based mechanism, PANEM contraction, in promoting timely congression of chromosomes located at the nuclear periphery, particularly those in polar positions. The manuscript will interest researchers studying cell division, cytoskeletal dynamics, and motor proteins. Although some data overlap with the group's prior work, the authors extend those findings by optimizing key perturbations and performing more detailed analyses of chromosome movements, which together provide a clearer mechanistic explanation. The study also builds naturally on recent ideas from other groups about how chromosome positioning influences both early and later mitotic movements.

      In its current form, however, the manuscript is not acceptable for publication. It suffers from major organizational problems, an overcrowded and confusing Results section and figures, and a lack of essential experimental controls and contextual discussion. These deficiencies make it difficult to evaluate the data and the authors' conclusions. A substantial structural revision is required to improve clarity and persuasiveness. In addition, several key control experiments and more conceptual context are needed to establish the specificity and relevance of PANEM relative to other microtubule- and actin-based mitotic mechanisms. Testing PANEM in additional cell lines or contexts would also strengthen the claim. I therefore recommend Major Revision, addressing the structural, conceptual, and experimental issues detailed below.

      Major Comments

      A. Structural overhaul and figure reorganization

      The Results section is overly dense, lacks clear structure, and includes descriptive content that belongs in the Methods. Many figure panels should be moved to Supplementary Materials. A substantial reorganization is required to transform the manuscript into a focused, "Reports"-type article.

      Move methodological and descriptive details (e.g., especially from the second Results subheading and Figure 2) to the Methods or Supplementary Materials.

      In these parts, we define four phases of kinetochore motion in early mitosis. Without such a description in the main text, readers would be confused about subsequent analyses. Figure 2 is also important to show examples of how the four phases develop. Although we respect this suggestion from the reviewer, we would like to keep these parts in the main text and main figure.

      New Figure 2: Combine current Figures 2A, 3A, 3C, 3D, 4C, 4F, and 4H to illustrate how PANEM contraction facilitates initial interactions of peripheral chromosomes with spindle microtubules which increases speed of congression initiation.

      If we were to follow this suggestion, we would lose Figure 2B, D, Figure 3B and Figure 4A, where examples of kinetochore motions are shown in images and 3D diagrams. The new Figure would mostly consist of only graphs. Without examples of images and 3D diagrams, readers would have difficulty understanding the study. Although we respect this suggestion from the reviewer, we would like to keep Figures 2, 3 and 4, as they are (except for making Figure 4I simpler; see above).

      New Figure 3: Combine current Figures 5A, 5C, 5D, 5F, 6B, 6C, and lower panels of 4H to show how PANEM contraction repositions polar chromosomes and reduces chromosome volume in early mitosis to enable rapid initiation of congression.

      If we were to follow this suggestion, we would lose Figure 5B and Figure 6A, where examples of kinetochore/chromosome dynamics are shown in images and 3D diagrams. For the same reason as above, we would like to keep Figures 5 and 6 as they are, although we respect this suggestion from the reviewer.

      New Figure 4: Combine Figures 7A, 7B, 7D, 7E, 7F, expanded Supplementary Figure S7, and new data to demonstrate that PANEM actively pushes peripheral chromosomes inward which is important for efficient chromosome congression in diverse cellular contexts.

      As suggested, we will conduct new experiments to demonstrate the role of PANEM in diverse cellular contexts, as detailed below. We will then combine the new results with Figure S7 to make the new Figure 8.

      On the other hand, in our view, combining Figure 7A-E and the extended Figure S7 would be confusing because the two parts address different topics. Although we respect this suggestion from the reviewer, we would like to keep Figure 7 and the extended Figure S7 (i.e. Figure 8) separate.

      B. Specificity and redundancy of actin perturbation

      To establish the specificity and relevance of PANEM, the authors should include or discuss appropriate controls:

      Examine higher-ploidy or binucleated cells to determine whether multiple PANEM contractions are coordinated and if PANEM contraction contributes more in cells of higher ploidies or specific nuclear morphologies.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Investigate dependency on nuclear shape or lamina stiffness; test whether PANEM force transmission requires a rigid nuclear remnant.

      This is an interesting suggestion, but it takes lots of time to conduct such a study, and it goes beyond the scope of this paper.

      Analyze PANEM's contribution under mild microtubule perturbations that are known to induce congression problems (e.g., low-dose nocodazole).

      In the current study, we found that PANEM contraction affects chromosome motions in Phase 1 and Phase 3 but not Phase 2 or Phase 4. Mild microtubule perturbation itself could affect chromosome motions in all four Phases. We do not think it would be so informative to study what additional effects the reduced PANEM contraction shows when combined with mild microtubule perturbation.

      D. Conceptual integration in Introduction and Discussion

      The manuscript should better situate its findings within the context of early mitotic chromosome movements:

      Minor Comments

      These issues are more easily addressable but will significantly improve clarity and presentation.

      Results (by subheading)

      Fourth subheading: Note that congression speed is lower for centrally located kinetochores because they achieve biorientation more rapidly (Barišić et al., 2013, Nat Cell Biol; Vukušić & Tolić, 2025, Nat Commun).

      We respect this comment. However, if biorientation were established more rapidly for centrally located kinetochores, it would advance the initiation of congression, but would not necessarily change congression speed.

    1. Reviewer #3 (Public review):

      Summary:

      In the manuscript by Shen, Yeung, and colleagues, the authors generate an improved and expanded Mosaic analysis by gRNA-induced crossing-over (MAGIC) toolkit for use in making mosaic clones in Drosophila. This is a clever method by which mitotic clones can be induced in dividing cells by using CRISPR/Cas9 to generate double-strand breaks at specific locations that induce crossing over at those locations. This is conceptually similar to previous mosaic methods in flies that utilized FRT sites that had been inserted near centromeres along with heat-shock inducible FLPase. The advantage of the MAGIC system is that it can be used along with chromosomes lacking FRT sites already introduced, such as those found in many deficiency collections or in EMS mutant lines. It may also be simpler to implement than FRT-based mosaic systems. There are two flavors of the MAGIC system: nMAGIC and pMAGIC. In nMAGIC, the main constituents are a transgene insertion that contains gRNAs that target DNA near the centromere, along with a fluorescent marker. In pMAGIC, the main constituents are a transgenic insertion that contains gRNAs that target DNA near the centromere, along with ubiquitous expression of GAL80. As such, nMAGIC can be used to generate clones that are not labelled, whereas pMAGIC (along with a GAL4 line and UAS-marker) can be used much like MARCM to positively label a clone of cells. This manuscript introduces MAGIC transgenic reagents that allow all 4 chromosomes to be targeted. They demonstrate its use in a variety of tissues, including with mutants not compatible with current FLP/FRT methods, and also show it works well in tissues that prove challenging for FLP/FRT mosaic analyses (such as motor neurons). They further demonstrate that it can be used to generate mosaic clones in non-melanogaster hybrid tissues. 
      Overall, this work represents a valuable improvement to the MAGIC method that should promote even more widespread adoption of this powerful genetic technique.

      Strengths:

      (1) Improves the design of the gRNA-marker by updating the gRNA backbone and also the markers used. GAL80 now includes a DE region that reduces the perdurance of the protein and thus improves labeling of pMAGIC clones. The data presented to demonstrate these improvements are rigorous and of high quality.

      (2) Introduces a toolkit that now covers all chromosome arms in Drosophila. In addition, the efficiency of 3 different target sites is characterized for each chromosome arm (e.g., 3 different gRNA-Marker combinations), which demonstrates differences in efficiency. This could be useful to titrate how many clones an experimenter might want (e.g., lower efficiency combinations might prove advantageous).

      (3) The manuscript is well written and easy to follow. The authors achieved their aims of creating and demonstrating MAGIC reagents suitable for mosaic analysis of any Drosophila chromosome arm.

      (4) The MAGIC method is a valuable addition to the Drosophila genetics toolkit, and the new reagents described in this manuscript should allow it to become more widely adopted.

      Comments on revised version:

      The authors have done a great job addressing reviewer concerns with the addition of updated figures, new experiments, and changes to the manuscript. I am supportive of this version and agree with the updated assessment.

    2. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate the reviewers’ constructive comments and have followed their recommendations to improve our manuscript. These improvements include additional experiments, new analyses, and a rewriting of the text. We believe these changes significantly improved the paper and hope the editor and the reviewers agree. The following is a summary of the major changes made and our point-by-point response to reviewers’ comments.

      Summary of major changes:

      (1) Expanded labeling options: We generated a new nMAGIC vector containing miRFP680 as an infrared fluorescent protein (IFP) marker. We used gRNA-40D2(IFP) to demonstrate clones labeled by this marker in the wing imaginal disc (Figure 1M). This vector is available via Addgene for the generation of new gRNA-markers with our recommended or custom-designed gRNA target sequences.

      (2) Validated Gal80 potency: We provide new data in Figure 1E demonstrating complete suppression of pxn-Gal4>CD4-tdTom by tub-GAL80-DE-SV40. The exact transgenes used in the comparisons are clarified in the figure and figure legend.

      (3) Verified clone fitness: We compared the sizes of nMAGIC twin spots in wing discs and found no intrinsic growth or viability bias between marker/marker and WT/WT clones (Figure 1O).

      (4) Methodological schematics: We added supplemental figures to Figure 1 to illustrate the principle of MAGIC, the difference between pMAGIC and nMAGIC, and an example of a pMAGIC crossing scheme.

      (5) Inducible clone induction: We provide new data (Figure 3J-K’) showing the induction of sparse neuronal clones in the adult brain by heat shock (hs)-Cas9.

      (6) We revised texts to incorporate all other recommendations suggested by the reviewers. We also made other small changes to the manuscript to improve its readability.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shen et al. have improved upon the mitotic clone analysis tool MAGIC that their lab previously developed. MAGIC uses CRISPR/Cas9-mediated double-stranded breaks to induce mitotic recombination. The authors have replaced the sgRNA scaffold with a more effective scaffold to increase clone frequency. They also introduced modifications to positive and negative clonal markers to improve signal-to-noise and mark the cytoplasm of the cells instead of the nuclei. The changes result in an increase in clonal frequencies and marker brightness. The authors also generated the MAGIC transgenics to target all chromosome arms and tested the clone induction efficacy.

      Strengths:

      MAGIC is a mitotic clone generation tool that works without prior recombination to special chromosomes (e.g., FRT). It can also generate mutant clones for genes for which the existing FRT lines could not be used (e.g., the genes that are between the FRT transgene and the centromere).

      This manuscript does a thorough job in describing the method and provides compelling data that support improvement over the existing method.

      Weaknesses:

      It would be beneficial to have a greater variety of clonal markers for nMAGIC. Currently, the only marker is BFP, which may clash with other genetic tools (e.g., some FRET probes) depending on the application. It would be nice to have far-red clonal markers.

      We thank the reviewer for the positive comments about our study. We agree with the reviewer that adding a far-red option for nMAGIC increases the flexibility of this method. We replaced the BFP coding sequence in the nMAGIC cloning vector pAC-U63-QtgRNA2.1-tubBFP(HA) with that of miRFP680-T2A-HO1. We then used the resulting cloning vector to make a gRNA-40D2(IFP) transgene and tested it in the wing disc. Results showing clones in the wing disc are now in Figure 1M. The new cloning vector, along with others reported in our study, is available from Addgene.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors present the latest improvement of their previously published methods, pMAGIC and nMAGIC, which can be used to engineer mosaic gene expression in wild-type animals and in a tissue-specific manner. They address the main limitation of MAGIC, the lack of gRNA-marker transgenes, which has hampered the broader adoption of MAGIC in the fly community. To do so, they create an entire toolkit of gRNA markers for every Drosophila chromosome and test them across a range of different tissues and in the context of making Drosophila species hybrid mosaic animals. The study provides a significant and broadly useful improvement compared to earlier versions, as it broadens the use-cases for transgenic manipulation with MAGIC to virtually any subfield of Drosophila cell biology.

      Strengths:

      Major improvements to MAGIC were made in terms of clone induction efficiency and usability across the Drosophila model system, including wild-type genotypes and the use in non-melanogaster species.

      Notably, mosaic mutants can now be created for genes residing on the 4th chromosome, which is exciting and possibly long-awaited by 4th chromosome gene enthusiasts.

      Selection of the standard set of gRNA markers was done thoughtfully, using non-repetitive conserved and unique sequences.

      The authors demonstrate that MAGIC can be used easily in the context of interspecific hybrids. I believe this is a great advancement for the Drosophila community, especially for evolutionary biologists, because this may allow for easy access to mechanistic, tissue-specific insight into the process of a range of hybrid incompatibilities, an important speciation process that is normally difficult to study at the level of molecular and cell biology.

      In the same way, because it is not limited to usage in any particular genetic background, genome-wide MAGIC can be potentially used in wild-type genotypes relatively easily. This is exciting, especially because natural genetic diversity is rarely investigated more mechanistically and at the scale/resolution of cells or specific tissues. Now, one can ask how a particular naturally occurring allele influences cell physiology compared to another (control) while keeping the global physiological context of the particular genetic background largely intact.

      Weaknesses:

      It is not entirely clear how functionally non-critical regions were evaluated, besides that they are selected based on conservation of sequence between species. It may be useful to directly test the difference in viability or other functionally relevant phenotype for flies carrying different markers. Similarly, the frequency of off-targets could be investigated or documented in a bit more detail, especially if one of the major use-cases is meant for naturally derived, diverse genetic backgrounds. It is, at the moment, unclear how consistently the clones are induced for each new gRNA marker across different WT genetic backgrounds, for example, a set of DGRP genotypes, which could be highly useful information for future users.

      We thank the reviewer for the positive comments about our study. The reviewer raises an excellent point regarding the consistency of clone induction and potential background effects in diverse genetic backgrounds. As a standard step in building the MAGIC kit, we tested all gRNA-marker transgenes with the Cas9-LEThAL assay (Poe et al., Genetics, 2019), in which the gRNA-marker transgene was crossed to lig4 Act5C-Cas9 homozygotes. All crosses led to viable and apparently healthy female progeny, suggesting that ubiquitously mutating the chosen gRNA targeting sites does not cause obvious defects.

      For standard mutant analysis, we recommend that researchers use a well-characterized wildtype chromosome as a negative control. For studies utilizing diverse wildtype backgrounds where a standard control chromosome is inapplicable (e.g., DGRP screens), we recommend an internal validation strategy: researchers should confirm their key phenotypic findings by inducing clones with a second, independent gRNA-marker located on the same chromosomal arm (e.g., comparing clones induced by gRNA-40D2 vs. gRNA-40D4). This ensures that any observed phenotypes or variations in clone induction are linked to the selected genetic background rather than an off-target artifact or target-site specific effect.

      We admit that the above approach may not resolve concerns about off-targets. Performing deep sequencing to map empirical off-targets for all 34 gRNA pairs across multiple genetic backgrounds is experimentally prohibitive for a toolkit resource. However, our in silico selection pipeline strictly required target sequences to be unique within the D. melanogaster genome to mathematically minimize off-target probability. In addition, our requirement that target sequences be conserved in closely related Drosophila species acts as a stringent filter against intraspecies variation. Sequences conserved across species are subject to purifying selection, substantially reducing the likelihood that SNPs within the DGRP lines will disrupt the PAM or seed sequences required for Cas9 induction.
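      The uniqueness requirement can be illustrated with a minimal exact-match check. This is a sketch under stated assumptions, not the selection pipeline used in the study: the function names are hypothetical, the PAM is assumed to be NGG, and only perfect matches are counted, whereas practical off-target tools also score sites with mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def count_target_sites(genome: str, protospacer: str) -> int:
    """Count exact protospacer matches followed by an NGG PAM on either strand."""
    genome = genome.upper()
    protospacer = protospacer.upper()
    n = len(protospacer)
    total = 0
    for strand in (genome, reverse_complement(genome)):
        start = strand.find(protospacer)
        while start != -1:
            pam = strand[start + n:start + n + 3]
            # NGG PAM: any base followed by two guanines.
            if len(pam) == 3 and pam[1:] == "GG":
                total += 1
            # Step by one so overlapping matches are also found.
            start = strand.find(protospacer, start + 1)
    return total

def is_unique_target(genome: str, protospacer: str) -> bool:
    """True if the protospacer plus PAM occurs exactly once in the genome."""
    return count_target_sites(genome, protospacer) == 1
```

      In practice `genome` would be the concatenated chromosome sequences of a reference assembly; running the same check against the genomes of related species would approximate the conservation filter described above.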

      Reviewer #3 (Public review):

      Summary:

      In the manuscript by Shen, Yeung, and colleagues, the authors generate an improved and expanded Mosaic analysis by gRNA-induced crossing-over (MAGIC) toolkit for use in making mosaic clones in Drosophila. This is a clever method by which mitotic clones can be induced in dividing cells by using CRISPR/Cas9 to generate double-strand breaks at specific locations that induce crossing over at those locations. This is conceptually similar to previous mosaic methods in flies that utilized FRT sites that had been inserted near centromeres along with heat-shock inducible FLPase. The advantage of the MAGIC system is that it can be used along with chromosomes lacking FRT sites already introduced, such as those found in many deficiency collections or in EMS mutant lines. It may also be simpler to implement than FRT-based mosaic systems. There are two flavors of the MAGIC system: nMAGIC and pMAGIC. In nMAGIC, the main constituents are a transgene insertion that contains gRNAs that target DNA near the centromere, along with a fluorescent marker. In pMAGIC, the main constituents are a transgenic insertion that contains gRNAs that target DNA near the centromere, along with ubiquitous expression of GAL80. As such, nMAGIC can be used to generate clones that are not labelled, whereas pMAGIC (along with a GAL4 line and UAS-marker) can be used much like MARCM to positively label a clone of cells. This manuscript introduces MAGIC transgenic reagents that allow all 4 chromosomes to be targeted. They demonstrate its use in a variety of tissues, including with mutants not compatible with current FLP/FRT methods, and also show it works well in tissues that prove challenging for FLP/FRT mosaic analyses (such as motor neurons). They further demonstrate that it can be used to generate mosaic clones in non-melanogaster hybrid tissues. 
      Overall, this work represents a valuable improvement to the MAGIC method that should promote even more widespread adoption of this powerful genetic technique.

      Strengths:

      (1) Improves the design of the gRNA-marker by updating the gRNA backbone and also the markers used. GAL80 now includes a DE region that reduces the perdurance of the protein and thus improves labeling of pMAGIC clones. The data presented to demonstrate these improvements are rigorous and of high quality.

      (2) Introduces a toolkit that now covers all chromosome arms in Drosophila. In addition, the efficiency of 3 different target sites is characterized for each chromosome arm (e.g., 3 different gRNA-Marker combinations), which demonstrates differences in efficiency. This could be useful to titrate how many clones an experimenter might want (e.g., lower efficiency combinations might prove advantageous).

      (3) The manuscript is well written and easy to follow. The authors achieved their aims of creating and demonstrating MAGIC reagents suitable for mosaic analysis of any Drosophila chromosome arm.

      (4) The MAGIC method is a valuable addition to the Drosophila genetics toolkit, and the new reagents described in this manuscript should allow it to become more widely adopted.

      Weaknesses:

      (1) The MAGIC method might not be well known to most readers, and the manuscript could have benefited from schematics introducing the technique.

      We thank the reviewer for the positive evaluation of our study and for making this kind suggestion. We have added diagrams that explain the principle of MAGIC and the difference between pMAGIC and nMAGIC in Figure 1 - Figure Supplement 1.

      (2) Traditional mosaic analyses using the FLP/FRT system have strongly utilized heat-shock FLPase for inducible temporal control over mitotic clones, as well as a way to titrate how many clones are induced (e.g., shorter heat shocks will induce fewer clones). This has proven highly valuable, especially for developmental studies. A heat-shock Cas9 is available, and it would have been beneficial to determine the efficiency of inducing MAGIC clones using this Cas9 source.

      We thank the reviewer for suggesting this experiment. We agree that demonstrating inducible clone induction in the adult brain is an effective way for people to compare MAGIC with the MARCM method they are probably more familiar with. We used a heat shock Cas9 developed by the Tzumin Lee group (Garcia-Marques, Espinosa-Medina et al. 2020) to experiment with clone induction, and the results are shown in the new Figure 3 (J-K’). We show that, with a pan-neuronal Gal4, heat shock during the wandering 3rd instar larval stage induced more clones than during the pupal stage, and the later heat shock readily produced sparsely labeled neurons whose single-cell morphology can be easily visualized.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The following are some consolidated review remarks after discussions amongst all three reviewers:

      The reviewers feel the evidence level could be raised from 'convincing' to 'compelling' if the following key (and partially shared) suggestions by the reviewers are followed adequately:

      (1) Expand labeling options for nMAGIC, which is currently just a BFP marker. This would increase the utility of the method. A far-red marker would be very helpful. Could the authors just do this for one chromosome arm and make the reagent available for others to generate other chromosome arms?

      We agree with the editor and reviewers that adding a far-red option for nMAGIC increases the flexibility of this method. We replaced the BFP coding sequence in the nMAGIC cloning vector pAC-U63-QtgRNA2.1-tubBFP(HA) with that of miRFP680-T2A-HO1. We then used the resulting cloning vector to make a gRNA-40D2(IFP) transgene and tested it in the wing disc. Results showing clones in the wing disc are now in Figure 1M. The new cloning vector, along with others reported in our study, is available from Addgene.

      (2) Verify that destabilized GAL80 is potent enough to suppress GAL4. Repeat Figure 1C-E with tub-GAL80-DE-SV40.

      We repeated the experiment using gRNA-42A4-tDES, which achieved complete suppression of pxn>CD4-tdTom (Figure 1E).

      (3) Concern about the health of the induced mitotic clones. This is an important consideration, but the reviewers were not sure what the necessary experiments would be. To gauge twin-spot clone sizes? Please address.

      We agree that clone fitness is an important consideration for MAGIC experiments. To test it, we generated WT clones in the wing imaginal disc using nMAGIC and quantified the sizes of the twin spots (BFP/BFP and WT/WT clones). Our results show that there is no statistically significant difference between these two types of clones (Figure 1O). Thus, there is no intrinsic growth disadvantage to either type of mitotic clone generated by MAGIC.
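      One way to compare the two members of each twin spot is a paired sign-flip permutation test on clone sizes. The statistical test actually used for this comparison is not stated here, so the following is only an illustrative sketch with a hypothetical function name; it enumerates all sign patterns and is therefore suited to a modest number of twin spots.

```python
import itertools

def paired_permutation_pvalue(sizes_a, sizes_b):
    """Two-sided exact sign-flip permutation test for paired measurements.

    sizes_a, sizes_b : clone sizes for the two members of each twin spot,
                       in the same order (pair i = sizes_a[i], sizes_b[i]).
    Returns the fraction of sign assignments whose absolute summed
    difference is at least as large as the observed one.
    """
    if len(sizes_a) != len(sizes_b):
        raise ValueError("each twin spot needs one size per clone type")
    diffs = [a - b for a, b in zip(sizes_a, sizes_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Under the null hypothesis the label of each clone within a pair is
    # exchangeable, so every difference may flip sign.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total
```

      A p-value well above the chosen significance threshold would be consistent with no intrinsic growth bias between the two clone types; for large numbers of twin spots, random sampling of sign patterns would replace the full enumeration.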

      (4) Include a schematic of the MAGIC method as Figure 1 or add it to Figure 1. Many may not be familiar with the method, so to promote its adoption, the authors should clearly introduce the MAGIC method in this paper (and not rely on readers to go to previous publications). For this paper to become a MAGIC reference paper, it should be self-contained.

      We thank the reviewers for this suggestion. We have added diagrams that explain the principle of MAGIC and the difference between pMAGIC and nMAGIC in Figure 1 - Figure Supplement 1.

      (5) Determine the utility of using a hs-Cas9 line for temporal induction of MAGIC clones. This is a traditional method for mitotic clone induction (with hsFLP/FRTs), and its use with the MAGIC system (especially pMAGIC) could also make it more attractive, especially to label small populations of neurons born at known times. To this point, the authors could generate pMAGIC clones using hs-Cas9 for commonly used adult target neurons, such as projection neurons, central complex neurons, or mushroom body neurons. The method to label small numbers of these adult neurons is well worked out with known GAL4 lines, and demonstrating that pMAGIC could have similar results would capture the attention of many not familiar with the pMAGIC method.

      We agree that demonstrating inducible clone induction in the adult brain is an effective way for people to compare MAGIC with the MARCM method they are probably more familiar with. We used a heat shock Cas9 developed by the Tzumin Lee group (Garcia-Marques, Espinosa-Medina et al. 2020) to experiment with clone induction, and the results are shown in the new Figure 3 (J-K’). We show that, with a pan-neuronal Gal4, heat shock during the wandering 3rd instar larval stage induced more clones than during the pupal stage, and the later heat shock readily produced sparsely labeled neurons whose single-cell morphology can be easily visualized.

      Reviewer #1 (Recommendations for the authors):

      This is a marked improvement over the existing methods that the authors' lab has previously generated. It will be a nice addition to the Drosophila genetic tool kit after minor revisions.

      We appreciate the reviewer’s recognition of the new tools we developed.

      Minor issues:

      (1) In the data in Figures 1G and H, it is not ideal to compare the effect of different modifications on two different transgenes. uH and uDEH are compared in gRNA-40D2, whereas uDEH, tDEH, and tDES are compared in gRNA-42A4. If the transgenics are already available, it would be better to compare the uH, uDEH, tDEH, and tDES on either gRNA-40D2 or gRNA-42A4.

      We appreciate the reviewer’s concern. These transgenes were developed during different phases of this project. We first adopted the uDEH design during improvement of gRNA-40D2, which solved both the leaky activity of pxn-Gal4 and dim epidermal clones. However, when we tried to expand this design to 2R (such as 42A4), we found that the clones were still too dim (probably due to positional effects). Thus, we next used uDEH in gRNA-42A4 as a base for further improvements. We did not make a uH version for gRNA-42A4 because we already knew that it was inferior to uDEH. Because of this history, we did not have the full set for gRNA-42A4.

      Despite the lack of uH for gRNA-42A4, we believe our comparisons of different designs are still valid, given that uH and uDEH were compared with identical sequences elsewhere in the transgenic vector (including the gRNA target sequence) and in the identical insertion site.

      (2) It is not clear whether the authors tested if destabilized Gal80 is potent enough to suppress Gal4 (e.g., in suppressing pxn>CD4-tdTom in hemocytes). The results in Figure 1C-E should be repeated with tub-Gal80-DE-SV40.

      We apologize for omitting the transgene identities in these experiments. We have redone the experiment using gRNA-42A4-tDES and updated the figures to clearly indicate which transgenes were used.

      (3) The difference in sgRNA scaffolds can be better explained in the text. The explanation here is very bare bones and reads like jargon. (i.e., changing F+E gRNA scaffold with gRNA2.1 scaffold is not a sufficient explanation).

      We have added more explanations to the differences between the scaffolds as suggested.

      (4) The stocks should be sent to Bloomington Stock Center to ensure widespread adoption of the method. This includes the Cas9 lines that are generated and used.

      It is our plan to freely share the reagents developed in this study with the community. Most of the fly lines are already available at Bloomington (https://bdsc.indiana.edu/stocks/misc/magic.html and https://bdsc.indiana.edu/stocks/genome_editing/crispr_cas9.html). We are in the process of depositing the remaining ones to BDSC.

      In conclusion, this is a nicely written manuscript that improves currently available tools and should be of interest to the readership of this journal.

      Reviewer #2 (Recommendations for the authors):

      Typos spotted:

      Line 163 issues -> tissues

      Line 613 significance -> significant

      We thank the reviewer for catching these typos. We have corrected them.

      Reviewer #3 (Recommendations for the authors):

      This is a welcome update to the MAGIC system, which is a brilliant method that has not been as widely adopted as it should be. The authors validate and introduce updates to this system to increase clonal efficiency and more robust labeling (for both pMAGIC and nMAGIC). The data presented are robust and convincing.

      We appreciate the reviewer’s positive comments about our study.

      Suggestions to improve the presentation and adoption of this work:

      (1) The MAGIC system might not be well known, and the manuscript would have benefited from an introductory schematic of how the system works. I realize this was already done in the PLoS Biology paper, but the authors should not assume readers will know that paper, or be willing to look it up. So a standalone schematic, as Figure 1, or something added to Figure 1, would greatly aid in understanding how this system works and what the new updates are doing.

      We thank the reviewer for this kind suggestion. We have added diagrams that explain the principle of MAGIC and the difference between pMAGIC and nMAGIC in Figure 1 - figure supplement 1.

      (2) There were many instances where abbreviations were not clearly defined, especially in the Figures and Figure legends. The main text is well-written, and while the information is in there, it is beneficial when the Figures and Figure legends can stand alone. For example:

      (a) Figure 1. DE, not defined in the Figure or Figure legend.

      (b) Figure 1. 'p' and 'n' not defined in the Figure legend.

      (c) The different Cas9 lines or GAL4 lines used: a brief description of their expression patterns might be helpful in the legend. E.g., zk-Cas9, vas-Cas9, gcm-Cas9, R38F11-GAL4, RabX4-Gal4.

      We apologize for omitting the details mentioned. They have been added to the figures and figure legends.

      (3) "Traditional" mosaic analyses took advantage of hsFLP for inducible induction and to control the number of mitotic clones that were induced. A hs-Cas9 line does exist (as correctly pointed out by the authors), and it would be a valuable addition if the authors tested the utility of this reagent with the MAGIC system. Many possible adopters may not like the idea that an always-on Cas9 line is used, which could result in too many clones, especially if one wanted to label very few cells. Granted, one could use a 'worse' gRNA-Marker line as mentioned in the manuscript, but this might still be hard to titrate, as well as an inducible system that uses a heat-shock promoter. A hs promoter is especially useful for birthdating cells during development.

      We thank the reviewer for suggesting this experiment. We agree that demonstrating inducible clone induction in the adult brain is an effective way for people to compare MAGIC with the MARCM method they are probably more familiar with. We used a heat shock Cas9 developed by the Tzumin Lee group (Chen et al., Development, 2020) to experiment with clone induction, and the results are shown in the new Figure 3 (J and K). We show that, with a pan-neuronal Gal4, heat shock during the wandering 3rd instar larval stage induced more clones than during the pupal stage, and the later heat shock readily produced sparsely labeled neurons whose single-cell morphology can be easily visualized.

      (4) Lines 61-63. "However, most of these mutant chromosomes cannot be analyzed by traditional mosaic techniques due to the lack of FRT sites or incompatibility with the FRT/Flp system." It might also be worth mentioning that recombining existing reagents (e.g., mutants, etc.) onto an FRT chromosome can be labor- and time-intensive. A brilliant advantage of MAGIC is that it can be used with any existing stock, such as from classical EMS mutant screens, Df screens (as pointed out), etc. So the more the authors can emphasize a new way of thinking (e.g., you don't need to recombine your mutant of interest onto an FRT stock before you can get started), the better!

      We thank the reviewer for this kind suggestion. As suggested, we have expanded our introduction and discussion to emphasize the advantages of the MAGIC system over traditional mosaic techniques.

      (5) One incredible advantage of the MAGIC system is that it can direct where recombination occurs. So if one had two mutations on a chromosome arm, it could be possible to make the most distal homozygous mutant while the other remains heterozygous. This is not possible with current FRT-based methods. It's not necessary to demonstrate this, but perhaps the authors could mention it as a possible next step? This was somewhat implied by lines 66-67 "In comparison, MAGIC can potentially be used to study these genes because the crossover site in MAGIC can be flexibly defined by users".

      Again, we thank the reviewer for this nice suggestion. We have added this point to the discussion.

      (6) How stable are the MAGIC lines? If gRNA (with Cas9 expressed) induced a germline mutation of the target site, the MAGIC line would break down. How often is this observed? Some mention of this would be appreciated, especially to end users, if caution is necessary and gRNA-marker stocks should not be maintained in the same flies as an x-Cas9 line.

      The reviewer made a very important point. Keeping gRNA and Cas9 in the same strain will risk mutating the target sequence in the germline, if the Cas9 has any activity in the germline. Thus, it is not recommended to keep gRNA and Cas9 in the same flies over multiple generations. For MAGIC experiments, this concern is lessened because by crossing gRNA + Cas9 flies to another strain containing the chromosome of interest, clones can still be induced (possibly with less efficiency) because the chromosome of interest is still cuttable by Cas9. Nevertheless, to address this concern, we have recently developed anti-CRISPR tools to suppress Cas9 activity in such strains. These tools will be reported in a separate study.

      In the revised manuscript, we added this point in Discussion to caution users.

      (7) Line 157, "identify efficient gRNAs for every chromosomal arm.". What is considered "efficient"? Is this quantifiable? E.g., >= 10 clones.

      Thanks for pointing this out! “Efficient” is an arbitrary evaluation, as different experiments may require different efficiencies. But operationally, we consider any gRNA that can generate >= 10 neuronal clones per larva as being efficient. We have clarified it in the text.

      (8) Line 163, "highly packed _issues_ such as the brain"; spelling, should be "tissues"

      Thanks for catching this typo. It has been corrected.

      (9) The authors use ey-Cas9 for their demonstration of adult brain labeling. Additional adult brain examples would increase exposure of this method and attract wider attention- targeting structures that have been well characterized, such as projection neurons (GH146-GAL4), central complex, mushroom bodies, etc. Especially if hs-Cas9 could be utilized to mimic previous MARCM clones (for example).

      We thank the reviewer for suggesting heat shock-induced clones in the adult brain. We have conducted the experiment as explained above and shown in Figure 3J-3K’. We showed a single neuronal clone that resembles lateral horn Leucokinin neurons.

      (10) Line 216, "Despite these advances, existing mutations on FRT-lacking 4th chromosomes still cannot be analyzed by the FRT/Flp system." For context, it might be worth pointing out that meiotic recombination is exceedingly rare on the 4th chromosome, which means it is practically impossible to recombine existing 4th chromosome mutations onto an FRT chromosome.

      We thank the reviewer for this kind suggestion. We have added a note about the difficulty of recombining FRT onto the 4th chromosome.

      (11) Figure 2 legend. What is the full genotype for D and E? eg, what is RabX4>MApHS?

      We apologize for being brief with the details. RabX4-Gal4 is a pan-neuronal driver. UAS-MApHS is a membrane fluorescent marker (UAS-pHluorin-CD4-tdTom). The genotypes have been added to the figure legend.

      (12) It would be good to include the Bloomington Stock numbers for the MAGIC toolkit, especially in Table 1. And include an HTML reference to their MAGIC page at Bloomington (https://bdsc.indiana.edu/stocks/misc/magic.html).

      Thank you for this suggestion! We have done as suggested.

      (13) Similarly, the key plasmids to create the improved gRNA-marker insertions should be deposited to Addgene (or similar repository) and their ID numbers included in the resources table.

      The plasmids have been deposited to Addgene and are currently being validated.

      (14) The authors might consider including (perhaps as supplementary to Figure 1 or Figure 2) a crossing scheme for one of their MAGIC experiments. This will make it even clearer how a MAGIC experiment could be set up using existing fly reagents.

      This is a good suggestion! We have added an example crossing scheme in Figure 1 – figure supplement 1C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript "Synaptotagmin 1 and Synaptotagmin 7 promote MR1-mediated presentation of Mycobacterium tuberculosis antigens", authored by Kim et al., showed that the calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 specifically promote (are critical for) MAIT cell activation in response to Mtb-infected bronchial epithelial cell line BEAS-2B (Figure 1) and monocyte-like cell line THP-1 (Figure 3). This work also showed co-localization of Syt1 and Syt7 with Rab7a and Lamp1, but not with Rab5a (Figure 5). Loss of Syt1 and Syt7 resulted in a larger area of MR1 vesicles (Figure 6f) and an increased number of MR1 vesicles in close proximity to auxotrophic Mtb-containing vacuoles during infection (Figure 7ab). Moreover, flow organellometry was used to separate phagosomes from other subcellular fractions and identify enrichment of auxotrophic Mtb-containing vacuoles in fractions 42-50, which were enriched with Lamp1+ vacuoles or phagosomes (Figures 7e-f).

      Strengths:

      This work nicely associated Syt1 and Syt7 with late endocytic compartments and Mtb+ vacuoles. Gene editing of the Syt1 and Syt7 loci in bronchial epithelial and monocyte-like cells supported that Syt1 and Syt7 help maintain a normal level of antigen presentation for MAIT cell activation in Mtb infection. Imaging analyses further showed that Syt1 and Syt7 mutants enhanced the overlap of MR1 with Mtb fluorescence and the proximity of MR1 to Mtb-infected vacuoles, suggesting that Syt1 and Syt7 proteins help antigen presentation in Mtb infection for MAIT activation.

      Weaknesses:

      Additional data are needed to support the conclusion, "identify a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles" and some pieces of other evidence may be seen by some to contradict this conclusion.

      We thank the reviewer for their positive and constructive comments. Because MR1 presents small molecule metabolites, specifically identifying MR1 molecules loaded with antigens derived from intracellular Mtb infection remains a significant technical challenge. Therefore, we agree that some of our approaches measure antigen-loaded MR1 indirectly. For example, IFN-γ release from a MAIT cell clone serves as a sensitive surrogate readout for the presence of antigen-loaded MR1 at the cell surface. This has been demonstrated in previous work showing that IFN-γ release from MAIT cells correlated with loaded MR1 molecules measured using flow cytometry and a TCR-based tetramer (Kulicke et al., 2024). In this context, Syt1 and Syt7 represent the first endosomal trafficking proteins we have identified that play a specific role in MR1-mediated presentation of Mtb-derived metabolites. Syt1 and Syt7 do not contribute to the presentation of exogenously delivered MR1 ligands, such as Ac-6-FP loaded in the ER or M. smegmatis supernatant. In Syt1 and Syt7 knockout cells expressing MR1-GFP, larger MR1 vesicles are observed, but MR1 continues to co-localize with LAMP1 as in wildtype cells. Furthermore, Syt1 and Syt7 knockout cells exhibit an increased number of MR1 vesicles near the Mtb-containing vacuoles compared to wildtype cells. To increase the statistical power of our microscopy analyses, we have analyzed additional cells. Although the absolute magnitude of the observed effects is modest, T cell activation is highly sensitive to the number of loaded antigen presenting molecules at the cell surface. Also, a complementary approach using flow organellometry confirmed increased MR1 expression within Mtb<sup>+</sup>LAMP1<sup>+</sup> vesicles in Syt7 knockout cells. Thus, these findings suggest a mechanism whereby Syt1 and Syt7 facilitate the trafficking of loaded MR1 molecules from the Mtb-containing vacuoles to the plasma membrane. This specialized mechanism may be analogous to the previously described role of Syt7 in MHC class II trafficking (Becker et al., 2009). In our model, we observed increased accumulation and expression of MR1 within Mtb-containing vacuoles in Syt7 knockout cells.

      Reviewer #2 (Public review):

      Summary:

      The study demonstrates that calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 are involved in the efficient presentation of mycobacterial antigens by MR1 during M. tuberculosis infection. This is achieved by creating antigen-presenting cells in which the Syt1 and Syt7 genes are knocked out. These mutated cell lines show significantly reduced stimulation of MAIT cells, while their stimulation of HLA class I-restricted T cells remains unchanged. Syt1 and Syt7 co-localize in a late endo-lysosomal compartment where MR1 molecules are also located, near M. tuberculosis-containing vacuoles.

      Strengths:

      This work uncovers a new aspect of how mycobacterial antigens generated during infection are presented. The finding that Syt1 and Syt7 are relevant for final MR1 surface expression and presentation to MR1-restricted T cells is novel and adds valuable information to this process. The experiments include all necessary controls and convincingly validate the role of Syt1 and Syt7. Another key point is that these proteins are essential during infection, but they are not significant when an exogenous synthetic antigen is used in the experiments. This emphasizes the importance of studying infection as a physiological context for antigen presentation to MAIT cells. An additional relevant aspect is that the study reveals the existence of different MR1 antigen presentation pathways, which differ from the endoplasmic reticulum or endosomal pathways that are typical for MHC-presented peptides.

      Weaknesses:

      The reduced MAIT cell response observed with Syt1 and Syt7-deficient cell lines is statistically significant but not completely abolished. This may suggest that only some MR1-loaded molecules depend on these two Syt proteins. Further research is needed to determine whether, during persistent M. tuberculosis infection, enough MR1-loaded molecules are produced and transported to the plasma membrane to sufficiently stimulate MAIT cells. The study proposes that other Syt proteins might also play a role, as outlined by the authors. However, exploring potential redundant mechanisms that facilitate MR1 loading with antigens remains a challenging task.

      We appreciate the reviewer’s comments and feedback. Syt1 and Syt7 knockout cells do not completely abolish MR1-mediated presentation of Mtb-derived metabolites. We agree that the likely explanation is that there are redundancies within the antigen presentation pathways. Whether these redundancies are due to other endosomal trafficking proteins or other intracellular compartments where MR1 loading can occur remains unknown. Moreover, Mtb-derived antigens can access the ER, where Syt1 and Syt7 are not involved, thereby enabling an ER-mediated pathway for MR1 antigen presentation. It is also important to note that relatively few (<10) loaded MHC class I molecules are sufficient to trigger T cell activation (Brower et al., 1994; Sykulev et al., 1995; Sykulev et al., 1996). A major challenge in exploring these mechanisms is the inability to directly track small molecule Mtb-derived antigens as they are loaded onto MR1 and presented at the cell surface. These hurdles are briefly outlined in the discussion as future directions. Nonetheless, Syt1 and Syt7 are the first endosomal trafficking proteins identified to have a specific effect on MR1-mediated presentation of Mtb-derived antigens.

      Reviewer #3 (Public review):

      Summary:

      In the submitted manuscript, the authors investigate the role of Synaptotagmin 1 (Syt1) and Synaptotagmin 7 (Syt7) in MR1 presentation of Mtb.

      Strengths:

      In the first series of experiments, the authors determined that knocking down Syt1 and Syt7 in antigen-presenting cells decreases IFN-γ production following cellular infection with Mtb. These experiments are well performed and controlled.

      Weaknesses:

      Next, they aim to mechanistically investigate how Syt1 and Syt7 affect Mtb presentation. In particular, they focus on MR1, a non-classical MHC-I molecule known to present endogenous and exogenous metabolites, including Mtb metabolites. Results from this next series of experiments are less clear. Firstly, they show that knocking down Syt1 and Syt7 does not change Mtb phagocytosis or MR1 ER-plasma membrane translocation. Based on this, they suggest that Syt1 and Syt7 may affect MR1 trafficking in endosomal compartments. However, neither subcellular compartment analysis nor flow organellometry clearly establishes the role of Syt1 and Syt7 in Mtb trafficking. Altogether, the notion that Synaptotagmins facilitate MR1 interaction with Mtb-containing compartments and its vesicular transport was already known. As such, the manuscript should add additional insight on where/how the interaction occurs. The reviewer is left with the notion that Syt1 and Syt7 may affect MR1 presentation, facilitating the trafficking of MR1 vesicles from endosomal compartments to either the cell surface or other endosomal compartments. The analysis is observational, and additional data or discussion could address what insight is gained beyond what is already known from the literature.

      We thank Reviewer 3 for their comments. Our hypothesis is that Syt1 and Syt7 mediate MR1 trafficking rather than Mtb trafficking. While Syt7 has previously been implicated in MHC class II trafficking and vesicular transport, this study is the first to explore in detail the roles of Syt1 and Syt7 in MR1-mediated presentation of Mtb-derived metabolites. Since current technologies do not allow direct tracking of Mtb-derived antigens loaded onto MR1, we relied on complementary approaches including IFN-γ release from MAIT cells, flow cytometry, fluorescence microscopy, and flow organellometry. Both flow organellometry and fluorescence microscopy show increased MR1 expression at Mtb-containing vacuoles in Syt7 knockout cells. Since total MR1 expression measured by flow cytometry and the overall number of MR1 vesicles remain unchanged, these data support a mechanism in which Syt7 facilitates the trafficking of antigen-loaded MR1 from Mtb-containing vacuoles to the cell surface, consistent with the observed reduction in MAIT cell IFN-γ release.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Concern 1, the data in the current manuscript have not been sufficient to "identify a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles, potentially to the cell surface for antigen presentation" (Last part of Abstract). To conclude this, additional pieces of data are needed: (a) Mtb-containing vacuoles associate with MR1 protein expression; (b) MR1+ vesicles traffic from one subcellular location to another; (c) Syt1 or Syt7 KO reduces MR1 vesicles at a downstream subcellular location, e.g., the cell surface. Important evidence supporting the "facilitation of translocation" is missing on whether Syt1 or Syt7 KO reduces MR1 vesicle traffic from one location to another.

      We thank the reviewer for their detailed suggestions to improve our proposed model. We would like to clarify that Figure 7g demonstrates increased MR1 protein expression in Syt7 knockout cells, as assessed by flow organellometry. This approach allowed us to specifically distinguish AuxMtb<sup>+</sup>LAMP1<sup>+</sup> compartments (Mtb-containing vacuoles) and to quantify MR1 expression using geometric mean fluorescence intensity. Moreover, in both Syt1 and Syt7 knockout cells, MR1+ vesicles are retained within lysosomal compartments, characterized by vesicle enlargement and accumulation. Therefore, we did not observe trafficking of MR1+ vesicles to other subcellular locations or to the plasma membrane. A key limitation, however, is the lack of current technologies that allow direct measurement of MR1 surface expression specifically during intracellular Mtb infection via flow cytometry. Given this limitation, IFN-γ ELISpot is a sensitive surrogate and supports the conclusion that loss of Syt1 and Syt7 results in decreased MR1 presentation of Mtb-derived antigens at the plasma membrane.

      The results "a significant increase in the number of MR1 vesicles within 1 μm of AuxMtb for Syt1 (1.13 ± 0.46) and Syt7 KO (1.31 ± 0.46) cells compared to WT cells (Fig.7b)." and "the surface of MR1 vesicles in Syt1 and Syt7 KO cells showed a 3-fold increase in overlap area with Mtb surfaces (Fig.7d)." may need to be further elaborated on whether MR1+ vacuoles and Mtb+ vacuoles are overlapped or are adjacent. Figure 7b shows several groups of vacuoles with the same distance. This needs a larger sample size to randomize this distance measurement, for example, calculating 50~100 Mtb+ vacuoles.

      We appreciate the reviewer’s critical comments and suggestions. To quantify distance and surface overlap, the microscopy images were acquired from a single optical plane rather than full z-stacks. As a result, it is not possible to definitively determine whether MR1+ vesicles and Mtb-containing vacuoles are directly overlapping or adjacent. In response to the reviewer’s suggestion, we increased the sample size for both distance (n=51-53) and surface overlap analyses (n=51-53). Using the larger sample size, we observed a significant increase in the number of MR1 vesicles located within 1 μm of AuxMtb in both Syt1 (1.23±0.21) and Syt7 knockout (1.28±0.22) cells. Also, there was an approximately 4-fold increase in MR1-Mtb surface overlap area compared to wildtype cells.
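
      As an illustration only, a proximity count of this kind could be sketched as follows. The sketch assumes 2D centroid coordinates in micrometres from a single optical plane and centroid-to-centroid distances; the function name `count_within` and the coordinates are hypothetical, and the actual image-analysis software and distance definition used for Figure 7b may differ.

```python
import math

def count_within(mtb_xy, vesicle_xy, radius_um=1.0):
    """Count vesicle centroids lying within radius_um of one Mtb centroid.

    Assumption: coordinates are (x, y) pairs in micrometres from a single
    optical plane, and distance is measured centroid to centroid.
    """
    mx, my = mtb_xy
    return sum(
        1
        for vx, vy in vesicle_xy
        if math.hypot(vx - mx, vy - my) <= radius_um
    )

# Hypothetical coordinates for one bacterium and four MR1 vesicles.
vesicles = [(0.5, 0.0), (1.0, 0.0), (1.2, 0.0), (0.6, 0.9)]
n_close = count_within((0.0, 0.0), vesicles)
```

      Averaging such counts over the 51-53 measured bacteria per genotype would give a per-genotype mean comparable in form to the values reported above.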

      Results from "performed flow organellometry to separate phagosomes from other subcellular fractions and identified enrichment of Mtb-containing vacuoles in fractions 42-50 (Fig.7e-f)" could not distinguish the difference between WT and Syt1/Syt7 KO, or further support the role of Syt1/Syt7 in endocytic trafficking. More specifically, authors claimed that "enhanced MR1 expression in Mtb+LAMP1+ compartments via flow organellometry in Syt1 and Syt7 KO cells.", may not be supported by Figure 7f, which does not show a difference in MR1 expression between Syt1 KO or Syt7 KO and WT.

      We appreciate the reviewer’s concerns and would like to clarify the interpretation of Figures 7f and 7g. Figure 7f demonstrates: (a) enrichment of Mtb-containing vacuoles within fractions 42-50, (b) co-enrichment of LAMP1+ vesicles within these Mtb-containing fractions, and (c) comparable subcellular fractionation profiles across wildtype, Syt1 knockout, and Syt7 knockout cells, indicating no major differences in fraction distribution. Differences in MR1 expression are shown in Figure 7g, which compares MR1 expression as the geometric mean fluorescence intensity within the fraction exhibiting the highest percentage of AuxMtb<sup>+</sup>LAMP1<sup>+</sup> across all fractions. We observed a significant increase in MR1 expression in Syt7 knockout cells compared to wildtype cells.
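
      The Figure 7g readout described above (pick the fraction with the highest percentage of AuxMtb<sup>+</sup>LAMP1<sup>+</sup> events, then summarize MR1 by geometric mean fluorescence intensity) can be sketched as below. This is a minimal illustration, not the analysis pipeline used in the study: the dictionary layout, the key names `pct_double_pos` and `mr1`, and all numbers are hypothetical, and the sketch assumes strictly positive intensity values.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive intensities: exp(mean(log(v)))."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def gmfi_of_peak_fraction(fractions):
    """Return (fraction id, MR1 gMFI) for the most enriched fraction.

    fractions maps a fraction id to a dict with hypothetical keys:
      "pct_double_pos": percentage of AuxMtb+LAMP1+ events in that fraction
      "mr1": list of per-event MR1 fluorescence intensities
    """
    peak = max(fractions, key=lambda f: fractions[f]["pct_double_pos"])
    return peak, geometric_mean(fractions[peak]["mr1"])

# Hypothetical example data for two fractions.
example = {
    42: {"pct_double_pos": 3.0, "mr1": [2.0, 8.0]},
    46: {"pct_double_pos": 9.5, "mr1": [1.0, 10.0, 100.0]},
}
peak_id, mr1_gmfi = gmfi_of_peak_fraction(example)
```

      The geometric mean is the usual choice for fluorescence data because intensities are roughly log-normally distributed, so it is less sensitive to a few very bright events than the arithmetic mean.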

      Concern 2, in abstract, "Loss of Syt1 and Syt7 results in enlarged MR1 vesicles and an increased number of MR1 vesicles in close proximity to Mtb-containing vacuoles during infection.". Although the numbers of MR1 vesicles within 1 μm of Mtb increase (Figure 7b) and the areas of MR1+ vacuoles are enlarged in KO relative to WT cells (Figure 6f), the numbers of MR1 vesicles per cell are not different between WT and Syt1 and Syt7 KO (Fig. 7c). These imaging analyses, including other figure panels, need more explicit presentation of (most if not all) random images for calculation, annotation of MR1-vacuoles for calculation, and raw statistical data for mean and p value calculation. These raw data can be presented in supplemental figure panels.

      We thank the reviewer for these suggestions. We have included more details on randomization, technical procedures, and statistical analyses in methods section for “Fluorescence Microscopy,” “Image Analysis,” and “Statistical Analysis.” Raw data collection and statistical data are presented in the supplemental data.

      Concern 3, additional evidence that does not support the conclusion "This study identifies a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles" (the last part of Abstract). This additional unsupportive evidence includes: (a) MR1 expression on the cell surface is not impacted or not different among WT, Syt1 KO, and Syt7 KO of BEAS-2B cells (Fig. 6d). (b) "Live-cell imaging showed no differences in MR1 cellular distribution in the presence or absence of Ac-6-FP between WT, Syt1, and Syt7 KO BEAS-2B:TET-MR1GFP cells as MR1 translocated from the ER and vesicles to the cell surface as expected" (Figure 6c).

      We thank the reviewer for this comment and would like to clarify our use of Ac-6-FP. Figures 6c and 6d examine MR1 cellular distribution and surface expression in the presence or absence of Ac-6-FP. Ac-6-FP is a small MR1 ligand that is loaded in the ER and promotes MR1 surface stabilization and trafficking to the cell membrane. In contrast, Mtb primarily resides within membrane-bound phagosomes. MR1 presentation of soluble/exogenously delivered ligands versus intracellular Mtb-derived antigens has been shown to involve distinct pathways and endosomal trafficking proteins (Harriff et al., 2016; Karamooz et al., 2019; Karamooz et al., 2025). Findings from Figures 6c and 6d show that Syt1 and Syt7 do not contribute to the presentation of small soluble and ER-loaded ligands such as Ac-6-FP. Instead, they specifically contribute to MR1 presentation of Mtb-derived metabolites by translocating MR1 from Mtb-containing vacuoles in the context of intracellular Mtb infection.

      Other concerns:

      (1) Figure 1a uses Ct value to measure Syt1 and Syt7 expression levels, but a comparison with GAPDH Ct cycle numbers in different cell types will be helpful for understanding.

      We appreciate the reviewer’s suggestion of including GAPDH Ct cycle numbers. We have revised Figure 1a to show Ct values for Syt1, Syt7, and GAPDH in both BEAS-2B and THP-1 cells.

      (2) Figure 1b indel, shown with an ICE method, should be confirmed with protein expression levels to interpret functional results.

      We thank the reviewer for raising this concern. We attempted to assess protein levels by western blot using multiple antibodies from both Abcam and Synaptic Systems. However, we were unable to identify a suitable antibody that reliably detected endogenous Syt1 or Syt7 protein levels.

      (3) Figure 1c. HLA-B45-restricted T cell clones also show some marginal reduction of IFN-γ spot responses and are more different in Figure 6b. Please discuss this conflicting data. Also, need a reference to support whether the exogenous CFP peptide antigen is presented via surface or endocytic antigen loading.

      We agree with the reviewer that there are some marginal reductions of IFN-γ responses for HLA-B45-restricted T cell clones. Since T cell clones are used from frozen stocks, there can be differences in maximal responses between T cell clones and expansions of the same T cell clone. However, the comparisons include a control arm and pool data from multiple experiments to reach statistical power and validity. In addition, Figure 6b shows Syt1 and Syt7 KO cells in the background of BEAS-2B MR1KO:tetMR1-GFP clone D4 cells, which overexpress MR1; this may contribute to variability and potentially account for the observed differences. With respect to exogenous CFP peptide loading, earlier studies on peptides and antigen presenting cells demonstrated that peptides can be loaded onto fixed cells and subsequently presented to T cells (Shimonkevitz et al., 1983; Watts et al., 1985). Based on these findings, it is reasonable to assume that substantial peptide exchange occurs at the cell surface when exogenous peptides are added to antigen presenting cells.

      (4) Figure 2e: Delta CT values of Syt1, Syt7 in WT, KO cells can be shown together with Ct values of GAPDH or B2m house-keeping genes to help readers determine the efficiency of Syt1 and 7 mutation at the gene expression level. Also, in Figure 4a, the baseline of Ct values for GAPDH can be plotted together.

      As suggested by the reviewer, we have revised Figure 2e and 4a to include CT values for the genes of interest as well as housekeeping gene GAPDH.

      (5) Figure 3c and Figure 1d: M.smeg infection can be shown to be more comparable with Mtb infection.

      We thank the reviewer for this thoughtful comment. Although M. smegmatis infection could serve as a comparable control, M. smegmatis secretes large amounts of MR1 ligands derived from riboflavin metabolism. This makes it difficult to distinguish between extracellular and intracellular antigens, and to directly compare with Mtb infection, which is specifically an intracellular infection model.

      (6) Figure 4e: It appears Esyt2 Knockdown shows strong inhibition of MAIT activation mediated by BEAS-2B cells with Mtb infection and M.smeg supernatant stimulation. Please add other relevant data, such as MR1 cell surface expression and colocalization, and discuss these results with Syt proteins.

      We appreciate the reviewer’s suggestion to include relevant data for Esyt2 knockdown. We performed flow cytometry analysis of Esyt2 knockdown cells and found surface MR1 expression under basal conditions. Treatment with Ac-6-FP resulted in increased MR1 surface stabilization, but MR1 surface levels were significantly lower than those observed in missense control cells. Therefore, Esyt2 is not specific to MR1 presentation of Mtb-derived metabolites and instead may play a broader role in overall MR1 antigen presentation, including intracellular Mtb-derived antigens, exogenous antigens, and ER-loaded Ac-6-FP.

      (7) Figure 5 colocalization computational analyses can be more explicitly presented regarding randomization, technical procedures, and statistical analyses, as stated in Concern 2.

      As suggested, we have included more details in the Methods section and added supplemental data.

      (8) Figure 6a: Syt1 and Syt7 protein expressions are also suggested to confirm the mutation, similar to the confirmation for Figures 1 and 3.

      We thank the reviewer for raising this concern. As discussed previously, we have not identified a suitable antibody for human Syt1 and Syt7. We have tested multiple antibodies from Abcam and Synaptic Systems.

      (9) For statistical analyses, "non-linear regression analysis comparing best-fit values of top and EC50 were used to calculate p-values by extra sum-of-squares F test" (Figure 6b) and "non-linear regression analysis of pairwise comparison to WT on best-fit values of top and EC50 were used to calculate p-values by extra sum-of-squares F test." (Figure 3bc), readers may need more specific demonstration in supplemental figures on how statistical analyses have been performed.

      We appreciate the reviewer’s suggestion to include more detailed information regarding the statistical analyses. For clarification, data presented in Figures 6b and 3bc were analyzed using the same statistical approach in Prism 10. Specifically, nonlinear regression (curve fit) was performed using the [Agonist] vs. response model with three parameters. Best-fit values for the top and EC<sub>50</sub> parameters were compared using an extra sum-of-squares F test. No constraints were applied to the bottom and top parameters, and the EC<sub>50</sub> parameter was constrained to be greater than 0 for p-value calculation. We have revised the Statistical Analysis section of the Methods to more clearly describe this approach.
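
      The logic of an extra sum-of-squares F test on a three-parameter agonist-response fit can be sketched outside Prism as follows. This is a minimal sketch using SciPy, not a reproduction of Prism's implementation. It assumes that the null model shares top and EC50 between the two groups while each group keeps its own bottom; that choice of null model, the function names, and the starting values are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import f as f_dist


def three_param(x, bottom, top, ec50):
    """Three-parameter agonist-response curve (Hill slope fixed at 1)."""
    return bottom + x * (top - bottom) / (ec50 + x)


def shared_top_ec50(xg, bottom_a, bottom_b, top, ec50):
    """Assumed null model: shared top and EC50, group-specific bottom."""
    x, group = xg
    bottom = np.where(group == 0, bottom_a, bottom_b)
    return bottom + x * (top - bottom) / (ec50 + x)


def extra_ss_f_test(x_a, y_a, x_b, y_b):
    """Return (F, p) for shared versus separate top and EC50."""
    x_a, y_a, x_b, y_b = (np.asarray(v, dtype=float) for v in (x_a, y_a, x_b, y_b))
    eps = 1e-12  # lower bound that keeps EC50 greater than 0

    # Alternative model: each group has its own bottom, top and EC50.
    ss_alt = 0.0
    for x, y in ((x_a, y_a), (x_b, y_b)):
        p0 = [y.min(), y.max(), np.median(x)]
        bounds = ([-np.inf, -np.inf, eps], [np.inf, np.inf, np.inf])
        params, _ = curve_fit(three_param, x, y, p0=p0, bounds=bounds)
        ss_alt += np.sum((y - three_param(x, *params)) ** 2)
    df_alt = x_a.size + x_b.size - 6

    # Null model: one top and one EC50 fitted to both groups together.
    x_all = np.concatenate([x_a, x_b])
    y_all = np.concatenate([y_a, y_b])
    group = np.concatenate([np.zeros(x_a.size), np.ones(x_b.size)])
    xg = np.vstack([x_all, group])
    p0 = [y_a.min(), y_b.min(), y_all.max(), np.median(x_all)]
    bounds = ([-np.inf, -np.inf, -np.inf, eps], [np.inf] * 4)
    params, _ = curve_fit(shared_top_ec50, xg, y_all, p0=p0, bounds=bounds)
    ss_null = np.sum((y_all - shared_top_ec50(xg, *params)) ** 2)
    df_null = x_all.size - 4

    # Extra sum-of-squares F: relative gain in fit per extra parameter.
    f_stat = ((ss_null - ss_alt) / (df_null - df_alt)) / (ss_alt / df_alt)
    p_value = f_dist.sf(f_stat, df_null - df_alt, df_alt)
    return f_stat, p_value
```

      The F statistic asks whether the two additional parameters of the separate-fit model reduce the residual sum of squares by more than would be expected by chance. A small p-value favors distinct top and/or EC50 values for the two groups.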

      (10) In discussion, the background section for Syt1 and Syt7 is more appropriate to be in the introduction. This will allow readers to better understand the association of Syt proteins with MR1 and the necessity to study the impact of Syt on MR1 trafficking.

      We thank the reviewer for this suggestion. We believe that the basic background and relevance of Syt1 and Syt7 in MR1 trafficking are covered in the introduction; however, we have added details to help readers understand their impact.

      Reviewer #2 (Recommendations for the authors):

      This reviewer has no requests for implementation and congratulates the authors on this nice piece of work.

      We thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations for the authors):

      Complete trafficking experiments to pinpoint the trafficking relationship between Syt 1 and 7 and MR1 in MtB infection.

      We appreciate the reviewer’s insightful comment. As this study represents the first detailed investigation into the roles of Syt1 and Syt7 in MR1-mediated presentation of Mtb-derived metabolites, we agree that a fully resolved trafficking mechanism has not yet been established. A major limitation is the inability to directly track Mtb-derived antigens as they are loaded onto MR1 and trafficked to the cell surface. Therefore, we relied on complementary functional and microscopy-based approaches, including IFN-γ ELISpot assays, flow cytometry, fluorescence microscopy, and flow organellometry, to infer the trafficking relationships between Syt1, Syt7, and MR1 during intracellular Mtb infection. Our data support a model in which Syt1 and Syt7 facilitate the trafficking of MR1 from Mtb-containing vacuoles to the plasma membrane. This interpretation is supported by the increased accumulation of MR1 in Mtb-containing vacuoles and the reduction in MAIT cell IFN-γ release observed in Syt1 and Syt7 knockout cells.

      References

      (1) Becker, S. M., Delamarre, L., Mellman, I., & Andrews, N. W. (2009). Differential role of the Ca(2+) sensor synaptotagmin VII in macrophages and dendritic cells. Immunobiology, 214(7), 495–505.

      (2) Brower, R. C., England, R., Takeshita, T., Kozlowski, S., Margulies, D. H., Berzofsky, J. A., & Delisi, C. (1994). Minimal requirements for peptide-mediated activation of CD8+ CTL. Molecular immunology, 31(16), 1285–1293.

      (3) Harriff, M. J., Karamooz, E., Burr, A., Grant, W. F., Canfield, E. T., Sorensen, M. L., Moita, L. F., & Lewinsohn, D. M. (2016). Endosomal MR1 Trafficking Plays a Key Role in Presentation of Mycobacterium tuberculosis Ligands to MAIT Cells. PLoS pathogens, 12(3), e1005524.

      (4) Karamooz, E., Harriff, M. J., Narayanan, G. A., Worley, A., & Lewinsohn, D. M. (2019). MR1 recycling and blockade of endosomal trafficking reveal distinguishable antigen presentation pathways between Mycobacterium tuberculosis infection and exogenously delivered antigens. Scientific reports, 9(1), 4797.

      (5) Karamooz, E., Kim, S. J., Peterson, J. C., Tammen, A. E., Soma, S., Soll, A. C. R., Meermeier, E. W., Khuzwayo, S., & Lewinsohn, D. M. (2025). Two-pore channels in MR1-dependent presentation of Mycobacterium tuberculosis infection. PLoS pathogens, 21(8), e1013342.

      (6) Kulicke, C. A., Swarbrick, G. M., Ladd, N. A., Cansler, M., Null, M., Worley, A., Lemon, C., Ahmed, T., Bennett, J., Lust, T. N., Heisler, C. M., Huber, M. E., Krawic, J. R., Ankley, L. M., McBride, S. K., Tafesse, F. G., Olive, A. J., Hildebrand, W. H., Lewinsohn, D. A., Adams, E. J., … Harriff, M. J. (2024). Delivery of loaded MR1 monomer results in efficient ligand exchange to host MR1 and subsequent MR1T cell activation. Communications biology, 7(1), 228.

      (7) Shimonkevitz, R., Kappler, J., Marrack, P., & Grey, H. (1983). Antigen recognition by H-2-restricted T cells. I. Cell-free antigen processing. The Journal of Experimental Medicine, 158(2), 303–316.

      (8) Sykulev, Y., Cohen, R. J., & Eisen, H. N. (1995). The law of mass action governs antigen-stimulated cytolytic activity of CD8+ cytotoxic T lymphocytes. Proceedings of the National Academy of Sciences of the United States of America, 92(26), 11990–11992.

      (9) Sykulev, Y., Joo, M., Vturina, I., Tsomides, T. J., & Eisen, H. N. (1996). Evidence that a single peptide-MHC complex on a target cell can elicit a cytolytic T cell response. Immunity, 4(6), 565–571.

      (10) Watts, T. H., Gariépy, J., Schoolnik, G. K., & McConnell, H. M. (1985). T-cell activation by peptide antigen: effect of peptide sequence and method of antigen presentation. Proceedings of the National Academy of Sciences of the United States of America, 82(16), 5480–5484.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable work investigates the role of protein N-glycosylation in regulating T-cell activation and function and suggests that B4GALT1 is a potential target for tumor immunotherapy. The strength of evidence is solid, and further mechanistic validation could be provided.

      We sincerely thank the editor and reviewers for their time and constructive feedback. Your recognition of our work is much appreciated. We clarify our mechanistic studies as stated below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Yu et al., which investigated the role of protein N-glycosylation in regulating T-cell activation and function, is an interesting work. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could activate expression of PD-1 and enhance the function of CD8+ T cells both in vitro and in vivo, suggesting an important role of protein N-glycosylation in regulating CD8+ T-cell function and indicating that B4GALT1 is a potential target for tumor immunotherapy.

      Strengths:

      The strength of this study is the finding of a novel function of B4GALT1 deficiency in CD8 T cells.

      Weaknesses:

      However, the authors did not directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, or the functional outcomes of this interaction, such as enhanced TCR signaling.

      We are very sorry that we did not sufficiently highlight our results in Fig. 5f-h. In those figures, we used FRET assays to demonstrate that the interaction between TCR and CD8 increased significantly in B4GALT1-deficient T-cells. To confirm the important role of the TCR-CD8 interaction in mediating the effects of B4GALT1 on T-cell functions, such as in vitro killing of target cells, we artificially tethered TCR and CD8 with a CD8β-CD3ε fusion protein and tested its function in both WT and B4GALT1 knockout CD8<sup>+</sup> T-cells. Our results demonstrate that this fusion protein could bypass the effect of B4GALT1 knockout in CD8<sup>+</sup> T-cells (Fig. 5g-h). Together with the finding that B4GALT1 directly regulates the galactosylation of TCR and CD8, these results strongly support the model that B4GALT1 modulates T-cell functions mainly through galactosylation of TCR and CD8, which interferes with their interaction.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNAseq, glycoproteomics) to hone their model on how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      We thank the reviewer for the reminder. We verified the efficiency of some gRNAs by T7E1 assay. We will add those data to the supplementary results in the revised version.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      Please see response to reviewer #1.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified, and would be helpful for site-specific mutational analysis to further the authors' model.

      We thank the reviewer for the suggestion! Unfortunately, multiple sites of TCR and CD8 are involved in N-glycosylation (https://glycosmos.org/glycomeatlas). We worry that mutating all of these sites may affect not only the glycosylation of TCR and CD8 but also other essential functions of those proteins.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

      We thank the reviewer for this great suggestion to extend the study of B4GALT1 to autoimmune and infectious diseases. However, since the current manuscript focuses mainly on tumor immunology, we think these studies are best left for future work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The study by Yu et al., which investigated the role of protein N-glycosylation in regulating T-cell activation and function, is an interesting work. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could activate expression of PD-1 and enhance the function of CD8+ T cells both in vitro and in vivo, suggesting an important role of protein N-glycosylation in regulating CD8+ T-cell function and indicating that B4GALT1 is a potential target for tumor immunotherapy. However, the authors need to directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, as well as the functional outcomes of this interaction, such as enhanced TCR signaling. In addition, blocking PD1 has been shown to enhance the antitumor effect, whereas the data presented in this study suggest that activation of PD1 expression under B4GALT1 deficiency in T cells enhanced the antitumor effect. How can this discrepancy be reconciled? Finally, several minor questions need to be addressed to strengthen the conclusions in this manuscript.

      (1) We used a FRET (Fluorescence Resonance Energy Transfer) assay to measure interaction between TCR and CD8. FRET signals of TCR-CD8 increased significantly in B4GALT1 deficient T-cells, compared with control cells (Fig. 5f). For functional outcomes of this interaction, we observed enhanced T-cell killing activities in B4GALT1 deficient CD8<sup>+</sup> T-cells (Fig. 3f and Fig. 5h).

      To confirm whether reduced TCR-CD8 interaction is the major cause of TCR activation phenotypes in B4GALT1 knockout CD8<sup>+</sup> T-cells, we generated a construct in which we fused the CD8β ectodomain (ECD) with CD3ε to artificially tether TCR with CD8 (Fig. 5g). Overexpression of this CD8β-CD3ε fusion led to enhanced in vitro killing activity in control wild-type CD8<sup>+</sup> T-cells. In contrast, in B4GALT1-deficient CD8<sup>+</sup> T-cells, the enhancement of T-cell killing activity by the fusion construct was significantly diminished (Fig. 5h), suggesting that it bypassed the regulation by B4GALT1.

      (2) PD-1 is both an early T-cell activation marker upon TCR activation and a T-cell exhaustion marker under consecutive or repeated stimulation. In our screens, PD-1 was used as an early activation marker for T-cells.

      We have clarified this in the new Discussion section.

      (1) The present data relies on statistical graphs (e.g., bar and line charts) for all data, excluding the bioinformatics analysis. Including data such as flow cytometry plots, photomicrographs, or immunohistochemistry staining images will provide more direct support for the conclusions.

      We thank the reviewer for the valuable suggestions! We added the original flow cytometry gating strategies for Cas9 screening sorting (Fig. S1a), TIL analysis (Fig. S5), and the FRET assay (Fig. S8) in the revised version to provide more direct support for our conclusions.

      (2) To further validate the enhanced tumor infiltration phenotype resulting from B4GALT1 knockout, the following data would strengthen the manuscript:

      (a) Flow cytometric analysis of TILs or immunofluorescence data from tumor sections.

      We thank the reviewer for the valuable suggestion! We added the original flow cytometry gating strategies for TILs in Fig. S5 in the revised version.

      (b) Assessment of in vivo T cell proliferation, for example, by tracking changes in the proportion of CD8+ T cells in the peripheral blood over time.

      We analyzed in vivo T-cell proliferation within tumors by CFSE (carboxyfluorescein succinimidyl ester) analysis. As shown in Fig. S6b, 6 days after infusion, B4GALT1 knockout OT-I T-cells showed increased proliferation within tumors compared with wild-type control OT-I cells.

      (c) Evaluation of the proliferation and activation status of OT-1 CD8+ T cells specifically in the draining lymph nodes of the mouse model.

      We thank the reviewer for the valuable suggestion! We plan to perform this experiment in the future.

      (3) The authors provide evidence that B4GALT1 knockout enhances CD8+ T cell function in both mouse models and human TCR-T cells (in vitro). Definitive support for the translational potential of this strategy would come from showing that B4GALT1-knockout human TCR-T cells also mediate potent in vivo function (NSG tumor-bearing model may be a better choice).

      We thank the reviewer for the valuable suggestion! We plan to perform those experiments in the future. However, we do not expect the in vitro and in vivo (NSG mice) experiments to give substantially different results, so they may not add much to the current manuscript.

      (4) It would be preferable to include data on T cell activation and effector function (e.g., flow cytometry for IL-2, TNF-α, and IFN-γ, or ELISPOT) following stimulation with an OVA-specific peptide or co-culturing of OVA-expressing tumor cells with B4GALT1-knockout OT-1 CD8 T cells, especially the changes in the TILs compared with the non-targeting control group.

      Following co-culturing of B16-OVA tumor cells with B4GALT1-knockout or wild-type OT-I CD8<sup>+</sup> T-cells, the RNA levels and secretion levels of TNFα and IFNγ were detected by RT-qPCR and ELISA, respectively (Fig. 3c). B4GALT1-deficient OT-I T-cells showed increased expression of T-cell activation and cytotoxic markers such as IFNγ and TNFα.

      (5) What is the correlation between the expression of B4GALT1, PD-1, and TCR activation markers at various time points during a long-term T cell co-culture with tumor cells?

      We thank the reviewer for the valuable suggestion! We do not have these data at present. While we agree that exploring this might be interesting, we think it falls outside the scope of the current study.

      (6) In line 136: Regarding the genetic targeting of B4GALT1 in T cells, it is unclear whether single or multiple gRNAs were used and if potential off-target effects were assessed. To fully validate the model, it would be important to clarify these strategies, and it is essential to include data on the knockout efficiency at both the protein (e.g., Western blot) and mRNA levels.

      We apologize for the unclear description of the gene knockout strategy. In the current study, single sgRNAs were used for gene knockout in all experiments. B4galt1 sg2 was used in Fig. 3a. Both B4galt1 sg1 and sg2 were used in Fig. S1d. We have clarified this in each figure legend in the revised version.

      The phenotypes of B4galt1 knockout T-cells could be rescued by overexpression of either a short or long isoform of mouse B4galt1 cDNA (Fig. 3b), indicating that potential off-target effects could be excluded.

      The sgRNA knockout efficiencies were confirmed by T7E1 assay in the revised version (Fig. S2). Regrettably, the anti-mouse B4galt1 antibody did not work in western blots.

    1. Reviewer #2 (Public review):

      Summary:

      Fu and colleagues have shown that VALOR, a model of multimodal and dynamic stimulus features, better predicts brain responses than unimodal or static models such as AlexNet, WordNet, or CLIP. The authors demonstrated the robustness of their findings by generalizing encoding results to an external dataset. They demonstrated the model's practical benefit by showing that its semantic mappings were comparable to those of another model that required labor-intensive manual annotation. Finally, the authors showed that the model reveals predictive coding mechanisms of the brain, which held a meaningful relationship with individuals' fluid intelligence measures.

      Strengths:

      Recent advances in neural network models that extract visual, linguistic, and semantic features from real-world stimuli have enabled neuroscientists to build encoding models that predict brain responses from these features. Higher prediction accuracy indicates greater explained variance in neural activity, and therefore a better model of brain function. Commonly used models include AlexNet for visual features, WordNet for audio-semantic features, and CLIP for visuo-semantic features; these served as comparison models in the study. Building on this line of work, the authors developed an encoding model using VALOR, which captures the multimodal and dynamic nature of real-world stimuli. VALOR outperformed the comparison models in predicting brain responses. It also recapitulated known semantic mappings and revealed evidence of predictive processing in the brain. These findings support VALOR as a strong candidate model of brain function.

      Weaknesses:

      The authors argue that this modeling contributes to a better understanding of how the brain works. However, upon reading, I am less convinced about how VALOR's superior performance over other models tells us more about the brain. VALOR is a better model of the audiovisual stimulus because it processes multimodal and dynamic stimuli, unlike other unimodal or static models. If the model better captures real-world stimuli, then I almost feel that it has to better capture brain responses, assuming that the brain is a system that is optimized to process multimodal and dynamic inputs from the real world. The authors could strengthen the manuscript by better explaining the significance of their encoding model findings.

      In Study 3, the authors show high alignment between WordNet and VALOR feature PCs. Upon reading the Methods together with Figure 3, I suspect that the alignment almost has to be high, given that the authors projected VALOR features onto Huth et al.'s PC space. Could the authors conduct non-parametric permutation tests, such as shuffling the VALOR features prior to mapping onto Huth et al.'s PC space, and then calculating the Jaccard scores? I imagine that the null distribution would be positively shifted. Still, I would be convinced if the alignment is higher than this shifted null distribution for each PC. If my understanding of this is incorrect, I suggest editing the relevant Methods section (line 508) because this analysis was not easy to understand.
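
      The kind of permutation test suggested here can be sketched generically as follows. This is a minimal sketch under strong simplifying assumptions and is not the authors' pipeline: `weights` stands for a hypothetical voxels-by-features matrix, `to_pc` for a hypothetical fixed mapping of those features onto one PC axis, and the Jaccard score is computed between the top 10% of voxels in each map. All names and the thresholding rule are illustrative.

```python
import numpy as np


def jaccard(mask_a, mask_b):
    """Jaccard overlap between two boolean voxel masks."""
    union = np.logical_or(mask_a, mask_b).sum()
    return np.logical_and(mask_a, mask_b).sum() / union if union else 0.0


def top_mask(scores, frac=0.1):
    """Mark the voxels whose score lies in the top `frac` of all voxels."""
    return scores >= np.quantile(scores, 1.0 - frac)


def permutation_test(weights, to_pc, reference_scores, n_perm=1000, seed=0):
    """Compare the observed Jaccard score with a feature-shuffled null."""
    rng = np.random.default_rng(seed)
    ref = top_mask(reference_scores)
    observed = jaccard(top_mask(weights @ to_pc), ref)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle feature columns before projection to break feature identity.
        shuffled = rng.permutation(weights, axis=1)
        null[i] = jaccard(top_mask(shuffled @ to_pc), ref)
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value
```

      Because the shuffled features still pass through the same projection, any overlap that the projection itself induces is retained in the null distribution, which is the positive shift anticipated above.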

      In Study 4, the authors show that individuals whose superior parietal gyrus (SPG) exhibited high prediction distance had high fluid cognitive scores (Figure 4C). I had a hard time believing that this was a hypothesis-driven analysis. The authors motivate the analysis by stating that "SPG and PCu have been strongly linked to fluid intelligence" (line 304). Did the authors conduct only two analyses (SPG-fluid intelligence and PCu-fluid intelligence), without relating other brain regions to other individual-difference measures? Even if so, the authors should have reported the corresponding r value and p value for PCu-fluid intelligence. If the SPG-fluid intelligence relationship indeed holds specificity in terms of statistical significance compared to all possible scenarios that were tested, is this an expected result, and could the authors explain the specificity? Also, the authors should explain why they considered fluid intelligence to be a proxy for one's ability to anticipate upcoming scenes during movie watching. I would have understood the rationale better if the authors had at least aggregated the predictive scores for all brain regions that held significance into one summary statistic and found a significant correlation with the fluid intelligence measure.

      Comments on revisions:

      The revision has addressed these concerns.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This study compares four models - VALOR (dynamic visual-text alignment), CLIP (static visual-text alignment), AlexNet (vision-only), and WordNet (text-only) - in their ability to predict human brain responses using voxel-wise encoding modeling. The results show that VALOR not only achieves the highest accuracy in predicting neural responses but also generalizes more effectively to novel datasets. In addition, VALOR captures meaningful semantic dimensions across the cortical surface and demonstrates impressive predictive power for brain responses elicited by future events.

      Strengths:

      The study leverages a multimodal machine learning model to investigate how the human brain aligns visual and textual information. Overall, the manuscript is logically organized, clearly written, and easy to follow. The results well support the main conclusions of the paper.

      (1) My primary concern is that the performance difference between VALOR and CLIP is not sufficiently explained. Both models are trained using contrastive learning on visual and textual inputs, yet CLIP performs significantly worse. The authors suggest that this may be due to VALOR being trained on dynamic movie data while CLIP is trained on static images. However, this explanation remains speculative. More in-depth discussion is needed on the architectural and inductive biases of the two models, and how these may contribute to their differences in modeling brain responses.

      Thank you for this thoughtful comment. We agree that attributing VALOR’s advantage over CLIP solely to ‘dynamic (video) versus static (image) pretraining’ would be incomplete, and that the architectural and inductive biases of the two models are central to understanding the observed performance gap.

      Both VALOR and CLIP use contrastive learning to align visual and textual representations, but they differ in several key inductive biases that are particularly relevant for modeling brain responses during continuous movie viewing. First, VALOR is trained to align temporally extended video segments with text, introducing an explicit temporal integration window that aggregates information across consecutive frames. This encourages representations that maintain context, stabilize semantics across time, and encode event-level structure. Second, VALOR’s alignment operates at the level of multi-second narrative units, rather than isolated visual snapshots, biasing the model toward representations that are sensitive to unfolding events and cross-frame consistency.

      In contrast, CLIP processes frames independently and aligns single static images with text. As a result, it lacks an intrinsic mechanism for temporal binding, context accumulation, or event-level representation. While CLIP can capture rich visual–semantic associations at the image level, it is less well suited to represent higher-order temporal structure, which is known to strongly drive responses in association cortex during naturalistic narrative perception.

      We therefore interpret VALOR’s superior encoding performance as reflecting not only exposure to dynamic audiovisual data, but also inductive biases—temporal integration and event-level alignment—that more closely match how the brain integrates information over time during movie watching. We have revised the Discussion (p. 16) to articulate these architectural and representational differences explicitly, rather than attributing the effect solely to training data modality.

      (On page 16) “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity.”

      (2) The methods section lacks clarity regarding which layers of VALOR and CLIP were used to extract features for voxel-wise encoding modeling. A more detailed methodological description is necessary to ensure reproducibility and interpretability. Furthermore, discussion of the inductive biases inherent in these models, and of their implications for brain alignment, is crucial.

      Thank you for this comment. We agree that reproducibility and interpretability require precise specification of which model representations were used for voxel-wise encoding, as well as clearer discussion of the inductive biases inherent in these models and their implications for brain alignment.

      In the revised Methods, we now explicitly specify the feature sources for both models. For CLIP (ViT-B/32), we use the final pooled image embedding after projection into the shared image–text space, extracted frame-by-frame; one representative frame is sampled per TR, and its projected embedding serves as the regressor. For VALOR, we use the final joint video–text projection head, yielding a 512-dimensional embedding computed at the segment/TR level that integrates information across consecutive frames and aligns each multi-second video segment with its associated text. These procedures are now described step-by-step in the Methods (p. 21).

      In addition, we expanded the Discussion (p. 16) to explicitly articulate the models’ inductive biases and their relevance for brain alignment. In particular, we contrast CLIP’s image-level, framewise alignment—which lacks intrinsic temporal integration—with VALOR’s event-level, temporally extended video–text alignment, which biases representations toward context maintenance and narrative continuity. This distinction helps explain why the two models differ in their ability to predict neural responses during continuous movie viewing.

      (Methods, On page 21)

      “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model24. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features 23,51,52. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.”
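
      How TR-aligned feature vectors enter a voxel-wise encoding model can be sketched as follows. This is a minimal sketch that assumes ridge regression with a single fixed penalty and synthetic data; the quoted Methods do not specify the regression details here, so the estimator, the penalty value, the absence of hemodynamic delays, and all function names are assumptions. A small feature dimension replaces the 512-dimensional embeddings to keep the example light.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights mapping features (TRs x features) to voxels."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_r(Y_true, Y_pred):
    """Pearson correlation between measured and predicted responses per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))


# Synthetic stand-ins for TR-aligned embeddings and BOLD responses.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 300, 100, 16, 50
true_weights = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ true_weights + rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ true_weights + rng.normal(size=(n_test, n_voxels))

weights = fit_ridge(X_train, Y_train, alpha=10.0)   # fit on training TRs
scores = voxelwise_r(Y_test, X_test @ weights)      # evaluate on held-out TRs
print(f"mean held-out r = {scores.mean():.2f}")
```

      In this framing, comparing VALOR with CLIP amounts to swapping the feature matrix while keeping the regression and the held-out evaluation fixed, so that differences in per-voxel correlation reflect the features rather than the fitting procedure.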

      (Discussion, On page 16)

      “Additionally, VALOR exceeds the performance of CLIP, a leading static multimodal model, as its training objective aligns multi-second video–text units, enforcing a temporal integration window and event-level semantics that maintain cross-frame consistency and narrative context, whereas CLIP’s image-level alignment provides no intrinsic mechanism for such temporal continuity. More broadly, this difference reflects distinct inductive biases in how the two models represent visual–linguistic information. CLIP is optimized for framewise image–text correspondence, encouraging representations that emphasize instantaneous visual semantics but remain agnostic to temporal structure. In contrast, VALOR is explicitly biased toward aggregating information over multiple consecutive frames and aligning representations at the level of temporally extended events. These inductive biases favor context maintenance, semantic stabilization, and narrative coherence over time, which are known to be critical for driving responses in association cortex during continuous movie perception.”

      (3) A broader question remains insufficiently addressed: what is the purpose of visual-text alignment in the human brain? One hypothesis is that it supports the formation of abstract semantic representations that rely on no specific input modality. While VALOR performs well in voxel-wise encoding, it is unclear whether this necessarily indicates the emergence of such abstract semantics. The authors are encouraged to discuss how the computational architecture of VALOR may reflect this alignment mechanism and what implications it has for understanding brain function.

      Thank you for this important conceptual question. We agree that improved voxel-wise encoding performance does not, by itself, imply the emergence of fully amodal or modality-independent semantic representations in the brain. In the revision, we therefore avoid framing our findings as evidence for abstract amodal semantics and instead clarify a more constrained interpretation.

      Specifically, we suggest that visual–text alignment may support the stabilization and coordination of scene-level meaning across modalities and over time, rather than the formation of modality-free semantic codes. From this perspective, VALOR’s advantage reflects inductive biases that promote (i) integration of visual information over multi-second windows and (ii) alignment of temporally extended visual events with linguistic descriptions, yielding representations that are more temporally stable, context-sensitive, and constrained by language.

      We therefore interpret VALOR’s superior encoding performance as identifying cortical regions whose responses are better captured by temporally stabilized, cross-modal representations, rather than as evidence that these regions encode fully abstract semantics independent of input modality. We have expanded the Discussion (p. 16) to articulate this interpretation and to clarify the implications of video–text alignment for understanding how the brain integrates perception and language during naturalistic cognition.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time. At the same time, these findings do not imply that visual–text alignment in the brain gives rise to fully amodal, modality-independent semantic representations. Instead, we suggest that alignment between visual and linguistic signals may serve to stabilize and coordinate scene-level meaning across modalities and over time. From this perspective, VALOR’s architecture—by integrating visual information over multi-second windows and aligning temporally extended video segments with language—provides a computational proxy for how the brain may use linguistic constraints to organize, disambiguate, and maintain coherent representations of unfolding events. The observed encoding gains therefore highlight regions engaged in temporally stabilized, cross-modal integration during naturalistic perception, rather than providing evidence for abstract semantic codes divorced from sensory input.”

      (4) The current methods section does not provide enough details about the network architectures, parameter settings, or whether pretrained models were used. If so, please provide links to the pretrained models to facilitate reproducible science.

      We appreciate this comment and agree that our original description of model sources and implementation details was not sufficiently explicit. These details are essential for both reproducibility and interpretability. We have now made these specifications explicit in the revised Methods.

      In particular, we now state for each model:

      VALOR. We use the publicly released pretrained VALOR-large checkpoint. For each movie segment, we extract the joint video–text projection head output (512-D) that encodes the aligned segment-level audiovisual semantics. We report the checkpoint source, the segment duration (in frames/seconds), and how these segment-level embeddings are temporally aligned to TRs for voxel-wise encoding.

      CLIP (ViT-B/32). We use the standard pretrained CLIP weights. For each video frame, we extract the final pooled image representation after projection into CLIP’s shared image–text embedding space (512-D). We also clarify that one representative frame is sampled and aligned to each TR, and that these projected embeddings are used as regressors in the encoding model.

      AlexNet. We use the ImageNet-pretrained AlexNet. We take activations from conv5, and then apply PCA to reduce them to 512 dimensions before mapping them to the fMRI time series.

      For each model, the revised Methods now specify: the pretrained source/checkpoint, the layer or head from which features were taken, output dimensionality, any preprocessing or dimensionality reduction, and the temporal alignment procedure used to generate TR-level regressors. These revisions appear in the updated Methods (page 21).
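
      As an illustration of the TR-level alignment step for frame-based models, the following minimal sketch picks one representative frame per TR. The function name, the midpoint rule, the frame rate, and the TR value are assumptions made for illustration only; the response does not state which frame within a TR was sampled.

```python
def frame_index_per_tr(n_trs, tr_seconds, fps):
    """Return one frame index per TR.

    Assumption (not stated in the response): the representative frame is
    the one on screen at the midpoint of each TR window.
    """
    indices = []
    for tr in range(n_trs):
        midpoint_s = (tr + 0.5) * tr_seconds   # centre of the TR window, in seconds
        indices.append(int(midpoint_s * fps))  # floor to the frame displayed at that time
    return indices


# Hypothetical values: 3 TRs of 1.0 s each, video at 24 frames per second.
example_indices = frame_index_per_tr(n_trs=3, tr_seconds=1.0, fps=24.0)
print(example_indices)
```

      Each returned index identifies the frame whose embedding (for example, the 512-D CLIP projection) would serve as the regressor row for that TR.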

      (On page 21) “(1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model 24. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features 23,51,52. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit 53.

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”

      Reviewer #2 (Public review):

      Fu and colleagues have shown that VALOR, a model of multimodal and dynamic stimulus features, better predicts brain responses compared to unimodal or static models such as AlexNet, WordNet, or CLIP. The authors demonstrated the robustness of their findings by generalizing encoding results to an external dataset. They demonstrated the model's practical benefit by showing that semantic mappings were comparable to another model that required labor-intensive manual annotation. Finally, the authors showed that the model reveals predictive coding mechanisms of the brain, which held a meaningful relationship with individuals' fluid intelligence measures.

      Strengths:

      Recent advances in neural network models that extract visual, linguistic, and semantic features from real-world stimuli have enabled neuroscientists to build encoding models that predict brain responses from these features. Higher prediction accuracy indicates greater explained variance in neural activity, and therefore a better model of brain function. Commonly used models include AlexNet for visual features, WordNet for audio-semantic features, and CLIP for visuo-semantic features; these served as comparison models in the study. Building on this line of work, the authors developed an encoding model using VALOR, which captures the multimodal and dynamic nature of real-world stimuli. VALOR outperformed the comparison models in predicting brain responses. It also recapitulated known semantic mappings and revealed evidence of predictive processing in the brain. These findings support VALOR as a strong candidate model of brain function.

      (1) The authors argue that this modeling contributes to a better understanding of how the brain works. However, upon reading, I am less convinced about how VALOR's superior performance over other models tells us more about the brain. VALOR is a better model of the audiovisual stimulus because it processes multimodal and dynamic stimuli compared to other unimodal or static models. If the model better captures real-world stimuli, then I almost feel that it has to better capture brain responses, assuming that the brain is a system that is optimized to process multimodal and dynamic inputs from the real world. The authors could strengthen the manuscript if the significance of their encoding model findings were better explained.

      We thank the reviewer for this thoughtful comment and agree with the premise that a model preserving multimodal and temporal structure might a priori be expected to better predict brain responses to naturalistic stimuli. Our intent is not to claim that higher accuracy alone explains brain function, but rather that where and how VALOR improves prediction provides diagnostic insight into cortical processing. We have revised the Discussion to make this distinction explicit.

      Specifically, we clarify three ways in which VALOR’s gains are scientifically informative rather than merely unsurprising:

      (1) Anatomical specificity of improvement. VALOR’s advantage is not uniform across the cortex; gains are largest in regions implicated in multi-second, cross-modal integration. This spatial pattern constrains where the brain accumulates information over time and stabilizes visual representations using linguistic context.

      (2) Model as a computational probe. Beyond prediction accuracy, VALOR’s feature space recovers large-scale semantic organization without manual annotation and enables targeted tests of predictive processing. Features reflecting upcoming content selectively improve fits in specific regions, consistent with anticipatory coding during continuous narrative perception.

      (3) Link to individual differences. Individuals whose neural responses are better captured by anticipatory features show higher fluid intelligence, suggesting that VALOR indexes meaningful variability in forward-looking representations rather than merely tracking stimulus complexity.

      Accordingly, we have revised the Discussion (p. 16) to frame VALOR as a tool for mapping cortical integration profiles, probing semantic and predictive structure, and linking representational dynamics to cognition, rather than asserting that higher encoding accuracy alone explains brain function.

      (On page 16) “Together, the relative gains over AlexNet (purely visual), WordNet (manual semantic annotation), and CLIP (static image–text alignment) indicate cortical systems whose responses are best captured by multi-second, multimodal integration, and highlight regions that accumulate and stabilize narrative context over time.”

      (2) In Study 3, the authors show high alignment between WordNet and VALOR feature PCs. Upon reading the method together with Figure 3, I suspect that the alignment almost has to be high, given that the authors projected VALOR features to the Huth et al.'s PC space. Could the authors conduct non-parametric permutation tests, such as shuffling the VALOR features prior to mapping onto Huth et al.'s PC space, and then calculating the Jaccard scores? I imagine that the null distribution would be positively shifted. Still, I would be convinced if the alignment is higher than this shifted null distribution for each PC. If my understanding of this is incorrect, I suggest editing the relevant Method section (line 508) because this analysis was not easy to understand.

      Thank you for this helpful comment and for pointing out a potential source of confusion. We apologize that the original Methods description was not sufficiently clear. Importantly, VALOR features were never projected into the Huth et al. PC space, and no optimization or rotation toward the WordNet basis occurred at any stage.

      The analysis proceeded as follows:

      (1) VALOR PCs. We first fit voxel-wise encoding models using VALOR features on the Huth et al. dataset. We then applied PCA to the resulting cortical weight maps, yielding spatial components (‘VALOR PCs’) that summarize shared patterns of VALOR feature weights across the cortex.

      (2) WordNet PCs. We used the semantic principal components reported by Huth et al. (2012) directly as published, with no refitting, projection, or modification using VALOR.

      (3) Correspondence analysis. Only after obtaining these two independent sets of cortical maps did we threshold each to their top-loading vertices and compute Jaccard overlap between VALOR PCs and WordNet PCs.
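
      The correspondence step above can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: the helper names, the use of absolute loadings, and the top-fraction threshold are assumptions, and the toy loading vectors are invented.

```python
def top_loading(weights, fraction):
    """Indices of the vertices with the largest absolute loadings.

    Assumption: 'top-loading' is taken as the top `fraction` of vertices
    ranked by absolute weight; the response does not give the threshold.
    """
    k = max(1, int(len(weights) * fraction))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return set(order[:k])


def jaccard(a, b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| between two sets of vertex indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy cortical maps over six vertices (invented values for illustration).
valor_pc = [0.9, -0.1, 0.8, 0.05, 0.7, 0.0]
wordnet_pc = [0.85, 0.75, 0.1, 0.0, 0.6, -0.05]

overlap = jaccard(top_loading(valor_pc, 0.5), top_loading(wordnet_pc, 0.5))
print(overlap)
```

      Because each map is thresholded separately before the sets are compared, the overlap depends only on where the two independently derived components load most strongly, which matches the independence of the two decompositions described above.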

      Although a permutation that shuffles VALOR features prior to projection addresses a scenario that does not apply here, we agree that the Methods description should more clearly convey the independence of the two decompositions. We have therefore revised the Methods (p. 24) to describe the procedure step-by-step and explicitly state that no projection, refitting, or optimization toward the WordNet basis was performed.

      (On page 24) “We first fit voxel-wise encoding models using VALOR features for each of the five participants in the Huth et al. dataset. For each participant, this yielded a weight map linking each VALOR feature to each voxel. We then stacked these weight maps across participants to form a single voxel-by-feature weight matrix and applied principal component analysis (PCA). The top four principal components from this analysis (“VALOR PCs”) captured shared spatial patterns of VALOR feature weights across cortex. To interpret these components, we projected VALOR feature vectors from >20,000 video segments in the VALOR training set onto each VALOR PC, which revealed dominant semantic axes (e.g., mobility, sociality, civilization). For comparison, we used the semantic principal components reported by Huth et al. (2012) from their WordNet-based encoding model; these “WordNet PCs” were taken directly from the published work and were not refit or reweighted using VALOR.”

      (3) In Study 4, the authors show that individuals whose superior parietal gyrus (SPG) exhibited high prediction distance had high fluid cognitive scores (Figure 4C). I had a hard time believing that this was a hypothesis-driven analysis. The authors motivate the analysis that "SPG and PCu have been strongly linked to fluid intelligence (line 304)". Did the authors conduct only two analyses (SPG–fluid intelligence and PCu–fluid intelligence), without relating other brain regions to other individual differences measures? Even if so, the authors should have reported the same r-value and p-value for PCu-fluid intelligence. If SPG-fluid intelligence indeed holds specificity in terms of statistical significance compared to all possible scenarios that were tested, is this rationally an expected result, and could the authors explain the specificity? Also, the authors should explain why they considered fluid intelligence to be the proxy of one's ability to anticipate upcoming scenes during movie watching. I would have understood the rationale better if the authors had at least aggregated predictive scores for all brain regions that held significance into one summary statistic and found a significant correlation with the fluid intelligence measure.

      We thank the reviewer for this careful and constructive comment and agree that greater transparency about analytic intent, specificity, and rationale is needed. We have revised the manuscript accordingly.

      (1) Analytic scope and a priori restriction. The analysis in Fig. 4C was hypothesis-driven and restricted a priori to two regions — superior parietal gyrus (SPG) and precuneus (PCu) — based on convergent evidence linking frontoparietal and medial parietal systems to fluid reasoning, relational integration, and domain-general cognitive control. Importantly, we did not conduct a whole-brain search across regions or behaviors to identify the strongest correlation post hoc.

      (2) Specificity and reporting. In response to the reviewer’s request, we now report the full results for both hypothesized regions. Prediction horizon in SPG showed a statistically reliable association with fluid intelligence, whereas PCu showed a positive but weaker trend that did not survive correction. Reporting both results makes the regional specificity explicit rather than implicit.

      (3) Why SPG over PCu? Although both regions are implicated in fluid cognition, SPG has been more consistently linked to active maintenance and manipulation of relational structure and top-down attentional control, whereas PCu is more often associated with internally oriented and mnemonic processes. We therefore interpret the stronger SPG association as consistent with a role for sustained, externally driven predictive processing during continuous perception, rather than as evidence of exclusivity.

      (4) Why fluid intelligence? We do not equate fluid intelligence with “anticipation” per se. Rather, we used gF as an a priori proxy for domain-general capacities — maintaining and updating relational context over multi-second windows, integrating multiple constraints, and exerting flexible control — that are plausibly recruited when anticipating upcoming events during naturalistic narratives. The reported relationship is associative and hypothesis-consistent, not causal.

      (5) Why not aggregate across regions? We agree that aggregation could reveal more global relationships; however, our goal in this analysis was to test whether predictive timescales in theoretically motivated control regions relate to individual differences, rather than to maximize correlation by pooling heterogeneous regions. We now clarify this rationale in the Results.

      These clarifications and additional statistics have been incorporated in the revised Results section (p. 14).

      (On page 14) “Finally, we examined whether prediction horizons were linked to individual differences in cognition. We focused on fluid intelligence (gF) because gF is widely taken to index domain-general capacities such as maintaining and updating relational context over several seconds, integrating multiple constraints, and exerting flexible top-down control — functions that should support anticipating what will happen next in a continuous narrative. We targeted two parietal regions, the SPG and the PCu, which have both been repeatedly linked to gF and high-level cognitive control in the individual-differences literature 36,37. For each participant, we correlated fluid cognition scores with that participant’s average prediction horizon in each region. As shown in Fig. 4c, individuals with longer prediction horizons in SPG showed higher fluid cognition scores (SPG: r = 0.172, FDR-corrected p = 0.047). PCu showed a similar positive trend (PCu: r = 0.111, FDR-corrected p = 0.146) but did not reach significance. These associations suggest that the ability to sustain a longer predictive timescale during naturalistic perception co-varies with broader fluid cognitive capacity. No additional brain regions or behavioral measures were examined in this analysis.”
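
      For readers who wish to reproduce this kind of analysis, the sketch below computes a Pearson correlation and applies a false discovery rate correction across a small family of tests. The Benjamini–Hochberg procedure is an assumption here, since the text states only that p-values were FDR-corrected; the function names are hypothetical and the input values are invented, not the study’s data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for a family of three tests (illustration only).
raw_p = [0.01, 0.04, 0.03]
print(benjamini_hochberg(raw_p))
```

      With only two regions in the family, as in the analysis described above, this procedure doubles the smaller raw p-value (capped at the larger one) and leaves the larger raw p-value unchanged.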

      Reviewer #3 (Public review):

      In this work, the authors aim to improve neural encoding models for naturalistic video stimuli by integrating temporally aligned multimodal features derived from a deep learning model (VALOR) to predict fMRI responses during movie viewing.

      Strengths:

      The major strength of the study lies in its systematic comparison across unimodal and multimodal models using large-scale, high-resolution fMRI datasets. The VALOR model demonstrates improved predictive accuracy and cross-dataset generalization. The model also reveals inherent semantic dimensions of cortical organization and can be used to evaluate the integration timescale of predictive coding.

      This study demonstrates the utility of modern multimodal pretrained models for improving brain encoding in naturalistic contexts. While not conceptually novel, the application is technically sound, and the data and modeling pipeline may serve as a valuable benchmark for future studies.

      (1) Lines 95-96: The authors claim that "cortical areas share a common space," citing references [22-24]. However, these references primarily support the notion that different modalities or representations can be aligned in a common embedding space from a modeling perspective, rather than providing direct evidence that cortical areas themselves are aligned in a shared neural representational space.

      We thank the reviewer for this important clarification. We agree that the cited works do not provide direct evidence that cortical areas themselves are aligned in a single neural representational space. Rather, they demonstrate that representations derived from different modalities can be mapped into a shared embedding space from a modeling and computational perspective.

      We have therefore revised the text to avoid overstatement and to more precisely reflect what these studies support. In the revised manuscript (p. 4), we now frame the claim in terms of a shared representational framework or feature space used for modeling, rather than implying that cortical areas themselves intrinsically share a unified neural space. This clarification aligns the conceptual claim with the scope of the cited literature.

      (On page 4) “As a result, researchers are turning to multimodal deep learning, which learns from visual, linguistic, and auditory streams to model complex brain functions. This trend is supported by neuroscience evidence that cortical responses across regions can be jointly modeled within a common representational space.”

      (2) The authors discuss semantic annotation as if it is still a critical component of encoding models. However, recent advances in AI-based encoding methods rely on features derived from large-scale pretrained models (e.g., CLIP, GPT), which automatically capture semantic structure without requiring explicit annotation. While the manuscript does not systematically address this transition, it is important to clarify that the use of such pretrained models is now standard in the field and should not be positioned as an innovation of the present work. Additionally, the citation of Huth et al. (2012, Neuron) to justify the use of WordNet-based annotation omits the important methodological shift in Huth et al. (2016, Nature), which moved away from manual semantic labeling altogether. Since the 2012 dataset is used primarily to enable comparison in study 3, the emphasis should not be placed on reiterating the disadvantages of semantic annotation, which have already been addressed in prior work. Instead, the manuscript's strength lies in its direct comparison between data-driven feature representations and semantic annotation based on WordNet categories. The authors should place greater emphasis on analyzing and discussing the differences revealed by these two approaches, rather than focusing mainly on the general advantage of automated semantic mapping.

      Thank you for this thoughtful and constructive comment. We agree with the reviewer that the field has largely transitioned away from manual semantic annotation toward features derived from large-scale pretrained models (e.g., CLIP, GPT-style architectures), and that this shift is now standard rather than a novelty of the present work.

      We have revised the manuscript to clarify this positioning. Our goal is not to claim automated semantic extraction as an innovation, but rather to demonstrate how a multimodal, temporally informed video–text model can be used as a direct feature space for voxel-wise encoding of naturalistic movie fMRI data. VALOR is used as a representative example of this broader class of pretrained models, and our emphasis is on the general modeling approach rather than on promoting a specific architecture.

      We also agree that our original discussion underemphasized the important methodological shift introduced in Huth et al. (2016, Nature), which moved away from manual semantic labeling in the context of continuous spoken narratives. We now explicitly acknowledge this work and clarify that our use of WordNet-based annotations from Huth et al. (2012) serves a different purpose: it provides an interpretable, historically grounded benchmark for comparison in Study 3, rather than a claim that semantic annotation remains necessary or state-of-the-art.

      In response to the reviewer’s suggestion, we have revised the Results (p. 10) and Discussion (p. 18) to place greater emphasis on what is revealed by directly comparing data-driven multimodal features with category-based semantic annotation under matched conditions. Specifically, we focus on how these two approaches converge at the level of large-scale semantic organization while differing in their flexibility, temporal resolution, and dependence on human-defined categories. These revisions better reflect the current state of the field and sharpen the manuscript’s central contribution as a principled comparison between modeling approaches, rather than a general argument for automated semantic mapping.

      (On page 10) “Study 3: Comparing data-driven multimodal representations with category-based semantic annotation

      A central question in naturalistic encoding is how data-driven feature representations derived from pretrained models relate to more interpretable, category-based semantic annotations that have historically been used to study cortical semantic organization. Although recent work has shown that pretrained language and vision–language models can capture semantic structure without explicit annotation, category-based approaches such as WordNet remain valuable as interpretable reference frameworks. Here, we leverage the WordNet-based semantic components reported by Huth et al. (2012) 5 not as a state-of-the-art alternative, but as a historically grounded benchmark, allowing a controlled comparison between data-driven multimodal representations and manually defined semantic categories under matched naturalistic movie stimuli.”

      (On page 18) “Study 3 demonstrates the utility of video–text alignment models for probing higher-order semantic representations during naturalistic perception. Our comparison between VALOR-derived representations and WordNet-based semantic components highlights an important distinction between data-driven and category-based approaches to modeling meaning in the brain. While multimodal pretrained models offer flexible, high-dimensional representations that capture semantic structure without explicit annotation, category-based frameworks provide interpretability and theoretical anchoring 4,48. Using WordNet-based labeling from prior work as an interpretable reference point, we show that VALOR automatically extracts semantic dimensions—including mobility, sociality, and civilization—that closely mirror those identified using manual semantic categories (Fig. 3). The observed alignment between VALOR PCs and WordNet semantic components suggests that large-scale semantic organization emerges consistently across these approaches, even though they differ in how semantic structure is defined and learned. This convergence supports the use of pretrained multimodal models as practical encoding tools for naturalistic stimuli, while also underscoring the continued value of interpretable semantic benchmarks for understanding which aspects of meaning are represented across cortex. We do not argue that semantic annotation is required for modern encoding models; rather, WordNet-based features serve here as a historically grounded and interpretable reference for contextualizing data-driven multimodal representations.”

      (3) The authors use subject-specific encoding models trained on the HCP dataset to predict group-level mean responses in an independent in-house dataset. While this analysis is framed as testing model generalization, it is important to clarify that it is not assessing traditional out-of-distribution (OOD) generalization, where the same subject is tested on novel stimuli, but rather evaluating which encoding model's feature space contains more stimulus-specific and cross-subject-consistent information that can transfer across datasets.

      We thank the reviewer for this helpful clarification and agree that the type of generalization tested here should be described more precisely. Our analysis does not assess classical within-subject out-of-distribution (OOD) generalization, in which the same individual is tested on novel stimuli.

      Instead, for each HCP participant we train a subject-specific encoding model and transfer it to predict group-mean responses in an independent in-house dataset collected at a different site, with different participants, different movies, and different acquisition conditions. This design evaluates which encoding model’s feature space contains stimulus-locked representations that are consistent across individuals and robust to changes in dataset and experimental context, rather than within-subject stimulus novelty per se.
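
      The transfer evaluation described here can be sketched for a single voxel as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, the encoding weights are taken as already fitted, scoring by Pearson correlation and averaging across subject-specific models are assumptions, and all numbers are invented.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def predict(features, weights):
    """Linear encoding-model prediction: one value per time point."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]


def group_mean(responses):
    """Average the responses of all new-dataset participants at each time point."""
    n = len(responses)
    return [sum(ts[t] for ts in responses) / n for t in range(len(responses[0]))]


def transfer_score(subject_weights, new_features, new_responses):
    """Mean correlation between each subject-specific model's prediction
    and the group-mean response in the independent dataset."""
    target = group_mean(new_responses)
    scores = [pearson_r(predict(new_features, w), target) for w in subject_weights]
    return sum(scores) / len(scores)


# Toy example: two source-dataset models, four time points, two features.
new_features = [[1, 0], [0, 1], [1, 1], [2, 1]]
subject_weights = [[1.0, 1.0], [2.0, 2.0]]          # fitted on the source dataset
new_responses = [[1, 1, 2, 3], [1, 1, 2, 3]]        # two new-dataset participants
print(transfer_score(subject_weights, new_features, new_responses))
```

      Because the target is a group average, variance that is idiosyncratic to individual participants in the new dataset is attenuated, so a feature space scores well only if it captures responses that are stimulus-locked and shared across people.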

      We have revised the Results (p. 10) and Discussion section (p. 17) to explicitly describe this analysis as a test of cross-subject and cross-dataset transferability of stimulus representations, and to clarify the distinction from traditional OOD generalization.

      (On page 10) “Although this analysis is not a classical within-subject out-of-distribution generalization test, it evaluates the extent to which different feature spaces capture stimulus-locked representations that are consistent across subjects and transferable across datasets, stimuli, and acquisition environments.”

      (On page 17) “By contrast, VALOR exhibited stronger generalization in a cross-cohort, cross-stimulus, and cross-site transfer evaluation.”

      (4) Within this setup, the finding that VALOR outperforms CLIP, AlexNet, and WordNet is somewhat expected. VALOR encodes rich spatiotemporal information from videos, making it more aligned with movie-based neural responses. CLIP and AlexNet are static image-based models and thus lack temporal context, while WordNet only provides coarse categorical labels with no stimulus-specific detail. Therefore, the results primarily reflect the advantage of temporally-aware features in capturing shared neural dynamics, rather than revealing surprising model generalization. A direct comparison to pure video-based models, such as Video Swin Transformers or other more recent video models, would help strengthen the argument.

      We thank the reviewer for this baseline-focused comment and agree that, in naturalistic movie paradigms, a temporally structured audiovisual model would be expected to outperform static or unimodal feature spaces. Our intent in this comparison is therefore not to claim a surprising advantage, but to isolate which inductive biases matter for cross-dataset transfer of movie-evoked neural responses.

      The baseline models were chosen deliberately to span feature spaces that are widely used and interpretable in cognitive neuroscience: AlexNet (vision-only, frame-based), WordNet (human-defined semantic categories without learned visual features), and CLIP (static image–text alignment without temporal context). Comparing VALOR against these established baselines under matched preprocessing, TR alignment, and dimensionality control allows us to attribute performance differences specifically to temporal integration and audiovisual alignment, rather than to generic model capacity.

      We agree that a direct comparison with purely visual spatiotemporal encoders (e.g., Video Swin or TimeSformer-style models) would further dissociate the contribution of temporal visual processing from cross-modal video–text alignment. We now explicitly note this as an important direction for future work and frame VALOR as one representative of a broader class of multimodal video models, rather than as a uniquely optimal solution (Discussion, p. 16).

      (On page 16) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (5) Moreover, while WordNet-based encoding models perform reasonably well within-subject in the HCP dataset, their generalization to group-level responses in the Short Fun Movies (SFM) dataset is markedly poorer. This could indicate that these models capture a considerable amount of subject-specific variance, which fails to translate to consistent group-level activity. This observation highlights the importance of distinguishing between encoding models that capture stimulus-driven representations and those that overfit to individual heterogeneities.

      Thank you for this thoughtful observation. We agree with the reviewer’s interpretation. In our analyses, WordNet-based models perform reasonably well when fit and evaluated within individual HCP participants, but their performance degrades substantially when transferred to predict group-averaged responses in the independent SFM dataset. This dissociation suggests that, while WordNet annotations capture meaningful variance at the individual level, a larger fraction of that variance may be subject-specific or idiosyncratic, and therefore does not translate into consistent, stimulus-locked responses at the group level.

      One motivation for our cross-dataset, cross-subject evaluation is precisely to distinguish encoding models that primarily capture shared stimulus-driven structure from those whose apparent performance depends more strongly on individual heterogeneity. In this context, the reduced transferability of WordNet-based models highlights a potential limitation of category-based semantic features for capturing population-consistent neural dynamics during naturalistic viewing.

      We note that this effect likely reflects multiple factors rather than a single failure mode, including differences in annotation schemes, labeling granularity, and semantic coverage across datasets. By contrast, video–text models provide time-aligned linguistic features directly from the stimulus itself, reducing reliance on dataset-specific human annotation and exhibiting stronger transfer across cohorts. We have clarified this interpretation in the revised Discussion (p. 17).

      (Page 17) “Together, these findings underscore the importance of distinguishing encoding models that primarily capture shared, stimulus-driven neural structure from those whose performance relies more heavily on subject-specific heterogeneity, particularly when evaluating generalization across participants and datasets.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Methods section, please clarify which specific layer of VALOR the 512-dimensional feature vector was extracted from.

      Thank you for this suggestion. We have revised the Methods to state explicitly that the 512-dimensional feature vector is extracted from VALOR’s joint video–text projection head, i.e., the final projection layer of the contrastive alignment module that maps video and text representations into a shared embedding space. We also clarify that these 512-D embeddings are computed at the segment/TR level and then time-aligned to the BOLD signal (Methods, p. 21).

      (On page 21) “We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.”

      (2) It would be helpful to include more detailed descriptions of the network architectures and parameters for all models used.

      Thank you for the suggestion. We have revised the Methods to include model-specific subsections for all feature spaces used (VALOR, CLIP, AlexNet, and WordNet). For each model, we now explicitly report (i) the backbone architecture and training objective, (ii) the exact feature source (layer or projection head) and output dimensionality, and (iii) how features were temporally aligned to the BOLD signal. All models were used with their publicly released pretrained parameters, without additional fine-tuning. These additions are intended to improve transparency and reproducibility (Methods, p. 21).

      (On page 21) “Movie Feature Extraction

      (1) Video–text alignment features (VALOR): To extract video-based multimodal features, we used VALOR (VALOR-large checkpoint), an open-source pretrained video–text alignment model [24]. VALOR combines visual encoders (CLIP and Video Swin Transformer) for extracting visual features and a text encoder (BERT) for extracting textual features [23, 51, 52]. These representations are aligned in a shared embedding space through contrastive learning. We segmented each movie at the TR level and, for each segment, extracted VALOR’s projected video–text embedding from the final projection head of the alignment module to obtain a 512-dimensional feature vector. These embeddings were then time-aligned to the corresponding BOLD responses.

      (2) CLIP features: To compare with static image-based multimodal models, we utilized CLIP (ViT-B/32), which aligns visual and textual representations through contrastive learning but processes individual frames independently without capturing temporal context. One video frame was sampled per TR, and the pooled image embedding after CLIP’s projection into the shared image–text space was extracted to obtain a 512-dimensional feature vector. These TR-aligned vectors were used directly as regressors in the voxel-wise encoding models.

      (3) AlexNet features: Visual features were extracted by sampling frames at the TR level and processing them with AlexNet, an eight-layer convolutional neural network comprising five convolutional layers followed by three fully connected layers. Features from all five convolutional layers were evaluated in preliminary analyses; the fifth convolutional layer showed the best performance and was used in subsequent analyses. Intra-image z-score normalization was applied to reduce amplitude effects. Principal component analysis (PCA) was used to reduce dimensionality, retaining the top 512 components to match the dimensionality of multimodal feature spaces. This pipeline was implemented using the DNNBrain toolkit [53].

      (4) WordNet features: Semantic features were obtained from publicly available WordNet annotations provided with the HCP dataset (7T_movie_resources/WordNetFeatures.hdf5), following the procedure of Huth et al. (2012). Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models. Each second of the movie clips was manually annotated with WordNet categories according to predefined guidelines: (a) identifying clear objects and actions in the scene; (b) labeling categories that dominated for more than half of the segment duration; and (c) using specific category labels rather than general ones. A semantic feature matrix was constructed with rows corresponding to time points and columns to semantic categories, with category presence coded as binary values. More specific categories from the WordNet hierarchy were added to each labeled category, yielding a total of 859 semantic features. These features were used directly as regressors. We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text. For the generalization analysis in Study 2, annotations for the SFM dataset were aligned to the same WordNet category space to ensure consistency.”
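
      The binary semantic feature matrix described in item (4) can be illustrated with a minimal sketch. The annotation labels below are hypothetical placeholders, not taken from the HCP annotation file, and the hierarchy-expansion step that yields the 859 features is not reproduced here.

```python
import numpy as np

# Hypothetical per-second annotations (one set of category labels per time point).
# Real labels would come from the annotation file named in the Methods.
annotations = [
    {"person.n.01", "talk.v.01"},
    {"person.n.01", "car.n.01"},
    {"car.n.01", "drive.v.01"},
    set(),  # a time point with no labeled category
]

# Columns are the sorted union of all categories that appear at least once.
categories = sorted(set().union(*annotations))
column = {name: j for j, name in enumerate(categories)}

# Rows are time points, columns are categories, entries code presence as 0/1.
features = np.zeros((len(annotations), len(categories)), dtype=np.int8)
for t, labels in enumerate(annotations):
    for name in labels:
        features[t, column[name]] = 1
```

      Each row can then serve as one regressor vector for the corresponding time point.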

      (3) In Figure 3, consider following Huth et al.'s approach by using 3-4 distinct colors to visualize semantic representations across the cortical surface more clearly.

      Thank you for this excellent suggestion. We have generated an alternative visualization using a discrete 3–4 color scheme following Huth et al. to display the semantic components on the cortical surface. This version makes the spatial correspondence between components and the boundaries between cortical territories easier to see. We now include this visualization in the Supplement (Fig. S3).

      (4) In Figure 2, the brain renderings are too small. Please consider creating a separate, enlarged figure with clearer delineation of relevant ROIs.

      We appreciate this suggestion and agree that clear delineation of ROIs is important. We evaluated larger brain renderings; however, within the multi-panel layout of Fig. 2, enlarging them compressed accompanying plots/legends and introduced visual crowding, which reduced overall readability. To preserve a balanced layout and consistent typography across panels, we have kept the current rendering size in the main text and added Fig. S4 with enlarged brain renderings showing clearer ROI boundaries for the same ROIs.

      Reviewer #2 (Recommendations for the authors):

      (1) From the introduction, I feel like naïve readers would have a hard time understanding what semantic models (e.g., WordNet) are, which the authors write are based on "labor-intensive and subjective manual annotation of semantic content". It would be straightforward to explain the process: how scientists have written descriptions or denoted categories of what's happening within a TR and transformed these into embedding vectors based on language models. This description would explain what the authors mean by "labor-intensive, time-consuming, and subjective". Related to this point, the authors seem to be using the words "semantic model/feature" and "linguistic model/feature" interchangeably, which may exacerbate the confusion.

      Thank you for this helpful suggestion. We agree that naïve readers would benefit from a clearer explanation of how “semantic” models such as WordNet are constructed and from a more precise distinction between semantic and linguistic features.

      In response, we expanded the Introduction (p. 3) to explicitly describe the process by which semantic features are generated via dense human annotation (i.e., raters label objects, actions, and events within each TR and map these labels onto a predefined ontology to form feature vectors), clarifying why this approach is labor-intensive, time-consuming, and subject to rater variability.

      To avoid disrupting the conceptual flow of the Introduction, we placed the explicit terminology clarification in the Methods section (p. 22), where feature extraction is described. There, we now define semantic features as human-annotated, category-based representations of scene content, and linguistic features as continuous language embeddings derived automatically from pretrained language or vision–language models. These revisions are intended to improve clarity and consistency for both expert and non-expert readers.

      (On page 3) “Critically, semantic models often rely on dense human annotation. In early naturalistic encoding studies, trained raters watched the stimulus and labeled what was happening within each TR or short time window—for example, identifying objects, actions, or events present in the scene. These labels were then mapped onto a predefined semantic ontology (such as WordNet), yielding high-dimensional categorical feature vectors that served as regressors in encoding models. While this approach provides interpretable semantic features, it is labor-intensive, time-consuming, and inherently subjective, as annotations depend on rater judgment, labeling guidelines, and dataset-specific conventions, limiting scalability and reproducibility.”

      (On page 22) “Throughout this manuscript, we use the term “semantic features” to refer to such human-annotated, category-based representations of scene content, and we reserve the term “linguistic features” for continuous language embeddings derived automatically from pretrained language or vision–language models.”

      (2) Figure 1A does not look like an accurate schematic of the encoding method. For example, shouldn't the "Train" give rise to weight matrices, and Movies come from moments at Test? I would appreciate it if this schematic figure would explain what the encoding model is to naïve readers.

      (3) Figure 1B emphasizes that VALOR is utilizing multimodal features, but does not emphasize that the model is trained on dynamic video. The current figure looks like the model extracted visual and linguistic features from a screenshot of the video, much like the CLIP model.

      Thank you for this helpful comment. We agree that the original Fig. 1A did not sufficiently clarify what is learned during training versus what is applied during testing, and that this distinction is particularly important for naïve readers unfamiliar with encoding models. We also agree that the original Fig. 1B did not sufficiently emphasize that VALOR is trained on dynamic video segments, and that the schematic could be misinterpreted as aligning a single video frame with text, similar to CLIP-style image–text models.

      We have revised Fig. 1A (p. 6) to make the encoding procedure explicit and pedagogical. Specifically, we now clearly depict that, during the training phase (HCP dataset), voxel-wise encoding models learn feature-to-voxel weight matrices from stimulus features and BOLD responses. These learned weights are explicitly labeled as voxel-wise weight matrices and visually associated with the training stage. In the testing/generalization phase (SFM dataset), we now indicate that these learned weights are held fixed and applied to features extracted from novel movies to generate predicted BOLD responses. Additional labels were added to distinguish “Training (learn weights)” from “Testing/Transfer (apply fixed weights)” and to clarify that the encoding model implements a linear mapping from stimulus features to voxel responses. We have also rewritten the Fig. 1 legend (p. 6) to explicitly explain the encoding workflow in words, including (i) the learning of voxel-specific weights during training, (ii) their reuse during cross-dataset transfer, and (iii) how generalization performance is evaluated. These changes are intended to ensure that Fig. 1A accurately reflects the encoding methodology and is understandable to readers without prior experience with encoding models.

      We have revised Fig. 1B (p. 6) to explicitly highlight the temporal nature of the video input used by VALOR. In the updated schematic, the visual stream is depicted as a sequence of consecutive frames spanning multiple seconds, grouped into a video segment, rather than as a single static image. Additional labels indicate that VALOR encodes temporally extended video clips and aligns them with corresponding textual descriptions in a shared embedding space via contrastive learning. We have also updated the figure legend (p. 6) to clarify that VALOR operates on multi-frame video segments and explicitly models temporal structure, distinguishing it from static image–text models such as CLIP. These changes are intended to make clear that VALOR’s advantage derives not only from multimodality, but also from learning representations over time.
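
      The train-then-transfer workflow described for Fig. 1A (learn voxel-wise weights on one dataset, hold them fixed, apply them to features from novel movies) can be sketched as follows. This is a minimal illustration on synthetic data. Ridge regression, the regularization strength, and all array sizes are assumptions made for the example; the response above states only that the encoding model is a linear mapping from stimulus features to voxel responses.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge solution W = (X'X + alpha*I)^-1 X'Y, one weight column per voxel."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_correlation(Y_true, Y_pred):
    """Pearson r between measured and predicted time courses, computed per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))

rng = np.random.default_rng(0)
n_features, n_voxels = 16, 50
W_true = rng.normal(size=(n_features, n_voxels))  # synthetic ground-truth mapping

# Training phase: learn the feature-to-voxel weight matrix.
X_train = rng.normal(size=(400, n_features))
Y_train = X_train @ W_true + rng.normal(size=(400, n_voxels))
W = fit_ridge(X_train, Y_train, alpha=10.0)

# Testing/transfer phase: weights stay fixed and are applied to new features.
X_test = rng.normal(size=(120, n_features))
Y_test = X_test @ W_true + rng.normal(size=(120, n_voxels))
r = voxelwise_correlation(Y_test, X_test @ W)
```

      Generalization performance is then summarized from the per-voxel correlations in `r`.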

      (4) Regarding Figure 2, why were paired t-tests conducted in one-sided comparisons? Shouldn't this be two-sided, given that there is no reason to assume one is higher or lower than another?

      Thank you for raising this point. We agree that, in the absence of a preregistered directional hypothesis, paired comparisons should be evaluated using two-sided statistical tests.

      In response, we have re-run all paired comparisons reported in Figure 2 (p. 9) using two-sided paired t-tests, recomputed the corresponding p-values and false discovery rate (FDR) corrections, and updated the significance markers in the figure and captions accordingly. Importantly, this change does not alter the qualitative pattern of results or the main conclusions reported in the manuscript.
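
      A minimal sketch of two-sided paired t-tests followed by FDR correction is given below. The data are synthetic, and the Benjamini-Hochberg procedure is an assumption: the response states that FDR corrections were recomputed but does not name the specific procedure.

```python
import numpy as np
from scipy import stats

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q

rng = np.random.default_rng(0)
n_subjects, n_rois = 30, 5
# Synthetic per-subject prediction scores for two models in each ROI.
model_a = rng.normal(0.30, 0.05, size=(n_subjects, n_rois))
model_b = model_a - rng.normal(0.03, 0.02, size=(n_subjects, n_rois))

# ttest_rel is two-sided by default; one test per ROI across subjects.
t_stat, p_two_sided = stats.ttest_rel(model_a, model_b, axis=0)
q = benjamini_hochberg(p_two_sided)
```

      A two-sided test asks only whether the paired difference is nonzero, so it does not presuppose which model scores higher.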

      (5) Regarding Study 4, I am curious whether the results are specific to forward-looking representations (predictive coding) or whether the results broadly reveal regions that are sensitive to contexts. For example, if the authors were to incorporate nearby past scenes in the analysis rather than the nearby future scenes, would different brain regions light up?

      Thank you for this thoughtful question. We agree that it is important to distinguish forward-looking (predictive) representations from more general sensitivity to temporal context. In Study 4, we deliberately operationalized prediction using future-aligned features, such that only information from upcoming scenes was incorporated into the encoding model. Accordingly, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic context sensitivity.

      To make this interpretive scope explicit, we have added a clarifying sentence at the beginning of the Study 4 paragraph in the Discussion (p. 18), noting that our analysis incorporates only future-aligned features and that directly contrasting past- and future-aligned features will be an important direction for future work. This clarification is intended to clearly bound our claims while addressing the reviewer’s conceptual distinction.

      (On page 18) “In Study 4, we used a video-text alignment model to investigate predictive coding mechanisms. Because our analysis incorporates only future-aligned features, the reported effects should be interpreted as reflecting forward-oriented representations rather than generic sensitivity to temporal context; directly contrasting past- and future-aligned features will be an important direction for future work.”
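
      The distinction between future-aligned and past-aligned features can be illustrated by shifting a feature time series relative to the response. The sketch below is hypothetical: the helper `shift_features`, the offset of two time points, and the zero padding are assumptions for illustration, not the procedure used in Study 4.

```python
import numpy as np

def shift_features(X, offset):
    """Return an array whose row t holds X[t + offset].

    offset > 0 gives future-aligned features, offset < 0 past-aligned ones.
    Rows that have no source time point are left as zeros.
    """
    shifted = np.zeros_like(X)
    n = X.shape[0]
    if abs(offset) >= n:
        return shifted
    if offset > 0:
        shifted[: n - offset] = X[offset:]
    elif offset < 0:
        shifted[-offset:] = X[: n + offset]
    else:
        shifted[:] = X
    return shifted

X = np.arange(10, dtype=float).reshape(5, 2)  # 5 time points, 2 features
future = shift_features(X, 2)   # row t describes the scene 2 time points ahead
past = shift_features(X, -2)    # row t describes the scene 2 time points back
```

      Fitting the same encoding model once with `future` and once with `past` is one way to set up the contrast raised by the reviewer.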

      (6) In the paragraph starting in line 447, were WordNet feature time series also reduced to 512 dimensions like the rest of the model features?

      Thank you for the question. In the main analyses, WordNet feature time series were not reduced to 512 dimensions and were instead used at their full dimensionality (859 features).

      For comparability with the other feature spaces, we additionally conducted a control analysis in which WordNet features were reduced to 512 dimensions using PCA. The PCA was fit within each training fold to avoid information leakage, and the resulting 512-D features were evaluated using the same encoding pipeline. This PCA-reduced version performed slightly worse than the full 859-D WordNet representation. Accordingly, we report results from the full 859-D WordNet features in the main text. We have clarified this point in the Methods section (p. 22).

      (On page 22) “We also evaluated a PCA-reduced 512-dimensional variant (fit within each training fold to avoid leakage); because this version performed slightly worse, we report results from the full 859-dimensional representation in the main text.”
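
      Fitting PCA within each training fold means that the component axes and the centering mean are estimated from training time points only and then applied unchanged to the held-out fold. A minimal sketch follows. The random binary matrix, the five contiguous folds, and the sample count are assumptions for illustration; only the 859 and 512 dimensionalities come from the text.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Estimate the centering mean and top principal axes from training rows only."""
    mean = X_train.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components].T

def apply_pca(X, mean, components):
    """Project rows using a mean and axes that were fitted elsewhere."""
    return (X - mean) @ components

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 859)).astype(float)  # stand-in binary features
folds = np.array_split(np.arange(X.shape[0]), 5)        # contiguous time blocks

reduced_test = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    mean, components = fit_pca(X[train_idx], n_components=512)
    # Held-out rows never influence the mean or the component axes.
    reduced_test.append(apply_pca(X[test_idx], mean, components))
```

      Fitting PCA once on all time points before splitting would let held-out data shape the feature space, which is the leakage the quoted sentence refers to.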

      (7) I don't think authors have written what VALOR stands for.

      Thank you for the reminder. We now define the VALOR acronym at its first mention in the Abstract and Introduction and use the abbreviation thereafter.

      (On page 2) “Using a state-of-the-art deep learning model (VALOR; Vision-Audio-Language Omni-peRception)”

      (On page 5) “To answer this, we apply a video-text alignment encoding framework, using VALOR (Vision-Audio-Language Omni-peRception)—a high-performing, open-source model that aligns visual and linguistic features over time—to predict brain responses during movie watching.”

      (8) When calculating equation (3), please make sure that the correlation values are Fisher's r-to-z transformed.

      Thank you for this reminder. We confirm that all correlation coefficients used in Equation (3) are now Fisher r-to-z transformed prior to any averaging, contrasts, or statistical testing, and this procedure is now explicitly stated in the Methods. We have also updated Fig. 4a (p. 15) to reflect this transformation. Importantly, applying the r-to-z transformation does not change the qualitative pattern of results or their statistical significance.
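
      The effect of Fisher r-to-z transformation on averaging can be shown with a short worked example. The correlation values are made up, and back-transforming the mean to r is an assumption for illustration; Equation (3) itself is not reproduced in this response.

```python
import numpy as np

def fisher_z(r):
    """Fisher r-to-z transform; clipping avoids infinities at |r| = 1."""
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def mean_correlation(r_values):
    """Average correlations in z-space, then map the mean back to r."""
    return float(np.tanh(np.mean(fisher_z(np.asarray(r_values, dtype=float)))))

r = [0.2, 0.5, 0.8]
naive = float(np.mean(r))     # arithmetic mean of raw r values
fisher = mean_correlation(r)  # mean taken after r-to-z transformation
```

      Because the sampling distribution of r is skewed away from zero, the z-space mean here is slightly larger than the raw arithmetic mean.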

      (9) I wasn't able to check the OSF data/codes because it required permission.

      Thank you for flagging this, and we apologize for the inconvenience. We have removed the permission restriction and set the OSF repository to public read-only access, which should resolve the issue.

      Reviewer #3 (Recommendations for the authors):

      (1) The current approach extracts features from a single "best" layer of each model, which may be suboptimal for predicting neural responses. Prior work has shown that combining features across multiple layers through optimized fusion strategies (e.g., St-Yves et al., 2023) or using model ensembles (e.g., Li et al., 2024) can substantially improve encoding performance. The authors may consider these more comprehensive approaches either as additional baselines or as alternative directions to enhance model accuracy.

      Thank you for this constructive suggestion. We agree that combining features across multiple layers or using optimized fusion and ensemble strategies, as demonstrated in recent work (e.g., St-Yves et al., 2023; Li et al., 2024), can substantially improve absolute encoding performance.

      In the present study, however, we intentionally evaluated each model using its single best-performing layer within a matched encoding pipeline. This design choice was made to maintain model-agnostic comparability and interpretability, and to ensure that performance differences could be attributed primarily to the type of representation (e.g., temporally informed video–text features versus static or unimodal features), rather than to differences in model complexity, parameter count, or fusion strategy. Importantly, this constraint was applied uniformly across all models and therefore does not favor VALOR over the baselines.

      We now explicitly note in the Discussion (p. 19) that multilayer fusion and ensemble approaches represent a natural and promising extension of our framework and are likely to further improve absolute prediction accuracy. Our goal in the current work was to establish the practical utility and generalizability of temporally aligned video–text features for naturalistic movie fMRI under a controlled and comparable evaluation setting.

      (On page 19) “Third, for comparability across models we evaluated each model using its single best-performing layer within a matched encoding pipeline rather than using multilayer fusion or ensembling, which allowed us to attribute performance differences to representational format but likely underestimates the absolute performance ceiling.”

      (2) Given the naturalistic video-based task, the manuscript would benefit from including state-of-the-art video-only models (e.g., Video Swin Transformer, VideoMAE, and other more recent architectures) as explicit baselines. These models are designed to capture spatiotemporal structure without relying on language input and would provide a more targeted comparison to assess the specific contribution of temporal visual processing.

      Thank you for this thoughtful suggestion. We agree that state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE) are highly relevant baselines for naturalistic movie paradigms and would provide a more targeted comparison for isolating the contribution of temporal visual processing independent of language input.

      In the present study, our primary goal was not to exhaustively benchmark all possible video architectures, but to evaluate whether temporally informed video–text features can serve as a practical and general-purpose encoding framework that improves upon the models most commonly used in cognitive neuroscience for naturalistic fMRI (e.g., AlexNet for vision, WordNet for semantic annotation, and CLIP for static multimodal alignment). Using these established baselines allowed us to place our results in direct continuity with prior neuroimaging work and to attribute performance differences to representational format under a controlled encoding pipeline.

      We agree that incorporating modern video-only spatiotemporal encoders is an important next step, particularly for disentangling the relative contributions of temporal visual structure and cross-modal video–text alignment. We now explicitly note this point in the Discussion (p. 19) as a limitation and future direction, and view such comparisons as a natural extension of the current framework within the same TR-aligned encoding setup.

      (On page 19) “Second, we did not directly compare VALOR to state-of-the-art video-only spatiotemporal models (e.g., Video Swin Transformer, VideoMAE, and related architectures) that are designed to capture temporal visual structure without language grounding; such comparisons will be important for isolating the specific contributions of temporal visual processing versus cross-modal video–text alignment in naturalistic neural responses.”

      (3) An additional consideration is the scale of the AI models used for feature extraction. Previous studies (e.g., Matsuyama et al., 2023) have indicated that model size - particularly the number of parameters - can influence neural prediction performance, independently of architecture. A discussion or analysis of how model size contributes to the observed encoding gains would help clarify whether improvements are due to the representational quality of the model or simply its scale.

      Thank you for this important point. We agree that model scale—particularly parameter count—can influence neural prediction performance independently of architecture, as noted in prior work (e.g., Matsuyama et al., 2023).

      In the present study, our primary goal was to evaluate whether temporally informed video–text representations provide practical advantages over unimodal and static multimodal baselines that are widely used in cognitive neuroscience for naturalistic movie fMRI, under a matched encoding pipeline. We did not perform a systematic scale-controlled analysis in this revision because doing so would require training or evaluating multiple size-matched variants across video-only and video–text architectures, which is beyond the scope of the current work.

      We therefore agree that part of the observed performance gains may reflect model capacity in addition to representational format, and we caution against attributing all improvements solely to cross-modal alignment or temporal structure. We now explicitly acknowledge this limitation in the Discussion and note that comparing size-matched video-only and video–text models within the same pipeline is an important next step for disentangling model scale from representational content.

      (On page 19) “Finally, part of VALOR’s advantage may reflect model capacity: larger pretrained models often yield higher encoding accuracy, so repeating these analyses with size-matched image-only and image–text models will be critical for disentangling model scale from representational content.”

    1. The advancement in additive manufacturing technologies has revolutionized the design and fabrication of complex structures, enabling the production of components with intricate geometries that were previously unattainable through traditional manufacturing methods [1]. Fused Deposition Modeling (FDM) is a widely used additive manufacturing technique [2]. In FDM 3D printing, a thermoplastic filament is fed through a heated nozzle. The nozzle heats the filament to its melting point and then deposits it layer by layer according to the pre-designed model [3]. This layer-by-layer deposition process allows for the creation of complex three-dimensional geometries. For continuous fiber composites in FDM, it offers many unique advantages. The continuous fibers can be incorporated into the thermoplastic matrix during the extrusion process [4]. This enables the reinforcement of the printed structure, enhancing its mechanical properties, such as strength and stiffness. The ability to precisely control the fiber orientation and distribution within each layer provides an opportunity to optimize the performance of the final structure. For example, by aligning the fibers along the direction of the main stress, the load-bearing capacity of the structure can be significantly improved. Additionally, FDM technology allows for relatively easy customization and rapid prototyping, which is beneficial for fabricating complex structures with different cell sizes and geometries to meet specific application requirements [5]. (Materials 2025, 18, 192, https://doi.org/10.3390/ma18010192)

      This paragraph explains the process of Fused Deposition Modeling (FDM) 3D printing. To make sense of the process, think of how a 3D printer works: it takes a filament, feeds it through a heated nozzle, and then deposits it layer by layer.

    1. Reviewer #1 (Public review):

      Summary:

      Huang et al. examined ACC response during a novel discrimination-avoid task. The authors concluded that ACC neurons primarily encode post-action variables over extended periods, reflecting the animal's preceding actions rather than the outcomes or values of those actions. The authors have made considerable revisions to address the raised concerns. However, it appears that some important issues remain unresolved.

      Strengths:

      The inclusion of new figures and analyses in response to the reviews is appreciated, such as Fig. 2 and 5.

      Weaknesses:

      Motion-related signal in ACC: the new Fig. 2E looks good, but it is hard to visualize how it is just a reordering of the old Fig. 5C.

      All categories in the new Fig. 4D appear to respond to shuttle initiation, with less than 1 s latency. For example, types 2a/2b make up 40% of the population, and their response to movement onset is apparent. Thus, it is not clear whether most neurons respond to shuttle crossing as described in the manuscript.

      Could the authors use relatively simple analysis, such as comparing spike rate before and after crossing, or before and after initiation, to quantify the response properties of each neuron? This could also help validate the classification analysis performed in Fig. 4.
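
      The kind of simple per-neuron comparison suggested above could look like the following sketch. The spike times, event times, one-second windows, and the modulation index are all hypothetical choices made for illustration; they are not taken from the manuscript.

```python
import numpy as np

def rate_in_window(spike_times, event_times, start, stop):
    """Firing rate (Hz) in [event + start, event + stop) for each event."""
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    events = np.asarray(event_times, dtype=float)
    first = np.searchsorted(spikes, events + start, side="left")
    last = np.searchsorted(spikes, events + stop, side="left")
    return (last - first) / (stop - start)

# Hypothetical spike times (s) for one neuron and two shuttle-crossing times.
spikes = [0.5, 0.9, 1.1, 1.2, 1.3, 4.6, 5.1, 5.2]
crossings = [1.0, 5.0]

pre = rate_in_window(spikes, crossings, -1.0, 0.0)   # 1 s before each crossing
post = rate_in_window(spikes, crossings, 0.0, 1.0)   # 1 s after each crossing

# Index in [-1, 1]: positive when the neuron fires more after the event.
modulation = (post.mean() - pre.mean()) / (post.mean() + pre.mean())
```

      Running the same function with shuttle-initiation times in place of `crossings` would give the second comparison mentioned above.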

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the current study, Huang et al. examined ACC response during a novel discrimination-avoid task. The authors concluded that ACC neurons primarily encode post-action variables over extended periods, reflecting the animal's preceding actions rather than the outcomes or values of those actions. Specifically, they identified two subgroups of ACC neurons that responded to different aspects of the actions. This work represents admirable efforts to investigate the role of ACC in task-performing mice. However, in my opinion, alternative explanations of the data were not sufficiently explored, and some key findings were not well supported.

      Strengths:

      The development of the new discrimination-avoid task is applauded. Single-unit electrophysiology in task-performing animals represents admirable efforts and the datasets are valuable. The identification of different groups of encoding neurons in ACC can be potentially important.

      Weaknesses:

      One major conclusion is that ACC primarily encodes the so-called post-action variables (specifically shuttle crossing). However, only a single example session was included in Figure 2, while in Supplementary Figure 2 a considerable fraction of ACC neurons appears to respond to either the onset of movement or ramp up their activity prior to movement onset. How did the authors reach the conclusion that ACC preferentially respond to shuttle crossing?

      We now include more example sessions and the main results from individual animals (Fig. 3; Figs. S2–S3; Fig. 8). Overall, the results are consistent across recording sessions and animals.

      While shuttle crossings were the primary reference for most analyses, using shuttle initiation as a reference led to similar conclusions (Fig. 4). Namely, we found that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig. 4), while only a subset exhibits ramping pre-shuttle activity (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.

      In Figure 4, it was concluded that ACC neurons respond to action independent of outcome. Since these neurons are active on both correct and incorrect shuttle but not stay trials, they seem to primarily respond to overt movement. If so, the rationale for linking ACC activity and adaptive behavior/ associative learning is not very clear to me. Further analyses are needed to test whether their firing rates correlated with locomotion speed or acceleration/deceleration. On a similar note, to what extent are the action state neurons actually responding to locomotion-related signals? And can ACC activity actually differentiate correct vs. incorrect stays?

      In this study, we highlight two distinct groups of ACC neurons: action-state and action-content neurons. Both groups of neurons tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors, suggesting that their activity is not directly driven by locomotion. Furthermore, action-content neurons are selectively engaged in only one of the two shuttle categories, either rooms A→B or B→A shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns. Finally, we analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of neurons (<15%) show speed-correlated activity (Fig.5), suggesting that most ACC neurons do not encode movement-related information. Taken together, these findings support the distinction between ACC activity and locomotion encoding.

      As for the small subset of speed-related neurons, it remains unclear whether these speed-related neurons represent a distinct subpopulation within the ACC or reflect recordings from the nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex (Fig. S2). Therefore, it is possible that the small fraction of speed-related neurons originated from the motor cortex.

      Lastly, given that the ACC neurons display no or limited activity during stay trials, their activity generally does not differentiate correct vs. incorrect stays (Fig.S7). However, ACC activity does show moderate differentiation between room-A vs. room-B stays (Fig.S7).

      Given that a considerable amount of ACC neurons encode 'action content', it is not surprising that by including all neurons the model is able to make accurate predictions in Figure 6. How would the model performance change by removing the content neurons?

      We thank the reviewer for this thoughtful analysis idea. Excluding action-content neurons drastically reduces decoding accuracy (Fig.8), suggesting that they are the main drivers for differentiating rooms AB vs. BA shuttles.

      Moving on to Figure 7. Since Figure 4 showed that ACC neurons respond to movement regardless of outcome, it is somewhat puzzling how ACC activity can be linked to future performance.

      As discussed earlier (point #2), ACC activity does not simply reflect locomotion itself. We interpret the post-shuttle ACC activity as encoding both the preceding shuttle state (shuttle or stay) and shuttle content (rooms AB or BA). Regardless of the outcome (safety or shock), such encoding is essential for cue–action–outcome associative learning, because both positive and negative feedback can drive learning. The level of post-shuttle ACC activity may reflect task engagement, with greater engagement facilitating learning and improving future performance.

      Two mice contributed about 50% of all the recorded cells. How robust are the results when analyzing mouse by mouse?

      We have added further analyses highlighting the results for each mouse. Although the total number of recorded neurons varied across mice, the major findings were consistent. In every mouse, we observed sustained post-shuttle ACC activity (Fig. S2), and population-level ACC activity reliably decoded shuttle contents (rooms AB vs. BA; Fig. 8).

      Lastly, the development of the new discrimination-avoid task is applauded. However, a major missing piece here is to show the importance of ACC in this task and what aspects of this behavior require ACC.

      We appreciate this feedback. We are currently conducting additional experiments to determine whether inhibiting ACC activity during distinct time windows disrupts task learning. We hope to publish a follow-up paper on these findings in the near future.

      Reviewer #2 (Public review):

      Summary:

      The current dataset utilized a 2x2 factorial shuttle-escape task in combination with extracellular single-unit recording in the anterior cingulate cortex (ACC) of mice to determine ACC action coding. The contributions of neocortical signaling to action-outcome learning as assessed by behavioral tasks outside of the prototypical reward versus non-reward or punished vs non-punished is an important and relevant research topic, given that ACC plays a clear role in several human neurological and psychiatric conditions. The authors present useful findings regarding the role of ACC in action monitoring and learning. The core methods themselves - electrophysiology and behavior - are adequate; however, the analyses are incomplete since ruling out alternative explanations for neural activity, such as movement itself, requires substantial control analyses, and details on statistical methods are not clear.

      Strengths:

      (1) The factorial design nicely controls for sensory coding and value coding, since the same stimulus can signal different actions and values.

      (2) The figures are mostly well-presented, labeled, and easy to read.

      (3) Additional analyses, such as the 2.5/7.5s windows and place-field analysis, are nice to see and indicate that the authors were careful in their neural analyses.

      (4) The n-trial + 1 analysis where ACC activity was higher on trials that preceded correct responses is a nice addition, since it shows that ACC activity predicts future behavior, well before it happens.

      (5) The authors identified ACC neurons that fire to shuttle crossings in one direction or to crossings in both directions. This is very clear in the spike rasters and population-scaled color images. While other factors such as place fields, sensory input, and their integration can account for this activity, the authors discuss this and provide additional supplemental analyses.

      Weaknesses:

      (1) The behavioral data could use slightly more characterization, such as separating stay versus shuttle trials.

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1). Additionally, we provide new data from extended training sessions (Fig.S2).

      (2) Some of the neural analyses could use the necessary and sufficient comparisons to strengthen the authors' claims.

      We have now used the necessary and sufficient comparisons where applicable. In the SVM decoding analysis, we show that population ACC activity is sufficient to decode AB or BA shuttles. We also show that excluding action-content, but not other ACC neurons, drastically reduces decoding accuracy, suggesting that these neurons are necessary for the decoding (Fig.8).

      (3) Many of the neural analyses seem to utilize long time windows, not leveraging the very real strength of recording spike times. Specifics on the exact neural activity binning/averaging, tests, classifier validation, and methods for quantification are difficult to find.

      We chose to perform our neural analyses on a longer time scale, given the sustained activity we see in the data. To further justify that decision, we now provide additional results highlighting the sustained activity of ACC neurons in our task (Fig. 2; Fig. S2). Additionally, we now provide more specifics of the neural analyses in the Methods section.

      (4) The neural analyses seem to suggest that ACC neurons encode one variable or the other, but are there any that multiplex? Given the overwhelming evidence of multiplexing in the ACC a bit more discussion of its presence or absence is warranted.

      This is an interesting point of discussion, and we thank the reviewer for pointing this out. Overall, our results suggest that individual ACC neurons preferentially engage in only one of the proposed functions, rather than multiplexing across them. For example, action-state and action-content ACC neurons primarily engage in action monitoring, but not in decision-making, planning, or outcome tracking. Nevertheless, we cannot rule out the possibility that other ACC neurons, through their distinct connectivity or location in different ACC subregions, engage in other proposed functions. Thus, when considering the ACC as a whole, its function may still be multiplexed.

      Another possible reason we do not see clear multiplexing of neurons may be due to the dynamic nature of our task. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation. Since values are not fixed and change based on context, value-related responses may not be reflected in the ACC in our tasks.

      We have now incorporated the above discussions into our revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      The authors record from the ACC during a task in which animals must switch contexts to avoid shock as instructed by a cue. As expected, they find neurons that encode context, with some encoding of actions prior to the context, and encoding of neurons post-action. The primary novelty of the task seems to be dynamically encoding action-outcome in a discrimination-avoidance domain, while this is traditionally done using operant methods. While I'm not sure that this task is all that novel, I can't recall this being applied to the frontal cortex before, and this extends the well-known action/context/post-context encoding of ACC to the discrimination-avoidance domain.

      While the analysis is well done, there are several points that I believe should be elaborated upon. First, I had questions about several details (see point 3 below). Second, I wonder why the authors downplayed the clear action coding of ACC ensembles. Third, I wonder if the purported 'novelty' of the task (which I'm not sure of) and pseudo-debate on ACC's role undermines the real novelty - action/context/outcome encoding of ACC in discrimination-avoidance and early learning.

      Strengths:

      Recording frontal cortical ensembles during this task is particularly novel, and the analyses are sophisticated. The task has the potential to generate elegant comparisons of action and outcome.

      Weaknesses:

      I had some questions that might help me understand this work better.

      (1) I wonder if the field would agree that there is a true 'debate' and 'controversy' about the ACC and conflict monitoring, or if this is a pseudodebate (Line 34). They cite 2 very old papers to support this point. I might reframe this in terms of the frontal cortex studying action-outcome associations in discrimination-avoidance, as the bulk of evidence in rodents comes from overtrained operant behavior, and in humans comes from high-level tasks, and humans are unlikely to get aversive stimuli such as shocks.

      We appreciate this feedback. We have revised the Introduction and Discussion.

      (2) Does the purported novelty of the task undermine the argument? While I don't have an exhaustive knowledge of this behavior, the novelty involves applying this ACC. There are many paradigms where a shock triggers some action that could be antecedents to this task.

      We argue our newly designed discrimination–avoidance task is unique for several reasons. First, it requires animals to discriminate both sensory cues and environment contexts. Unlike established tasks that often assign fixed positive or negative values to cues, the cues in our task are not inherently associated with valence. Instead, their meaning is dynamically determined by the animal’s location (context) at the time of cue presentation, which reflects a conceptual advance over previous techniques. Furthermore, by removing valence from the cues, this design helps disentangle the ACC’s potential role in value encoding from other cognitive functions.

      Second, this task involves robust, ethologically relevant actions (i.e., shuttles), unlike many established paradigms that rely on less naturalistic behaviors such as saccades or lever presses. We view this as a key distinction from prior approaches, as even previous paradigms that utilize shuttling responses or other naturalistic responses fail to incorporate dynamic integration of cues and contexts.

      Finally, the clear temporal separation between actions and outcomes further helps disentangle the ACC’s roles in action monitoring vs. outcome tracking.

      (3) The lack of details was confusing to me:

      (a) How many total mice? Are the same mice in all analyses? Are the same neurons? Which training day? Is it 4 mice in Figure 3? Five mice in line 382? An accounting of mice should be in the methods. All data points and figures should have the number of neurons and mice clearly indicated, along with a table. Without these details, it is challenging to interpret the findings.

      We are sorry for the confusion. We now provide additional details and clear N numbers for each analysis to improve clarity.

      (b) How many neurons are from which stage of training? In some figures, I see 325, in some ~350, and in S5/S2B, 370. The number of neurons should be clearly indicated in each figure, and perhaps a table.

      All data were obtained from well-trained mice. For some analyses, the N is smaller because certain task sessions contained very few incorrect trials (≤3), which prevented us from examining ACC activity during those trials. We have modified the figure legends so that the neuron count is clear.

      (c) Were the tetrodes driven deeper each day? The depth should be used as a regressor in all analyses?

      Yes, the tetrodes were driven slightly deeper across task sessions (~80 µm per step; 2–4 depths per mouse). Given limited depth changes, preliminary analyses indicate no clear differences in ACC activity across these recording depths. However, we cannot rule out potential dorsal–ventral subregion differences if recordings were to span larger depth ranges.

      (d) Was it really ACC (Figure 2A)? Some shanks are in M2? All electrodes from all mice need to be plotted as a main figure with the drive length indicated.

      We have now included a supplementary figure showing all recording sites (Fig. S2). It is likely that a small subset of neurons was recorded at the ACC/M2 border area. Unfortunately, we are unable to separate them out due to the blind recording design of our tetrode arrays.

      (e) It's not clear which sessions and how many go into which analysis

      We have now specified the number of task sessions for each analysis (see Methods).

      (f) How many correct and incorrect trials (<7?) are there per session?

      We have now specified the number of correct and incorrect trials per session (see Methods).

      (g) Why 'up to 10 shocks' on line 358? What amplitudes were tried? What does scrambled mean?

      We decided to use up to 10 mild shocks per trial because mice do not necessarily shuttle to the safe room after one or even a few shocks during the early stages of training. This design allows mice to efficiently learn the concept of the task (i.e., one room is safe while the other delivers shocks). Each shock was specified in the Methods section as 0.5 mA, 0.1 s. A “scrambled shock” refers to an electric shock delivered through multiple floor bars in a randomized pattern, effectively preventing the animal from avoiding the stimulus.

      (4) Why do the authors downplay pre-action encoding? It is clearly evident in the PETHs, and the classifiers are above chance. It's not surprising that post-shuttle classification is so high because the behavior has occurred. This is most evident in Figure S2B, which likely should be a main figure.

      We did not intend to downplay pre-action encoding. Our analysis shows that most ACC neurons exhibit either robust (22%; Types 1a & 2a) or moderate (51%; Types 1b & 2b) post-shuttle activity changes (Fig. 4). Although a subset of ACC neurons exhibits ramping pre-shuttle activity, they represent a much smaller fraction (16%; Types 3b & 3c). Therefore, our conclusion was intended to highlight the role of post-shuttle activity in learning. While we do not exclude the possibility that pre-shuttle ACC activity contributes to learning, its involvement is likely more limited.

      (5) The statistics seem inappropriate. A linear mixed effects model accounting for between-mouse variance seems most appropriate. Statistical power or effect size is needed to interpret these results. This is important in analyses like Figure 7C or 6B.

      We appreciate this feedback. We now use appropriate statistics and report effect size.

      (6) Better behavioral details might help readers understand the task. These can be pulled from Figures S2 and S5. This is particularly important in a 'novel' task.

      We now provide more details to help better understand the task and have added new figures (Fig.1; Figs. S1&S2).

      (7) Can the authors put post-action encoding on the same classification accuracy axes as Figure 6B? It'd be useful to compare.

      We appreciate the comment, but we are unsure what clarification is being requested.

      (8) What limitations are there? I can think of several - number of animals, lack of causal manipulations, ACC in rodents and humans.

      We now include a discussion of the limitations of our study. One caveat is that the discrimination–avoidance task requires weeks of training in mice. By the time they master the task, ACC activity may reflect modified neural circuits. Investigating ACC activity during the early phase of learning, such as by introducing a new pair of cues or contexts, could provide further insights into the ACC’s role in learning and cognitive processes. Additionally, a limitation of the current study is the lack of evidence for the causal role of post-action ACC activity in complex associative learning. Future investigations using closed-loop strategies to selectively disrupt ACC activity during the post-action phase could help address this question.

      Minor:

      (1) Each PCA analysis needs a scree plot to understand the variance explained.

      We have added a scree plot for each PCA analysis.

      (2) Figure 4C - y and x-axes have the same label?

      We have corrected the y-axis label.

      (3) What bin size do the authors use for machine learning (Not clear from line 416)?

      The bin sizes used were 2.5, 5, 7.5, or 10 s, as now described in the Methods section.

      (4) Why not just use PCA instead of 'dimension reduction' (of which there are many?)

      We have adjusted the phrasing where appropriate.

      (5) Would a video enhance understanding of the behavior?

      We appreciate this feedback. We now include a few videos to accompany our paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Is Figure 1C sufficiently powered?

      We have now included data from additional mice and updated the figure accordingly.

      (2) Task performance was not plateaued after 10 sessions in Figure 1B. How variable is task performance in the datasets with ephys recordings (session to session, mouse to mouse).

      We have now included additional data from extended training (15 sessions; Fig. S2). Moderate variations across both sessions and mice are observed. Specifically, the numbers of correct/incorrect shuttles used for ephys analysis were 19/5, 19/4, 21/5, and 20/4 (mouse #1; 4 sessions); 20/7, 23/7, and 20/7 (mouse #2; 3 sessions); 19/4 and 16/2 (mouse #3; 2 sessions); 26/4, 23/4, 17/6, and 25/5 (mouse #4; 4 sessions); and 20/5 and 17/4 (mouse #5; 2 sessions).

      (3) Please quantify the results in Figure 3, for both within individual mice and across mice.

      We have calculated maximum trajectory length within the 3-D space (Fig. 3C).

      (4) What is the effect size in Figure 7C?

      We now report the effect size.

      (5) Please provide more details for spike sorting.

      We have now included more details in the Methods section.

      (6) More detailed cell type or correlation analysis in Figures 4 and 5 may be helpful. For example, if putative regular and fast-spiking neurons were simultaneously recorded, did the FS directly inhibit the RS to give rise to the apparent encoding properties?

      We recorded a small number of putative interneurons (n = 13) from only three mice, which precludes drawing meaningful conclusions, particularly given their heterogeneous responses during discrimination–avoidance tasks. Accordingly, we include only an example interneuron demonstrating discrimination between AB vs. BA shuttles (Fig. S5). Nevertheless, it is evident that there are reciprocal monosynaptic connections between putative interneurons and certain pyramidal neurons, as indicated by short-latency (~2 ms) excitatory or inhibitory interactions (Fig. S5). That said, follow-up studies with larger Ns are needed to parse out these details.

      Reviewer #2 (Recommendations for the authors):

      (1) While I appreciate displaying the success rate for the sake of simplifying behavioral data in Figure 1B, it would be nice to also see these data broken out as correct vs incorrect for stay vs shuttle trials, since it is difficult to determine whether the performance increases are primarily driven by mice improving at stay vs shuttle responses.

      We appreciate this feedback. In the revised manuscript, we present data separating stay versus shuttle trials (Fig.1; Fig.S2).

      (2) In Figure 2 the comparison between shuttle and stay is not particularly convincing, since the comparison is also essentially movement vs no movement and place1-->place2 vs place1-->place1. A more appropriate comparison might be action state neurons vs action content neurons during A-->B, B-->A, or both crossings. If it is true that these populations contain this information, then action state neurons should traverse a large component space in both directions, action content neurons only one direction, and so on.

      We agree that the comparison is not ideal due to differences in locomotion. However, it provides valuable information suggesting that the ACC plays a limited role during stay trials, even though these trials involve mental and cognitive processes comparable to those in shuttle trials. While we appreciate the reviewer’s suggestion, the proposed analysis is not particularly reliable given the relatively small number of simultaneously recorded action-state or action-content neurons.

      (3) I would say the above point applies to Figure 3 as well. I would also note that this reviewer greatly appreciates the rigor of showing ensemble activity in each subject.

      We appreciate this comment. See our response above.

      (4) In Figure 5 do these neurons show the same A-->B vs B-->A firing patterns during correct vs incorrect shuttles? The text describing the data in Figure 4 suggests this should be the case but even from a quick glance it sort of seems like the population dynamics during correct vs incorrect shuttles are not the same. My concern is that averaging neural activity over 5s windows washes out all these dynamics.

      Preliminary analysis suggests that these firing patterns apply to both correct and incorrect shuttles. However, the main reason we did not compare correct and incorrect trials is the limited amount of data. In many sessions, there are only a few (≤5) incorrect shuttles, which include both AB and BA shuttles (Fig. 1C; Fig. S2), and thus lack the statistical power for a meaningful comparison.

      (5) Some information on classifier validation is required - was this leave-out validation and if so how many trials were left-out vs tested? K-fold, and if so, how many folds? Was the trial order shuffled for each simulation? Classifiers will pick up within-session temporal information. In addition to this classifier accuracy during the different time points should be compared by a non-parametric test, and compared to the 95th percentile of the label-shuffled distribution.

      Yes, we use standard 10-fold cross-validation. We appreciate the suggestion on trial-order shuffling, and implementing this procedure does not change our original conclusion. Additionally, we have applied a non-parametric test.
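
      The validation scheme discussed in this exchange (k-fold cross-validation with shuffled trial order, compared against a label-shuffled null distribution) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: a nearest-centroid classifier stands in for the SVM, the data are synthetic, and all function names are hypothetical.

```python
import random

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose training centroid is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def kfold_accuracy(X, y, k=10, seed=0):
    """Held-out accuracy pooled over k folds, with trial order shuffled first."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)  # breaks within-session temporal structure across folds
    folds = [order[i::k] for i in range(k)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_X = [X[i] for i in order if i not in held_out]
        train_y = [y[i] for i in order if i not in held_out]
        for i in test_idx:
            correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

def shuffled_label_threshold(X, y, n_shuffles=200, k=10, seed=0, pct=0.95):
    """Approximate pct-quantile of accuracies obtained with permuted labels."""
    rng = random.Random(seed)
    null = []
    for s in range(n_shuffles):
        y_perm = y[:]
        rng.shuffle(y_perm)
        null.append(kfold_accuracy(X, y_perm, k=k, seed=s))
    null.sort()
    return null[min(len(null) - 1, int(pct * len(null)))]

# Synthetic "population vectors" for 20 AB and 20 BA shuttles (hypothetical data).
rng = random.Random(1)
X, y = [], []
for _ in range(20):
    X.append([5.0 + rng.gauss(0, 1), 1.0 + rng.gauss(0, 1)])
    y.append("AB")
    X.append([1.0 + rng.gauss(0, 1), 5.0 + rng.gauss(0, 1)])
    y.append("BA")

accuracy = kfold_accuracy(X, y)
threshold = shuffled_label_threshold(X, y)
```

      Decoding would be considered above chance in this sketch when `accuracy` exceeds `threshold`; the same comparison can be repeated for each time window.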

      (6) How exactly were neurons classified as content vs state? Was it the average activity during the 5s following the shuttle? If this is stated I could not really find it easily so I might suggest clarifying.

      We now use a new method for classification of the two neuron types (Fig.7). We have included detailed methods in the revised manuscript.

      (7) Movement drives cortical neuron activity more than anything else I have ever seen. Really, more than anything else, it would be nice to demonstrate that it is not movement alone or movement multiplexed with place/sensory information/direction driving these responses.

      We have analyzed ACC neuronal activity in relation to locomotion speed. Our results indicate that only a small fraction of ACC neurons (<15%) show speed-correlated activity (Fig.5). It remains unclear whether these speed-related neurons represent a distinct subpopulation within the ACC or reflect recordings from nearby motor cortex. Postmortem examination of the recording sites suggests that most neurons were recorded from the ACC, while a small subset may be located at the border between the ACC and motor cortex. Therefore, it is possible that the small fraction of speed-related neurons originated from the motor cortex.

      Furthermore, we identify two distinct groups of ACC neurons: action-state and action-content neurons, both of which tend to show sustained activity even when the animals remain immobile after completing shuttle behaviors. This prolonged activation in the absence of movement suggests that their activity is not directly driven by locomotion. Moreover, action-content neurons are selectively engaged in only one of the two shuttle categories, either rooms AB or BA shuttles. Therefore, differences in neuronal activity are unlikely to reflect locomotor differences, given that both shuttle types involve similar movement patterns.
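
      One simple way to quantify the fraction of speed-correlated neurons is sketched below. The response does not state how speed correlation was defined, so the Pearson correlation, the 0.3 threshold, and the function names are assumptions made only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def fraction_speed_correlated(rates_by_neuron, speed, r_threshold=0.3):
    """Fraction of neurons whose binned rate has |r| >= r_threshold with speed."""
    flagged = [abs(pearson_r(rates, speed)) >= r_threshold
               for rates in rates_by_neuron]
    return sum(flagged) / len(flagged)

# Hypothetical binned data: locomotion speed and firing rates of three neurons.
speed = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rates_by_neuron = [
    [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],   # scales linearly with speed
    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],    # constant rate
    [1.0, -1.0, -1.0, 1.0, 1.0, -1.0], # fluctuates independently of speed
]
fraction = fraction_speed_correlated(rates_by_neuron, speed)
```

      In practice the threshold would more likely come from a significance test or a shuffled control than from a fixed cutoff.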

      (8) In addition to the above, the place-field analysis in Supplemental Figure 5 only shows 4 neurons. Was the whole population analyzed? Is it possible to decode place from the population during the ITI? The data in this figure sort of look exactly like place fields - many cortical neurons and also some hippocampal neurons have more than 1 place field.

      We have now provided additional place-field analysis. A comparison with hippocampal CA1 neurons (recorded during the same task) suggests that ACC neurons encode limited spatial information.

      (9) "a simple Pavlovian association strategy is unlikely to be sufficient for learning the task" ... is Pavlovian occasion setting not a simple association? Tones and contexts both readily act as Pavlovian occasion setters. Similarly positive/negative patterning might also explain how the task is learned.

      We appreciate this comment and have revised the sentence accordingly. It is possible that animals use multiple strategies to learn and perform the task effectively. In the early stages, animals may rely more heavily on sensory–spatial integration, whereas in later stages, sensory- or location-related Pavlovian associative strategies may contribute to performance, particularly when animals begin to show place preferences during inter-trial intervals.

      (10) I might suggest softening this language and others like it. For example, 2x2 factorial designs are not really novel.

      We have revised the language used to describe the task.

      (11) Some of the color-scale bars and figures do not have labels. For example, Supplementary Figure 3, Supplementary Figure 5. Please add labels.

      We have added the missing labels to all color bars.

      Reviewer #3 (Recommendations for the authors):

      (1) Some relevant papers that should be cited:

      https://doi.org/10.1523/JNEUROSCI.4450-08.2008

      10.1016/j.neuron.2018.11.016

      https://doi.org/10.1016/j.jphysparis.2014.12.001

      We appreciate these suggestions.

      (2) Where can we download the data and code?

      We will upload the essential data and MATLAB code to GitHub to accompany the publication of the final version of this paper.

    1. Reviewer #3 (Public review):

      Summary:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether atypical protein kinase C is required for the maintenance of late-long-term synaptic potentiation (L-LTP) and long-term memory (LTM). The authors confirm previous findings that persistent activity of PKMζ is required for hippocampal L-LTP and spatial memory. They demonstrate that genetically deleting PKCι/λ and PKMζ individually induces compensatory upregulation, whereas deleting both atypical PKCs abolishes hippocampal L-LTP and spatial long-term memory. The study uses an elegant combination of immunoblots, electrophysiology, and behavioral assays. The use of Cre-recombinase to target specific hippocampal regions and neurons adds to the rigor of the findings.

      Strengths:

      The manuscript addresses an important, yet unresolved and long-debated, question: whether PKMζ is required for the maintenance of L-LTP and LTM. The study demonstrates that PKCι/λ, which was previously shown to be critical for the initial generation of the early phase of LTP and short-term memory, becomes persistently active in L-LTP and LTM in a PKMζ knock-out model, compensating for the loss of PKMζ. Furthermore, when the compensation mechanisms are eliminated by simultaneous deletion of both PKMζ and PKCι/λ, maintenance of LTP and long-term spatial memory, but not of short-term memory, is diminished. The strength of this study is that the authors used a double-knockout strategy to directly address the controversy concerning the roles of PKMζ in memory formation. By showing that PKCι/λ compensates when PKMζ is deleted, the authors provided a compelling explanation for previous contradictory findings.

      Weaknesses:

      (1) The authors should provide the numerical values for all data.

      (2) It appears that blind procedures were only used for the behavioral experiments. Some explanation is warranted.

      (3) The description of the immunoblotting procedures lacks sufficient detail. The authors state that immunoblots were stained with multiple antisera to visualize multiple PKCs on the same immunoblot. To conserve antisera, the immunoblots were cut to isolate the relevant proteins based on molecular weight. Isoforms with similar molecular weights were either stained with antisera of different species or on separate blots. Despite this explanation, it is unclear how immunoblotting was performed in practice. For example, in Figure 1B, the authors compared the changes of four conventional PKC isoforms. Because all four antibodies are mouse monoclonal antibodies recognizing proteins of similar molecular weights, each probing should presumably have its own actin loading controls. However, these controls are missing from the figure. Some clarification is warranted.

      (4) The legend to Figure 4B states that the increases of maximum avoidance time from pretraining to trial 1 are not different between groups, and takes this to indicate that both groups of mice successfully established short-term memory. That inference is not correct. The analysis only reveals that there is no difference between the two groups. No difference could be due to both groups learning the same, as the authors suggest, or alternatively to no learning in either group.

      (5) The labeling on some of the illustrations (e.g., Figure 2B) is unreadable.

      (6) In Figure 4B, only the single statistical comparison between "pretraining" and "1 trial" is shown. The other comparisons described in the legend should also be illustrated.

      (7) There is no documentation to support the statement that "The prevailing textbook mechanism for how memory is retained asserts that stable structural changes at synapses, the result of initial protein synthesis and growth, sustain memory without the need for ongoing biochemical activity dedicated to storing information" or for the statement in the Discussion that the structural model of memory storage is the standard account.
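
      The distinction raised in point (4) can be illustrated with a minimal sketch. All numbers below are invented for illustration and are not data from the manuscript; the helper names are hypothetical. The sketch shows that a between-group comparison of change scores can be null both when two groups learn equally and when neither group learns, so a within-group (paired) comparison is what speaks to learning.

```python
import math
import statistics

def change(before, after):
    """Per-animal change score (after - before)."""
    return [a - b for a, b in zip(after, before)]

def paired_t(before, after):
    """Paired t statistic for the within-group change."""
    diffs = change(before, after)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def welch_t(x, y):
    """Welch's t statistic for a between-group comparison."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

# Invented avoidance times (seconds): pretraining vs. trial 1.
pre_1, pre_2 = [10, 12, 9, 11], [11, 10, 12, 9]

# Scenario A: both groups improve by about the same amount.
learn_1, learn_2 = [30, 35, 28, 33], [31, 32, 34, 29]
# Scenario B: neither group improves.
flat_1, flat_2 = [11, 11, 10, 10], [10, 11, 11, 10]

# Between-group difference in change scores is null in BOTH scenarios.
between_a = welch_t(change(pre_1, learn_1), change(pre_2, learn_2))
between_b = welch_t(change(pre_1, flat_1), change(pre_2, flat_2))

# Only the within-group statistic separates learning from no learning.
within_a = paired_t(pre_1, learn_1)
within_b = paired_t(pre_1, flat_1)
```

      In this toy case the between-group statistic is zero in both scenarios, whereas the paired statistic is large only when the group actually improved, which is the comparison the legend would need to report.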

    1. Reviewer #1 (Public review):

      In this paper, the authors use a doxycycline-inducible DLD1 cell line expressing a Clover-tagged RNA-binding-defective TDP-43 2KQ mutant that forms nuclear "anisosomes" (TDP-43 shell with HSP70 core) to carry out a small-molecule screen using the LOPAC 1280 library to identify compounds that reduce anisosome number or shift their morphology and dynamics. They also conducted a genome-wide siRNA screen to identify genetic modifiers of anisosome formation and dynamics. From these screens, the authors identify pathways in RNA splicing, translation, proteostasis (proteasome and HSP90), and nuclear transport, including XPO1. They then focus on XPO1 as their primary hit. Pharmacological inhibition of XPO1 using KPT-276, Verdinexor, and Leptomycin B reduces anisosome number while enlarging remaining condensates, which retain liquid-like behavior by FRAP and fusion assays. XPO1 overexpression causes fewer, enlarged TDP-43 puncta, including cytoplasmic puncta, with little or no FRAP recovery, interpreted as gel or solid-like aggregates. Anisosome induction reduces detectable nucleoplasmic XPO1 staining. Finally, the authors examine a homozygous TDP-43 K181E iPSC-derived forebrain organoid model, showing increased cytosolic pTDP-43 in K181E/K181E organoids compared to wild-type controls. Chronic low-dose KPT-276 reduces cytoplasmic pTDP-43 without changing total TDP-43 levels. Bulk RNA-seq shows only a modest fraction of dysregulated genes in K181E/K181E organoids are rescued by KPT-276. They conclude that nuclear export, via XPO1, is a key regulator of TDP-43 liquid-to-solid phase transitions and that cytoplasmic aggregation per se may contribute only modestly to TDP-43 proteinopathy, with RNA-processing defects being dominant.

      The study presents well-executed chemical and genome-wide siRNA screens in a DLD1 TDP-43 2KQ anisosome model and follows up on nuclear transport, particularly XPO1, as a modulator of TDP-43 phase behavior and cytoplasmic aggregation. The screens are impressive in scale, and the microscopy and fluorescence recovery after photobleaching (FRAP) work is technically strong. However, the central mechanistic and disease-relevance claims are not yet sufficiently supported. There are major concerns about the heavy reliance on non-physiological, RNA-binding-defective, and acetylation-mimetic TDP-43 (2KQ) and a homozygous TDP-43 K181E organoid model. An underdeveloped and partly contradictory mechanistic link exists between XPO1 and TDP-43 phase transitions in the context of prior work showing TDP-43 is not a canonical XPO1 cargo. The paper also appears to overinterpret organoid data to conclude that cytoplasmic TDP-43 aggregation plays only a minor role in pathology, based largely on pTDP-43 antibody staining with limited sensitivity and relatively modest rescue readouts. A deeper mechanistic analysis and additional, more physiological validation are needed for this to reach the level of rigor and impact implied by the title and abstract. The work feels screen-rich but conceptually underdeveloped, with key claims outpacing the data. A major revision with substantial new data and tempering of conclusions is warranted. I outline several problematic areas below:

      (1) The central mechanistic discoveries are derived almost entirely from a DLD1 colon cancer cell line overexpressing an RNA-binding-defective, acetylation-mimetic TDP-43 2KQ mutant and homozygous TDP-43 K181E iPSC-derived organoids. Both systems are far from physiological. The 2KQ mutation is a synthetic double lysine-to-glutamine mutant originally designed to mimic acetylation and disrupt RNA binding. In this study, essentially all cell-based mechanistic data on phase behavior, screens, and XPO1 effects rely on 2KQ. Yet there is no quantification of how much endogenous TDP-43 is acetylated in degenerating human neurons, nor whether a 2KQ-like acetylation state is ever achieved in vivo. It is not established that the phase behavior of 2KQ recapitulates the physiological or pathological phase behavior of wild-type TDP-43 or genuine disease-linked mutants, which may retain partial RNA binding and different post-translational modification patterns. As a result, it is difficult to know whether the modifiers identified here regulate a highly artificial 2KQ condensate or physiologically relevant TDP-43 condensates. To address this concern, the paper would benefit from quantifying endogenous TDP-43 acetylation at the relevant lysines in control and ALS/FTD patient tissue or more disease-proximal models such as heterozygous TARDBP mutant iPSC neurons, which would justify the focus on an acetyl-mimetic mutant. Key phenomena, including XPO1 dependence of phase behavior, effects of proteasome and HSP90 inhibition, and effects of splicing and translation inhibitors, should be tested for wild-type TDP-43 expressed at near-physiological levels and for one or more bona fide ALS/FTD-linked TARDBP mutants that are not acetyl mimetics. At a minimum, the authors should show that endogenous TDP-43 in neuronally differentiated cells exhibits qualitatively similar responses to XPO1 modulation, rather than exclusively relying on DLD1 2KQ overexpression.

      (2) The organoid model is based on a homozygous K181E knock-in line. However, in patients, TARDBP mutations are overwhelmingly heterozygous. Homozygosity is thus a severe, arguably non-physiological sensitized background that may exaggerate nuclear RNA mis-splicing and phase defects and alter the relative contribution of cytoplasmic aggregation versus nuclear loss-of-function. In addition, it is not fully clear from this manuscript whether the structures in K181E organoids are bona fide anisosomes as defined in Yu et al. 2021, characterized by HSP70-enriched central liquid cores with TDP-43 shells and similar FRAP and fusion behavior to anisosomes in the DLD1 model. At present, the organoid section is framed as validation of "anisosome-bearing organoids," but the figures in this manuscript mainly show pTDP-43 puncta and total TDP-43 immunostaining, without detailed structural or biophysical characterization. The authors should explicitly compare heterozygous K181E/+ organoids or another heterozygous TARDBP mutant line with homozygous K181E/K181E organoids to assess whether XPO1 inhibition has similar effects in a genotype that more closely resembles patient genetics. They should provide direct evidence that the K181E condensates in organoids are anisosomes through HSP70 core immunostaining, three-dimensional reconstruction, and FRAP measurements, and clarify whether KPT-276 is acting on anisosome-like structures or more generic cytoplasmic aggregates or puncta. Without this, the leap from a DLD1 2KQ cancer cell model to human ALS/FTD-relevant neurons is not convincingly supported.

      (3) The title and framing assert that "nuclear export governs TDP-43 phase transitions." However, prior studies such as Pinarbasi et al. 2018 and Duan et al. 2022 indicate that TDP-43 is not a canonical XPO1 cargo and that its export is largely passive, with active nuclear import being the dominant determinant of nuclear localization. The authors cite these studies but still position XPO1 as a central, quasi-direct regulator. The data presented are largely correlative or based on pharmacologic manipulation and overexpression in a background that itself overexpresses a mutant, with no direct evidence that XPO1 engages TDP-43 in a specific, regulated manner. Even if XPO1 does not engage WT TDP-43, it could still engage the 2KQ variant, which needs to be tested.

      (4) The XPO1 perturbations yield somewhat confusing phenotypes. XPO1 inhibition using Leptomycin B, KPT-276, and Verdinexor reduces anisosome number and enlarges remaining anisosomes, which remain liquid-like by FRAP recovery and fusion assays and stay nuclear. XPO1 overexpression causes fewer, enlarged puncta, but these are FRAP-impaired (gel-like) and redistribute to the cytoplasm. Thus, both decreased and increased XPO1 activity reduce anisosome number and enlarge puncta, but with opposite phase behaviors and subcellular localizations. The model presented in Figure 5L is relatively qualitative and does not resolve these issues. Moreover, XPO1 inhibition globally impairs nuclear export of many cargos and profoundly alters the nuclear environment, transcription, RNA processing, and chromatin. It is therefore difficult to conclude that the observed effects are specific to TDP-43 phase regulation as opposed to secondary consequences of broad nuclear export blockade.

      (5) The authors show that anisosome induction depletes nucleoplasmic XPO1 signal and that mCherry-XPO1 can be seen in some TDP-43 puncta. However, antibody penetration into anisosomes is limited, so XPO1 depletion from nucleoplasm could reflect sequestration in the anisosome shell or core, but this is not demonstrated. There is no demonstration of physical interaction, even indirect interaction, between XPO1 and TDP-43 or a defined adaptor, nor identification of a specific mutant of XPO1 that selectively disrupts this putative interaction while preserving other functions. The known TDP-43 NES has been shown to be weak and not a functional XPO1-dependent NES in multiple studies. If XPO1 is acting through an adaptor that recognizes 2KQ or K181E specifically, that by itself would bring into question the generality of the mechanism for wild-type TDP-43.

      (6) To support a mechanistic claim that nuclear export governs TDP-43 phase transitions, more targeted evidence is needed. The authors should test whether siRNA knockdown or CRISPR interference of XPO1 in the DLD1 2KQ model reproduces the effects seen with Leptomycin B and KPT-276, including FRAP and fusion phenotypes, and verify on-target effects by rescue with an siRNA-resistant XPO1 construct. They should demonstrate that canonical XPO1 cargos behave as expected under the inhibitor conditions used, as a positive control, and that the concentrations used are not grossly toxic. They should attempt to identify or at least constrain candidate adaptors that might enable XPO1-dependent export of TDP-43 through proteomic analysis of XPO1 co-purifying with 2KQ condensates or loss-of-function studies of candidate adaptors from the siRNA screen. Finally, they should test whether a TDP-43 mutant that cannot bind the proposed adaptor still responds to XPO1 manipulation.

      (7) Even with these data, what is currently shown is that global modulation of nuclear export capacity can alter the phase behavior and localization of a highly overexpressed RNA-binding-defective TDP-43 mutant and of K181E in organoids. This is important, but it is weaker than asserting that XPO1 directly governs TDP-43 phase transitions in physiological contexts. The title, abstract, and Discussion should be tempered to reflect that nuclear export is one of several pathways, alongside RNA splicing, translation, and proteostasis, that influence TDP-43 phase states in this model, and that the specific mechanism and cargo relationship between XPO1 and TDP-43 remain unresolved and may be indirect.

      (8) The authors conclude that cytoplasmic TDP-43 aggregation plays only a modest role in TDP-43 proteinopathies because in homozygous K181E organoids, chronic KPT-276 treatment almost abolishes cytoplasmic pTDP-43 puncta, yet bulk RNA-seq shows only a relatively small fraction of dysregulated genes are rescued. There are several issues with this inference. Relying primarily on pTDP-43 antibody staining to define cytoplasmic TDP-43 aggregation is limiting. pTDP-43 antibodies label only phosphorylated species and may miss non-phosphorylated, oligomeric, or amorphous TDP-43 species that could still be toxic. Different pTDP-43 antibodies vary in epitope accessibility depending on aggregate conformation and subcellular location. More sensitive approaches, such as high-affinity TDP-43 RNA aptamer probes developed by Gregory and colleagues, biochemical fractionation for SDS-insoluble and urea-soluble TDP-43, and filter-trap assays, would provide a more quantitative assessment of cytoplasmic aggregation and its reduction by KPT-276. Without these, it is not safe to assume that cytoplasmic aggregation has been eliminated, as opposed to one antigenic subclass.

      (9) The treatment window, spanning from day 87 to 122 with 20 nanomolar KPT-276, may be too late or too mild to reverse entrenched nuclear RNA-processing defects, even if cytoplasmic inclusions are cleared. Once widespread cryptic exon inclusion and alternative polyadenylation misregulation are established, many downstream changes may become self-sustaining or only partially reversible. Moreover, XPO1 inhibition will massively rewire nucleocytoplasmic transport of many transcription factors, splicing factors, and RNA-binding proteins. Thus, the lack of full transcriptomic rescue cannot be cleanly interpreted as evidence that cytoplasmic aggregates are only modest contributors. It may instead reflect that nuclear dysfunction is primary and XPO1 inhibition does not correct, and may even exacerbate, certain nuclear defects.

      (10) To support a causal statement about the modest contribution of cytoplasmic aggregates, one would want more direct measures of neuronal health and function, such as cell death, neurite complexity, synaptic markers, and electrophysiology before and after KPT-276, not only transcriptomics. A way to selectively reduce cytoplasmic aggregation without globally inhibiting nuclear export would allow comparison of outcomes.

      (11) Given these caveats, the concluding statements that cytoplasmic TDP-43 aggregation is only a modest contributor should be substantially softened. A more defensible interpretation is that in this homozygous K181E organoid model, chronic global XPO1 inhibition reduces pTDP-43-positive cytoplasmic puncta but only partially normalizes the steady-state transcriptome, suggesting that persistent nuclear RNA-processing defects and other pathways continue to drive pathology.

      (12) The screens are a major strength but need more rigorous validation for key hits, especially nuclear transport factors. For the siRNA screen, hits are filtered by anisosome number per nucleus, but there is no direct demonstration in the main text that XPO1 or CSE1L knockdown is efficient at the messenger RNA or protein level. For the highlighted genes, Western blot or quantitative polymerase chain reaction validation and phenotypic rescue would strengthen confidence. For small-molecule hits, it is not systematically shown that anisosome modulation is independent of changes in total TDP-43 2KQ expression or gross toxicity. Translation inhibitors are tested for this, but for many other hits, including proteasome, HSP90, and kinase inhibitors, expression and general nuclear structure should be monitored. Given the reliance on anisosome count as a readout, secondary screens that specifically distinguish changes in TDP-43 expression levels, changes in nuclear morphology or cell cycle, and specific changes in anisosome phase behavior, including FRAP and fusion for top hits, would greatly increase interpretability.

      (13) The classification of condensates as liquid versus gel-like or solid is based almost entirely on FRAP recovery or lack thereof. While FRAP is appropriate, interpretations could be made more robust by including half-region-of-interest bleach controls and assessing mobile fractions and recovery kinetics more quantitatively across conditions. Complementing FRAP with other phase-behavior assays such as sensitivity to 1,6-hexanediol, shape relaxation after deformation, and coarsening behavior over longer timescales would strengthen the analysis. At present, some assignments, such as that XPO1 overexpression drives a gel-like transition, are reasonable but somewhat qualitative.

      (14) For the Leptomycin B and KPT-276 experiments in cells and organoids, it would be important to confirm that canonical XPO1 cargo proteins accumulate in the nucleus and that the concentrations used are within a range that is not overtly toxic over the experimental timeframe. Assessing nuclear morphology, chromatin condensation, and general transcriptional activity through global RNA synthesis or key reporter genes would ensure that observed effects are not secondary to severe global nuclear export collapse.

      (15) In the organoid section, it is not clear how many independent iPSC clones and organoid batches were used per condition, nor whether batch effects were assessed in the bulk RNA-seq analysis. This should be fully specified and ideally controlled with isogenic wild-type and K181E clones. For transcriptional rescue, it is important to know whether the changes in wild-type organoids treated with KPT-276 are negligible. A direct wild-type comparison with or without KPT-276 is important to disentangle general drug effects from K181E-specific rescue. More detailed quantification of total TDP-43 and pTDP-43 in both nuclear and cytoplasmic fractions, including biochemical fractionation if possible, would strengthen the assertion that KPT-276 specifically reduces cytosolic pTDP-43 aggregates while sparing nuclear TDP-43.

      (16) Beyond the core issues above, several additions could greatly enhance the impact. The manuscript currently emphasizes XPO1, but the genetic and chemical data clearly implicate RNA splicing, translation, and proteostasis as equally strong or stronger regulators of TDP-43 phase states. A more integrated model that explains how these pathways intersect, for example, how splicing factor availability, ribosome loading, and proteasome capacity co-govern anisosome nucleation, growth, and hardening, would be valuable.

      (17) A key unresolved question is whether XPO1 is acting directly on TDP-43, or instead primarily regulates anisosomes by exporting other factors that more proximally control TDP-43 phase behavior. Given that TDP-43 is not a canonical XPO1 cargo and prior work indicates that its nuclear export is largely passive, it seems at least as plausible that XPO1 inhibition alters the nuclear concentration or localization of splicing factors, RNA-binding proteins, chaperones, or other modifiers identified in the screens, and that changes in these proteins secondarily reshape anisosome dynamics. In other words, XPO1 may be exporting a more direct regulator of anisosome formation and hardening, rather than exporting TDP-43 itself in a specific, regulated way. The current data do not distinguish between these possibilities. Systematic identification of XPO1-dependent cargos that colocalize with or biochemically associate with anisosomes, combined with targeted perturbation of their nuclear export, would be needed to determine whether the relevant XPO1 substrate in this system is actually TDP-43 or an upstream modulator of its phase behavior.

      (18) Testing whether identified modifiers converge on nuclear TDP-43 concentration would be informative. Since phase separation is concentration-dependent, measuring nuclear versus cytoplasmic TDP-43 levels across key perturbations, including splicing inhibition, translation inhibition, proteasome inhibition, HSP90 inhibition, and XPO1 modulation, would help determine whether modifiers mainly work by changing nuclear TDP-43 concentration or by altering interaction networks and the material properties of condensates.

      (19) Examining other ALS-relevant RNA-binding proteins would be valuable. Given the role of XPO1 and other hits, it would be informative to briefly test whether similar principles apply to FUS, hnRNPA1, or other ALS-relevant RNA-binding proteins in the same cellular context, to argue for generality versus TDP-43-specific idiosyncrasies of the 2KQ system.

      (20) The Introduction sometimes implies that anisosomes are common and well-established intermediates en route to pathology. It would be helpful to more clearly state that, to date, anisosomes are primarily observed in overexpression and mutant systems and have not yet been unequivocally demonstrated in human patient tissue. The link between PDGFRβ, PAK4, GSK-3β, and YAP and TDP-43 phase dynamics is intriguing but only briefly mentioned. The authors should either expand on this or tone down the emphasis in the Results section.

      (21) In the organoid methods, the authors should consider clarifying whether doxycycline is continuously used, which might alter TDP-43 expression and nuclear transport in a non-negligible way.

      (22) For statistical methods, it would be beneficial to indicate whether multiple-comparison corrections were applied for the many FRAP, anisosome count, and size comparisons beyond DESeq2 internal corrections for RNA-seq.

      (23) Some figure legends could more clearly indicate whether the images shown are single z-planes or maximum intensity projections and how the thresholding for anisosome detection was performed.

      (24) In its current form, the manuscript contains an impressive set of screens and some nicely executed imaging of TDP-43 condensates, highlighting nuclear export among other pathways as a modulator of TDP-43 phase behavior. However, the physiological relevance is undercut by heavy reliance on an acetylation-mimetic, RNA-binding-defective TDP-43 mutant and a homozygous K181E organoid model. The mechanistic link between XPO1 and TDP-43 remains largely inferential and partly at odds with prior work. The conclusion that cytoplasmic TDP-43 aggregation is only a modest contributor to disease is not firmly supported by the available data.

      (25) With substantial additional mechanistic work, particularly around XPO1, rigorous validation in more physiological TDP-43 contexts, more sensitive detection of cytoplasmic TDP-43 aggregates, and a tempering of the central claims, this study could make a meaningful contribution to understanding how nucleocytoplasmic transport and other cellular pathways influence TDP-43 phase transitions and aggregation. The work should be reframed as an important screening study that identifies nuclear export as one among several cellular processes that modulate TDP-43 phase behavior in a model system, rather than as a definitive demonstration that nuclear export governs pathological TDP-43 aggregation in disease.
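
      As a concrete illustration of the multiple-comparison correction asked about in point (22), the following is a minimal sketch of the Benjamini-Hochberg procedure. It is one common choice, not necessarily the one the authors used or should use, and the p-values are invented for illustration; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Invented raw p-values from four hypothetical pairwise comparisons.
example_p = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(example_p)
```

      With these invented inputs, three comparisons are nominally below 0.05 before correction but only the smallest remains below 0.05 afterwards, which is why the legends and Methods should state whether and how such a correction was applied.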

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript addresses an important and timely question in TDP-43 biology by systematically identifying regulators of TDP-43 anisosome formation, with a particular focus on nuclear export via XPO1. Using a combination of unbiased chemical screening, genetic perturbation, and advanced imaging approaches, the authors propose that inhibition of nuclear export modulates the abundance and biophysical properties of TDP-43 anisosomes. The study is conceptually innovative and has potential relevance for neurodegenerative diseases characterized by TDP-43 pathology. However, significant concerns regarding experimental controls, reporting transparency, and model translatability currently limit the strength of the conclusions and the interpretability of several key findings.

      Strengths:

      (1) The study employs an unbiased, hypothesis-free compound screen to identify regulators of TDP-43 anisosome formation, which is a major strength and reduces confirmation bias.

      (2) The authors combine chemical and genetic screening approaches, providing orthogonal validation of key pathways and increasing confidence in the biological relevance of top hits.

      (3) The focus on biophysical properties of TDP-43 assemblies, assessed through imaging and FRAP, moves beyond simple presence/absence of aggregates and provides mechanistic insight into the biophysical states of TDP-43.

      (4) The use of multiple experimental modalities, including live-cell imaging, FRAP, pharmacological perturbation, and transcriptomic analysis, reflects a technically sophisticated and ambitious study design.

      (5) The authors attempt to extend findings beyond immortalized cancer cell lines by incorporating organoid models, demonstrating awareness of disease relevance and translational importance.

      Overall, the manuscript is clearly written and logically structured, making complex experimental workflows accessible and the central hypotheses easy to follow.

      Weaknesses:

      Despite its strengths, the manuscript has several major limitations that affect data interpretation and confidence in the conclusions.

      (1) Lack of appropriate controls for overexpression experiments:

      A central concern is the absence of proper controls for TDP-43 and XPO1 overexpression. Prior studies (including those cited by the authors, Archbold et al. 2018) show that overexpression of WT TDP-43 alone is toxic to neurons. Thus, the experimental system itself may induce anisosome formation independently of the mechanisms under study. Similarly, XPO1 overexpression lacks a suitable control (e.g., mCherry alone or mCherry fused to a protein known to be independent of TDP-43). The near-complete colocalization of XPO1 with TDP-43 anisosomes upon overexpression raises the possibility that these structures reflect non-physiological protein accumulation rather than regulated assemblies.

      (2) Insufficient experimental and analytical transparency:

      The manuscript frequently lacks clear reporting of experimental details. In multiple figures, the stated number of independent experiments does not match the number of data points shown, making it difficult to assess statistical validity. Concentrations used in the compound screen are not clearly defined, nor is it stated whether multiple concentrations were tested. It is unclear how many wells, cells, or independent cultures were analyzed. The criteria used to reduce 1,533 screening hits to 211 candidates via STRING analysis are not explained. Knockdown and overexpression efficiencies are not reported.

      (3) RNA-seq concerns:

      The RNA-seq experiments are particularly problematic. The number of biological replicates per condition is not stated, and heatmaps suggest that only one sample per group may have been used, which would preclude statistical analysis. No baseline comparison between WT and mutant TDP-43 is shown. Given that TDP-43 is an RNA-binding protein, splicing analyses would be far more informative than gene expression alone, yet no splicing data are presented. Moreover, nuclear retention of TDP-43 does not preclude nuclear aggregation, which may still impair its splicing function.

      (4) Limited translatability to neuronal biology:

      All anisosome analyses are performed in a cancer cell line, raising concerns about relevance to post-mitotic neurons. While organoids are used as a secondary model, the assays performed do not overlap with those used in cancer cells, making it difficult to assess whether anisosome-related mechanisms are conserved. Neuronal toxicity, a critical outcome given known TDP-43 biology, is not assessed. Prior work has shown that WT TDP-43 overexpression alone is toxic to neurons, yet this is not addressed.

      (5) Conceptual and interpretational gaps:

      The authors quantify anisosome number but also report conditions in which anisosome number decreases while size increases. The biological interpretation of larger anisosomes is not discussed, and whether this reflects improvement or worsening of pathology is unclear. Compounds targeting the same mechanism (e.g., nuclear export inhibition) are inconsistently used across experiments (KPT compounds, verdinexor, leptomycin B), raising concerns about reproducibility. In organoids, the experimental paradigm shifts to long-term treatment (35 days vs. 16 hours), further complicating interpretation.

      (6) Overinterpretation of rescue effects:

      Although the authors state that they aim to test whether nuclear export inhibition rescues neuronal defects, no functional neuronal readouts are provided (e.g., viability, morphology, axon outgrowth, or electrophysiological measures). RNA-seq alone is insufficient to support claims of rescue.

      (7) Finally, the model does not appear to exhibit cytosolic TDP-43 aggregation at baseline. It remains unclear whether longer induction would produce cytosolic gel-like assemblies and whether these would be prevented by nuclear export inhibition. Long-term data are shown only in organoids, yet anisosome formation is not assessed there.

    3. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, the authors use a doxycycline-inducible DLD1 cell line expressing a Clover-tagged RNA-binding-defective TDP-43 2KQ mutant that forms nuclear "anisosomes" (TDP-43 shell with HSP70 core) to carry out a small-molecule screen using the LOPAC 1280 library to identify compounds that reduce anisosome number or shift their morphology and dynamics. They also conducted a genome-wide siRNA screen to identify genetic modifiers of anisosome formation and dynamics. From these screens, the authors identify pathways in RNA splicing, translation, proteostasis (proteasome and HSP90), and nuclear transport, including XPO1. They then focus on XPO1 as their primary hit. Pharmacological inhibition of XPO1 using KPT-276, Verdinexor, and Leptomycin B reduces anisosome number while enlarging remaining condensates, which retain liquid-like behavior by FRAP and fusion assays. XPO1 overexpression causes fewer, enlarged TDP-43 puncta, including cytoplasmic puncta, with little or no FRAP recovery, interpreted as gel or solid-like aggregates. Anisosome induction reduces detectable nucleoplasmic XPO1 staining. Finally, the authors examine a homozygous TDP-43 K181E iPSC-derived forebrain organoid model, showing increased cytosolic pTDP-43 in K181E/K181E organoids compared to wild-type controls. Chronic low-dose KPT-276 reduces cytoplasmic pTDP-43 without changing total TDP-43 levels. Bulk RNA-seq shows only a modest fraction of dysregulated genes in K181E/K181E organoids are rescued by KPT-276. They conclude that nuclear export, via XPO1, is a key regulator of TDP-43 liquid-to-solid phase transitions and that cytoplasmic aggregation per se may contribute only modestly to TDP-43 proteinopathy, with RNA-processing defects being dominant.

      We thank the reviewer for carefully summarizing our study.

      The study presents well-executed chemical and genome-wide siRNA screens in a DLD1 TDP-43 2KQ anisosome model and follows up on nuclear transport, particularly XPO1, as a modulator of TDP-43 phase behavior and cytoplasmic aggregation. The screens are impressive in scale, and the microscopy and fluorescence recovery after photobleaching (FRAP) work is technically strong. However, the central mechanistic and disease-relevance claims are not yet sufficiently supported. There are major concerns about the heavy reliance on non-physiological, RNA-binding-defective, and acetylation-mimetic TDP-43 (2KQ) and a homozygous TDP-43 K181E organoid model. An underdeveloped and partly contradictory mechanistic link exists between XPO1 and TDP-43 phase transitions in the context of prior work showing TDP-43 is not a canonical XPO1 cargo. The paper also appears to overinterpret organoid data to conclude that cytoplasmic TDP-43 aggregation plays only a minor role in pathology, based largely on pTDP-43 antibody staining with limited sensitivity and relatively modest rescue readouts. A deeper mechanistic analysis and additional, more physiological validation are needed for this to reach the level of rigor and impact implied by the title and abstract. The work feels screen-rich but conceptually underdeveloped, with key claims outpacing the data. A major revision with substantial new data and tempering of conclusions is warranted. I outline several problematic areas below:

      (1) The central mechanistic discoveries are derived almost entirely from a DLD1 colon cancer cell line overexpressing an RNA-binding-defective, acetylation-mimetic TDP-43 2KQ mutant and homozygous TDP-43 K181E iPSC-derived organoids. Both systems are far from physiological. The 2KQ mutation is a synthetic double lysine-to-glutamine mutant originally designed to mimic acetylation and disrupt RNA binding. In this study, essentially all cell-based mechanistic data on phase behavior, screens, and XPO1 effects rely on 2KQ. Yet there is no quantification of how much endogenous TDP-43 is acetylated in degenerating human neurons, nor whether a 2KQ-like acetylation state is ever achieved in vivo. It is not established that the phase behavior of 2KQ recapitulates the physiological or pathological phase behavior of wild-type TDP-43 or genuine disease-linked mutants, which may retain partial RNA binding and different post-translational modification patterns. As a result, it is difficult to know whether the modifiers identified here regulate a highly artificial 2KQ condensate or physiologically relevant TDP-43 condensates. To address this concern, the paper would benefit from quantifying endogenous TDP-43 acetylation at the relevant lysines in control and ALS/FTD patient tissue or more disease-proximal models such as heterozygous TARDBP mutant iPSC neurons, which would justify the focus on an acetyl-mimetic mutant. Key phenomena, including XPO1 dependence of phase behavior, effects of proteasome and HSP90 inhibition, and effects of splicing and translation inhibitors, should be tested for wild-type TDP-43 expressed at near-physiological levels and for one or more bona fide ALS/FTD-linked TARDBP mutants that are not acetyl mimetics. At a minimum, the authors should show that endogenous TDP-43 in neuronally differentiated cells exhibits qualitatively similar responses to XPO1 modulation, rather than exclusively relying on DLD1 2KQ overexpression.

      Acetylation of endogenous TDP-43 has been reported in several studies. Although it occurs at low levels under normal conditions, TDP-43 acetylation is upregulated under stress conditions (e.g. oxidative stress and proteotoxic stress) (PMID: 25556531; PMID: 28724966). Importantly, Cohen et al. reported the identification of acetylated TDP-43 in ALS patient spinal cord (PMID: 25556531), while Yu et al. showed that endogenous wild-type TDP-43 undergoes demixing when neurons are treated with either a deacetylase inhibitor or proteasome inhibitors (PMID: 33335017). These studies also show that acetylated TDP-43 is defective in RNA binding and more prone to aggregation. Furthermore, ectopic expression of acetylated TDP-43 mimetics in cells and mice induces cellular defects similar to those observed in disease models (PMID: 28724966). Thus, our findings, based on previously established TDP-43 mimetics, should provide valuable information regarding the regulation of TDP-43 phase behavior. We agree with the reviewers that the model used in this study has its limitations, and we will be happy to revise the manuscript to tone down some conclusions and to include more background information justifying the use of TDP-43 acetylation mimetics.

      (2) The organoid model is based on a homozygous K181E knock-in line. However, in patients, TARDBP mutations are overwhelmingly heterozygous. Homozygosity is thus a severe, arguably non-physiological sensitized background that may exaggerate nuclear RNA mis-splicing and phase defects and alter the relative contribution of cytoplasmic aggregation versus nuclear loss-of-function. In addition, it is not fully clear from this manuscript whether the structures in K181E organoids are bona fide anisosomes as defined in Yu et al. 2021, characterized by HSP70-enriched central liquid cores with TDP-43 shells and similar FRAP and fusion behavior to anisosomes in the DLD1 model. At present, the organoid section is framed as validation of "anisosome-bearing organoids," but the figures in this manuscript mainly show pTDP-43 puncta and total TDP-43 immunostaining, without detailed structural or biophysical characterization. The authors should explicitly compare heterozygous K181E/+ organoids or another heterozygous TARDBP mutant line with homozygous K181E/K181E organoids to assess whether XPO1 inhibition has similar effects in a genotype that more closely resembles patient genetics. They should provide direct evidence that the K181E condensates in organoids are anisosomes through HSP70 core immunostaining, three-dimensional reconstruction, and FRAP measurements, and clarify whether KPT-276 is acting on anisosome-like structures or more generic cytoplasmic aggregates or puncta. Without this, the leap from a DLD1 2KQ cancer cell model to human ALS/FTD-relevant neurons is not convincingly supported.

      The reviewer is correct that the use of homozygous K181E organoids generates a homogeneous background that is more sensitive for detecting phospho-TDP-43. The goal of the experiment was to test whether XPO1 inhibition mitigates the aggregation of a TDP-43 disease mutant. For this purpose, we believe that our experimental setup is suitable. We agree that we should not extrapolate the result to overemphasize its disease connections. We will revise the paper to tone down this part.

      Regarding the immunostained signals in K181E organoids, we did not report them as anisosomes. As documented in the literature, p-TDP-43 is widely used as a marker of pathological TDP-43 aggregation. p-TDP-43 is enriched in pathological aggregates in human ALS and FTLD patients, colocalizes with other aggregation signatures such as ubiquitin and other aggregation-prone proteins (PMID: 36008843), and is used as a diagnostic marker for neurodegeneration (PMID: 31661037). Figure 7A shows that inhibiting nuclear export mitigates the accumulation of p-TDP-43 in mutant tissues. We will revise the subheading and the corresponding text to avoid the confusion.

      (3) The title and framing assert that "nuclear export governs TDP-43 phase transitions." However, prior studies such as Pinarbasi et al. 2018 and Duan et al. 2022 indicate that TDP-43 is not a canonical XPO1 cargo and that its export is largely passive, with active nuclear import being the dominant determinant of nuclear localization. The authors cite these studies but still position XPO1 as a central, quasi-direct regulator. The data presented are largely correlative or based on pharmacologic manipulation and overexpression in an overexpression mutant background, with no direct evidence that XPO1 engages TDP-43 in a specific, regulated manner. Even if XPO1 does not engage WT TDP-43, it could still engage the 2KQ variant, which needs to be tested.

      We did not conclude or imply that the regulation of TDP-43 by XPO1 is direct. In fact, we explicitly mentioned on page 8 that the regulation is likely indirect and mediated by other factors. The sentence reads: “Since XPO1 does not bind TDP-43 directly (Pinarbasi et al., 2018), additional factors likely facilitate XPO1-mediated TDP-43 nuclear egression under this condition.” We can revise this part to make it clearer. We will also revise the title and change the framing accordingly.

      (4) The XPO1 perturbations yield somewhat confusing phenotypes. XPO1 inhibition using Leptomycin B, KPT-276, and Verdinexor reduces anisosome number and enlarges remaining anisosomes, which remain liquid-like by FRAP recovery and fusion assays and stay nuclear. XPO1 overexpression causes fewer, enlarged puncta, but these are FRAP-impaired (gel-like) and redistribute to the cytoplasm. Thus, both decreased and increased XPO1 activity reduce anisosome number and enlarge puncta, but with opposite phase behaviors and subcellular localizations. The model presented in Figure 5L is relatively qualitative and does not resolve these issues. Moreover, XPO1 inhibition globally impairs nuclear export of many cargos and profoundly alters the nuclear environment, transcription, RNA processing, and chromatin. It is therefore difficult to conclude that the observed effects are specific to TDP-43 phase regulation as opposed to secondary consequences of broad nuclear export blockade.

      The reviewer correctly summarizes our data and interpretation: XPO1 loss-of-function and gain-of-function generate opposite phenotypes regarding TDP-43 phase behavior. We agree that additional studies are needed to elucidate the underlying mechanism (e.g. direct or indirect), but we feel that this belongs to a separate study. We plan to re-test the effect of nuclear export inhibition on the subcellular distribution of WT TDP-43 and the acetylation mimetics. We will also add more discussion of the potential indirect effects of XPO1 inhibition on TDP-43 phase behavior.

      (5) The authors show that anisosome induction depletes nucleoplasmic XPO1 signal and that mCherry-XPO1 can be seen in some TDP-43 puncta. However, antibody penetration into anisosomes is limited, so XPO1 depletion from nucleoplasm could reflect sequestration in the anisosome shell or core, but this is not demonstrated. There is no demonstration of physical interaction, even indirect interaction, between XPO1 and TDP-43 or a defined adaptor, nor identification of a specific mutant of XPO1 that selectively disrupts this putative interaction while preserving other functions. The known TDP-43 NES has been shown to be weak and not a functional XPO1-dependent NES in multiple studies. If XPO1 is acting through an adaptor that recognizes 2KQ or K181E specifically, that by itself would bring into question the generality of the mechanism for wild-type TDP-43.

      We agree that our observation does not demonstrate an interaction between XPO1 and TDP-43. As mentioned above, we did discuss that the regulation of TDP-43 by XPO1 is likely indirect. We will revise our paper further to separate any speculative statements from the data and narrow our mechanistic claim.

      (6) To support a mechanistic claim that nuclear export governs TDP-43 phase transitions, more targeted evidence is needed. The authors should test whether siRNA knockdown or CRISPR interference of XPO1 in the DLD1 2KQ model reproduces the effects seen with Leptomycin B and KPT-276, including FRAP and fusion phenotypes, and verify on-target effects by rescue with an siRNA-resistant XPO1 construct. They should demonstrate that canonical XPO1 cargos behave as expected under the inhibitor conditions used, as a positive control, and that the concentrations used are not grossly toxic. They should attempt to identify or at least constrain candidate adaptors that might enable XPO1-dependent export of TDP-43 through proteomic analysis of XPO1 co-purifying with 2KQ condensates or loss-of-function studies of candidate adaptors from the siRNA screen. Finally, they should test whether a TDP-43 mutant that cannot bind the proposed adaptor still responds to XPO1 manipulation.

      The anisosome enlargement phenotype upon XPO1 depletion was seen in our siRNA screen, where it was identified by machine-based image analyses using 6 distinct siRNAs. This, together with the chemical inhibition experiments, convinced us that the phenotype is specifically caused by XPO1 inactivation.

      When characterizing the effect of XPO1 inhibition on anisosome dynamics, we preferred chemical inhibitors because the effect is acute and is therefore less likely to be caused by secondary effects.

      Regarding the inhibitor concentration, a literature survey suggested that 50-200 nM Leptomycin B is commonly used. We chose 200 nM to ensure a quick and complete inhibition of XPO1-mediated nuclear export (see Figure 3 in PMID: 9628873). This dose is also well tolerated by our cells, at least during the chosen time window.

      We did not propose any specific adaptor that mediates XPO1 interaction with TDP-43. The identification of such an adaptor is outside the scope of this study. We will revise our paper to avoid this confusion.

      (7) Even with these data, what is currently shown is that global modulation of nuclear export capacity can alter the phase behavior and localization of a highly overexpressed RNA-binding-defective TDP-43 mutant and of K181E in organoids. This is important, but it is weaker than asserting that XPO1 directly governs TDP-43 phase transitions in physiological contexts. The title, abstract, and Discussion should be tempered to reflect that nuclear export is one of several pathways, alongside RNA splicing, translation, and proteostasis, that influence TDP-43 phase states in this model, and that the specific mechanism and cargo relationship between XPO1 and TDP-43 remain unresolved and may be indirect.

      We will revise the title, abstract, and discussion to temper the conclusion.

      (8) The authors conclude that cytoplasmic TDP-43 aggregation plays only a modest role in TDP-43 proteinopathies because in homozygous K181E organoids, chronic KPT-276 treatment almost abolishes cytoplasmic pTDP-43 puncta, yet bulk RNA-seq shows only a relatively small fraction of dysregulated genes are rescued. There are several issues with this inference. Relying primarily on pTDP-43 antibody staining to define cytoplasmic TDP-43 aggregation is limiting. pTDP-43 antibodies label only phosphorylated species and may miss non-phosphorylated, oligomeric, or amorphous TDP-43 species that could still be toxic. Different pTDP-43 antibodies vary in epitope accessibility depending on aggregate conformation and subcellular location. More sensitive approaches, such as high-affinity TDP-43 RNA aptamer probes developed by Gregory and colleagues, biochemical fractionation for SDS-insoluble and urea-soluble TDP-43, and filter-trap assays, would provide a more quantitative assessment of cytoplasmic aggregation and its reduction by KPT-276. Without these, it is not safe to assume that cytoplasmic aggregation has been eliminated, as opposed to one antigenic subclass.

      We agree with the reviewer that p-TDP-43 may not represent all aggregate species. However, p-TDP-43 antibodies detect the pathologically validated species most tightly associated with TDP-43 proteinopathies. In human ALS and FTLD-TDP tissues, cytoplasmic inclusions are strongly immunoreactive for phosphorylated TDP-43 (typically S409/410, as used here). Additionally, p-TDP-43 immunohistochemistry is a routine diagnostic criterion in neuropathology. For these reasons, we believe that the significant reduction of p-TDP-43 upon XPO1 inhibition is an important finding, as it suggests that an improvement in TDP-43 proteinopathy can be achieved by inhibiting nuclear transport. We plan to revise the text to better explain the significance of p-TDP-43 staining.

      (9) The treatment window, spanning from day 87 to 122 with 20 nanomolar KPT-276, may be too late or too mild to reverse entrenched nuclear RNA-processing defects, even if cytoplasmic inclusions are cleared. Once widespread cryptic exon inclusion and alternative polyadenylation misregulation are established, many downstream changes may become self-sustaining or only partially reversible. Moreover, XPO1 inhibition will massively rewire nucleocytoplasmic transport of many transcription factors, splicing factors, and RNA-binding proteins. Thus, the lack of full transcriptomic rescue cannot be cleanly interpreted as evidence that cytoplasmic aggregates are only modest contributors. It may instead reflect that nuclear dysfunction is primary and XPO1 inhibition does not correct, and may even exacerbate, certain nuclear defects.

      We agree with the reviewer that the lack of rescue may be caused by technical issues. We will remove the RNA-seq data and the related text, since they are not essential for our main conclusion.

      (10) To support a causal statement about the modest contribution of cytoplasmic aggregates, one would want more direct measures of neuronal health and function, such as cell death, neurite complexity, synaptic markers, and electrophysiology before and after KPT-276, not only transcriptomics. A way to selectively reduce cytoplasmic aggregation without globally inhibiting nuclear export would allow comparison of outcomes.

      We will remove the discussion regarding the role of cytoplasmic aggregates in disease.

      (11) Given these caveats, the concluding statements that cytoplasmic TDP-43 aggregation is only a modest contributor should be substantially softened. A more defensible interpretation is that in this homozygous K181E organoid model, chronic global XPO1 inhibition reduces pTDP-43-positive cytoplasmic puncta but only partially normalizes the steady-state transcriptome, suggesting that persistent nuclear RNA-processing defects and other pathways continue to drive pathology.

      We agree with the reviewer and will revise this part accordingly.

      (12) The screens are a major strength but need more rigorous validation for key hits, especially nuclear transport factors. For the siRNA screen, hits are filtered by anisosome number per nucleus, but there is no direct demonstration in the main text that XPO1 or CSE1L knockdown is efficient at the messenger RNA or protein level. For the highlighted genes, Western blot or quantitative polymerase chain reaction validation and phenotypic rescue would strengthen confidence. For small-molecule hits, it is not systematically shown that anisosome modulation is independent of changes in total TDP-43 2KQ expression or gross toxicity. Translation inhibitors are tested for this, but for many other hits, including proteasome, HSP90, and kinase inhibitors, expression and general nuclear structure should be monitored. Given the reliance on anisosome count as a readout, secondary screens that specifically distinguish changes in TDP-43 expression levels, changes in nuclear morphology or cell cycle, and specific changes in anisosome phase behavior, including FRAP and fusion for top hits, would greatly increase interpretability.

      For the siRNA screen, each positive hit was confirmed by two rounds of screening with 6 independent siRNAs in total. Although we did not validate the knockdown efficiency due to the large number of hits, we routinely include a positive siRNA control in our study (siRNAdeath), which targets an essential gene. Transfection efficiency was controlled by measuring cell viability after knocking down this essential gene. In addition, the identification of XPO1 as a positive regulator of TDP-43 phase behavior was independently validated by our chemical genetic screens. We feel confident that XPO1 is a key modulator of TDP-43 phase behavior. For chemical treatment experiments, the anisosome fusion phenotypes could be detected as early as 5 h post treatment. Given the short treatment, we do not expect a significant change in protein level or toxicity.

      (13) The classification of condensates as liquid versus gel-like or solid is based almost entirely on FRAP recovery or lack thereof. While FRAP is appropriate, interpretations could be made more robust by including half-region-of-interest bleach controls and assessing mobile fractions and recovery kinetics more quantitatively across conditions. Complementing FRAP with other phase-behavior assays such as sensitivity to 1,6-hexanediol, shape relaxation after deformation, and coarsening behavior over longer timescales would strengthen the analysis. At present, some assignments, such as that XPO1 overexpression drives a gel-like transition, are reasonable but somewhat qualitative.

      In this study, we described two types of condensates formed by TDP-43 2KQ, one characterized previously as nuclear anisosomes and the other as cytosolic puncta in XPO1-overexpressing cells. The two can be clearly distinguished by several features, including subcellular localization, shape, and mobility. We feel that our FRAP data clearly segregate these puncta into two distinct types of assemblies, with a large difference in fluorescence recovery rate. The proposed half-region-of-interest bleach is technically challenging for small anisosomes under normal conditions. When they were enlarged by Leptomycin B treatment, we did perform both whole-anisosome bleach and partial bleach (Figure 5D, I). Both assays demonstrate that TDP-43 in these enlarged anisosomes is highly mobile.
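
      The reviewer's request for mobile fractions and recovery kinetics can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function name `frap_summary`, the plateau estimate (mean of the last few frames), and the synthetic single-exponential trace are all assumptions made for illustration.

```python
import math

def frap_summary(times, intensities, prebleach, n_plateau=5):
    """Return (mobile_fraction, half_time) for one FRAP trace.

    Hypothetical helper: `times` and `intensities` start at the first
    post-bleach frame; `prebleach` is the mean pre-bleach intensity.
    """
    f0 = intensities[0]                                # intensity right after bleaching
    plateau = sum(intensities[-n_plateau:]) / n_plateau
    mobile_fraction = (plateau - f0) / (prebleach - f0)
    half_level = f0 + 0.5 * (plateau - f0)
    half_time = None                                   # stays None if no recovery occurs
    for i in range(1, len(times)):
        lo, hi = intensities[i - 1], intensities[i]
        if lo < half_level <= hi:
            # linear interpolation between the two frames bracketing half recovery
            frac = (half_level - lo) / (hi - lo)
            half_time = times[i - 1] + frac * (times[i] - times[i - 1])
            break
    return mobile_fraction, half_time

# Synthetic trace (assumed values): 80% mobile fraction, time constant 5 s.
tau, true_mobile, pre, bleached = 5.0, 0.8, 1.0, 0.2
times = [0.5 * k for k in range(121)]
trace = [bleached + (pre - bleached) * true_mobile * (1 - math.exp(-t / tau))
         for t in times]
mobile, t_half = frap_summary(times, trace, prebleach=pre)
```

      Reporting these two numbers per condensate, rather than recovery versus no recovery, would allow liquid-like and gel-like assemblies to be compared across conditions on a common scale.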

      (14) For the Leptomycin B and KPT-276 experiments in cells and organoids, it would be important to confirm that canonical XPO1 cargo proteins accumulate in the nucleus and that the concentrations used are within a range that is not overtly toxic over the experimental timeframe. Assessing nuclear morphology, chromatin condensation, and general transcriptional activity through global RNA synthesis or key reporter genes would ensure that observed effects are not secondary to severe global nuclear export collapse.

      In the Leptomycin B treatment experiments, we carefully chose a dose that was previously validated (see Figure 3 in PMID: 9628873). Based on our DAPI staining, the nuclear morphology appears normal (Figure 5A). Additionally, in the cell line-based experiments, the effect of Leptomycin B on anisosomes was detected 6-8 hours post treatment. The change in global protein synthesis should be relatively minor at this time point. In the organoid experiment, the drug dose was determined by a pre-experiment in which the morphology of organoids was evaluated after prolonged treatment with different doses of the inhibitors.

      (15) In the organoid section, it is not clear how many independent iPSC clones and organoid batches were used per condition, nor whether batch effects were assessed in the bulk RNA-seq analysis. This should be fully specified and ideally controlled with isogenic wild-type and K181E clones. For transcriptional rescue, it is important to know whether the changes in wild-type organoids treated with KPT-276 are negligible. A direct wild-type comparison with or without KPT-276 is important to disentangle general drug effects from K181E-specific rescue. More detailed quantification of total TDP-43 and pTDP-43 in both nuclear and cytoplasmic fractions, including biochemical fractionation if possible, would strengthen the assertion that KPT-276 specifically reduces cytosolic pTDP-43 aggregates while sparing nuclear TDP-43.

      The organoid experiment was performed with two batches per condition to reduce the effect of batch variation. The wild-type cells and the K181E mutant are derived from the same genetic background. We will revise the text to clarify these issues. Given the cost of this experiment, we did not include drug-treated wild-type organoids as a control. Given the criticisms by Reviewers 1 and 2 of the RNA-seq data, we will remove these non-essential data from our revision.

      (16) Beyond the core issues above, several additions could greatly enhance the impact. The manuscript currently emphasizes XPO1, but the genetic and chemical data clearly implicate RNA splicing, translation, and proteostasis as equally strong or stronger regulators of TDP-43 phase states. A more integrated model that explains how these pathways intersect, for example, how splicing factor availability, ribosome loading, and proteasome capacity co-govern anisosome nucleation, growth, and hardening, would be valuable.

      We agree with the reviewer that these are important directions for future studies. We will include some discussion of a possible model that integrates these factors.

      (17) A key unresolved question is whether XPO1 is acting directly on TDP-43, or instead primarily regulates anisosomes by exporting other factors that more proximally control TDP-43 phase behavior. Given that TDP-43 is not a canonical XPO1 cargo and prior work indicates that its nuclear export is largely passive, it seems at least as plausible that XPO1 inhibition alters the nuclear concentration or localization of splicing factors, RNA-binding proteins, chaperones, or other modifiers identified in the screens, and that changes in these proteins secondarily reshape anisosome dynamics. In other words, XPO1 may be exporting a more direct regulator of anisosome formation and hardening, rather than exporting TDP-43 itself in a specific, regulated way. The current data do not distinguish between these possibilities. Systematic identification of XPO1-dependent cargos that colocalize with or biochemically associate with anisosomes, combined with targeted perturbation of their nuclear export, would be needed to determine whether the relevant XPO1 substrate in this system is actually TDP-43 or an upstream modulator of its phase behavior.

      The reviewer raises an important point. We did include some discussion along these lines in our paper, and we can add more to further clarify this issue. Again, as mentioned in the original draft, we did not conclude that there is an interaction between TDP-43 and XPO1.

      (18) Testing whether identified modifiers converge on nuclear TDP-43 concentration would be informative. Since phase separation is concentration-dependent, measuring nuclear versus cytoplasmic TDP-43 levels across key perturbations, including splicing inhibition, translation inhibition, proteasome inhibition, HSP90 inhibition, and XPO1 modulation, would help determine whether modifiers mainly work by changing nuclear TDP-43 concentration or by altering interaction networks and the material properties of condensates.

      We will measure the nuclear TDP-43 concentration in our imaging experiments and add the data to a revised version.

      (19) Examining other ALS-relevant RNA-binding proteins would be valuable. Given the role of XPO1 and other hits, it would be informative to briefly test whether similar principles apply to FUS, hnRNPA1, or other ALS-relevant RNA-binding proteins in the same cellular context, to argue for generality versus TDP-43-specific idiosyncrasies of the 2KQ system.

      We agree that this is an important issue but we feel the proposed experiments are beyond the scope of the study.

      (20) The Introduction sometimes implies that anisosomes are common and well-established intermediates en route to pathology. It would be helpful to more clearly state that, to date, anisosomes are primarily observed in overexpression and mutant systems and have not yet been unequivocally demonstrated in human patient tissue. The link between PDGFRβ, PAK4, GSK-3β, and YAP and TDP-43 phase dynamics is intriguing but only briefly mentioned. The authors should either expand on this or tone down the emphasis in the Results section.

      We will revise the introduction accordingly.

      (21) In the organoid methods, the authors should consider clarifying whether doxycycline is continuously used, which might alter TDP-43 expression and nuclear transport in a non-negligible way.

      The organoid model does not involve protein overexpression or doxycycline treatment. We measured endogenous p-TDP-43. We will revise the paper to avoid the confusion.

      (22) For statistical methods, it would be beneficial to indicate whether multiple-comparison corrections were applied for the many FRAP, anisosome count, and size comparisons beyond DESeq2 internal corrections for RNA-seq.

      We will add this information to the figure legends during revision.
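
      For reference, one common multiple-comparison procedure is the Benjamini-Hochberg false discovery rate adjustment. The response does not state which correction will be reported, so the choice of procedure, the function name `benjamini_hochberg`, and the example p-values below are assumptions for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four pairwise comparisons (not data from the study).
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

      Stating the procedure and the number of comparisons in each figure legend would let readers judge which FRAP, count, and size differences survive correction.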

      (23) Some figure legends could more clearly indicate whether the images shown are single z-planes or maximum intensity projections and how the thresholding for anisosome detection was performed.

      We will revise the figure legends to include this information. As for anisosome detection, because they are so obvious, standard thresholding was sufficient to identify them.

      (24) In its current form, the manuscript contains an impressive set of screens and some nicely executed imaging of TDP-43 condensates, highlighting nuclear export among other pathways as a modulator of TDP-43 phase behavior. However, the physiological relevance is undercut by heavy reliance on an acetylation-mimetic, RNA-binding-defective TDP-43 mutant and a homozygous K181E organoid model. The mechanistic link between XPO1 and TDP-43 remains largely inferential and partly at odds with prior work. The conclusion that cytoplasmic TDP-43 aggregation is only a modest contributor to disease is not firmly supported by the available data.

      We agree with the reviewer that the strength of the study is our unbiased approach, which identifies pathways capable of modulating TDP-43 phase separation behavior. We will revise our paper to carefully discuss the potential physiological relevance of our study and tone down some mechanistic conclusions, as suggested by the reviewer.

      (25) With substantial additional mechanistic work, particularly around XPO1, rigorous validation in more physiological TDP-43 contexts, more sensitive detection of cytoplasmic TDP-43 aggregates, and a tempering of the central claims, this study could make a meaningful contribution to understanding how nucleocytoplasmic transport and other cellular pathways influence TDP-43 phase transitions and aggregation. The work should be reframed as an important screening study that identifies nuclear export as one among several cellular processes that modulate TDP-43 phase behavior in a model system, rather than as a definitive demonstration that nuclear export governs pathological TDP-43 aggregation in disease.

      We will reframe the study as an important screening study that identifies nuclear export among several other pathways as modulators of TDP-43 phase behavior.

      Reviewer #2 (Public review):

      Summary:

      This manuscript addresses an important and timely question in TDP-43 biology by systematically identifying regulators of TDP-43 anisosome formation, with a particular focus on nuclear export via XPO1. Using a combination of unbiased chemical screening, genetic perturbation, and advanced imaging approaches, the authors propose that inhibition of nuclear export modulates the abundance and biophysical properties of TDP-43 anisosomes. The study is conceptually innovative and has potential relevance for neurodegenerative diseases characterized by TDP-43 pathology. However, significant concerns regarding experimental controls, reporting transparency, and model translatability currently limit the strength of the conclusions and the interpretability of several key findings.

      We thank the reviewer for acknowledging the significance and innovation of our study.

      Strengths:

      (1) The study employs an unbiased, hypothesis-free compound screen to identify regulators of TDP-43 anisosome formation, which is a major strength and reduces confirmation bias.

      (2) The authors combine chemical and genetic screening approaches, providing orthogonal validation of key pathways and increasing confidence in the biological relevance of top hits.

      (3) The focus on biophysical properties of TDP-43 assemblies, assessed through imaging and FRAP, moves beyond simple presence/absence of aggregates and provides mechanistic insight into the biophysical states of TDP-43.

      (4) The use of multiple experimental modalities, including live-cell imaging, FRAP, pharmacological perturbation, and transcriptomic analysis, reflects a technically sophisticated and ambitious study design.

      (5) The authors attempt to extend findings beyond immortalized cancer cell lines by incorporating organoid models, demonstrating awareness of disease relevance and translational importance.

      Overall, the manuscript is clearly written and logically structured, making complex experimental workflows accessible and the central hypotheses easy to follow.

      Weaknesses:

      Despite its strengths, the manuscript has several major limitations that affect data interpretation and confidence in the conclusions.

      (1) Lack of appropriate controls for overexpression experiments:

      A central concern is the absence of proper controls for TDP-43 and XPO1 overexpression. Prior studies (including those cited by the authors, Archbold et al. 2018) show that overexpression of WT TDP-43 alone is toxic to neurons. Thus, the experimental system itself may induce anisosome formation independently of the mechanisms under study. Similarly, XPO1 overexpression lacks a suitable control (e.g., mCherry alone or mCherry fused to a protein known to be independent of TDP-43). The near-complete colocalization of XPO1 with TDP-43 anisosomes upon overexpression raises the possibility that these structures reflect non-physiological protein accumulation rather than regulated assemblies.

      As mentioned in our response to reviewer 1, point 1, we will add more discussion regarding the use of acetylation mimetics in our study. We agree with the reviewer that these large puncta (both anisosomes and gel-like structures) likely resulted from TDP-43 overexpression. Nevertheless, a titration experiment by Yu et al. 2020 (PMID: 33335017) showed that ectopic TDP-43 undergoes demixing even at concentrations lower than that of endogenous TDP-43, although the demixed puncta were very small. Their result suggests that overexpression per se does not change TDP-43 phase behavior but only enlarges the demixed TDP-43 structures. This enlargement is necessary for our screen and imaging-based characterization. We will revise the text to clarify this point.

      For XPO1, we did include an mCherry-alone control in the study but omitted it from Figure 5 because of space limits. We can add these data to a Supplementary Figure during revision.

      (2) Insufficient experimental and analytical transparency:

      The manuscript frequently lacks clear reporting of experimental details. In multiple figures, the stated number of independent experiments does not match the number of data points shown, making it difficult to assess statistical validity. Concentrations used in the compound screen are not clearly defined, nor is it stated whether multiple concentrations were tested. It is unclear how many wells, cells, or independent cultures were analyzed. The criteria used to reduce 1,533 screening hits to 211 candidates via STRING analysis are not explained. Knockdown and overexpression efficiencies are not reported.

      We apologize for these omissions. We will add more experimental details to the figure legends and the Methods section. For the imaging experiments, data points reflect randomly selected individual cells imaged in 2-3 independent biological repeats. For the chemical screens, we first screened the NCATS libraries at the top concentration (10 mM) to ensure inhibitory efficacy for all compounds. In the follow-up study, we validated the top hits using a series of concentrations, as shown in Figure 1B.

      We will explain the STRING analysis in more detail. We did not check XPO1 knockdown efficiency in the high-throughput screens (HTS) for several reasons. First, the large number of positive hits makes it impossible to check knockdown efficiency for every hit. Second, the effect of XPO1 knockdown on anisosomes was seen with 6 different siRNAs in two rounds of screens. Third, the HTS protocol routinely included a transfection control (siRNAdeath) to indicate high transfection efficiency, and we processed the data only if the siRNAdeath control killed > 90% of the cells.

      (3) RNA-seq concerns:

      The RNA-seq experiments are particularly problematic. The number of biological replicates per condition is not stated, and heatmaps suggest that only one sample per group may have been used, which would preclude statistical analysis. No baseline comparison between WT and mutant TDP-43 is shown. Given that TDP-43 is an RNA-binding protein, splicing analyses would be far more informative than gene expression alone, yet no splicing data are presented. Moreover, nuclear retention of TDP-43 does not preclude nuclear aggregation, which may still impair its splicing function.

      We apologize for the lack of clarity regarding the RNA-seq design. For each condition, organoids of two independently differentiated batches were treated in triplicate. We pooled the organoids of the same treatment from the two batches to reduce the impact of batch variation.

      Given the criticisms from both reviewers 1 and 2 on the limitations of the RNA-seq study, we plan to remove these data from the revised manuscript.

      (4) Limited translatability to neuronal biology:

      All anisosome analyses are performed in a cancer cell line, raising concerns about relevance to post-mitotic neurons. While organoids are used as a secondary model, the assays performed do not overlap with those used in cancer cells, making it difficult to assess whether anisosome-related mechanisms are conserved. Neuronal toxicity, a critical outcome given known TDP-43 biology, is not assessed. Prior work has shown that WT TDP-43 overexpression alone is toxic to neurons, yet this is not addressed.

      We agree with the reviewer that the model used in this study is not directly relevant to neurodegeneration. However, as pointed out by the reviewer, neurons are much more sensitive to TDP-43-associated toxicity. By contrast, the cell line used in this study can tolerate TDP-43 overexpression with no detectable cytotoxicity. This feature makes it feasible to evaluate how different cellular processes modulate TDP-43 phase behavior without the confounding effect of toxicity. The fact that TDP-43 expression was induced for only a short period of time also helps minimize the impact of toxicity. Notably, the processes identified by our screens are all housekeeping pathways that are present in neurons. Thus, we believe that the reported findings are likely applicable to neurons, though we will revise our paper to make sure that we do not overstate the clinical relevance of our work.

      (5) Conceptual and interpretational gaps:

      The authors quantify anisosome number but also report conditions in which anisosome number decreases while size increases. The biological interpretation of larger anisosomes is not discussed, and whether this reflects improvement or worsening of pathology is unclear. Compounds targeting the same mechanism (e.g., nuclear export inhibition) are inconsistently used across experiments (KPT compounds, verdinexor, leptomycin B), raising concerns about reproducibility. In organoids, the experimental paradigm shifts to long-term treatment (35 days vs. 16 hours), further complicating interpretation.

      As pointed out by reviewer 1 in point 4 above, we do not have evidence to establish a convincing correlation between the size of anisosomes and clinical phenotypes. Regarding the use of different drugs for different experiments, the initial screen identified KPT and Verdinexor because Leptomycin B was not in our library. In the follow-up studies, we switched to Leptomycin B because 1) it is commercially available; 2) it is highly potent and specific; and 3) it is more commonly used as an XPO1 inhibitor according to the literature. However, for the organoid study, we had to switch back to KPT because of the toxicity associated with long-term application of Leptomycin B.

      (6) Overinterpretation of rescue effects:

      Although the authors state that they aim to test whether nuclear export inhibition rescues neuronal defects, no functional neuronal readouts are provided (e.g., viability, morphology, axon outgrowth, or electrophysiological measures). RNA-seq alone is insufficient to support claims of rescue.

      Our interpretation of the RNA-seq data was that the rescue effect of nuclear export inhibition was limited and likely insignificant. Given that these negative data are not conclusive, we will remove them from the revised manuscript.

      (7) Finally, the model does not appear to exhibit cytosolic TDP-43 aggregation at baseline. It remains unclear whether longer induction would produce cytosolic gel-like assemblies and whether these would be prevented by nuclear export inhibition. Long-term data are shown only in organoids, yet anisosome formation is not assessed there.

      The expression system used in the study reaches a steady state after 48 h of induction. At this point, we did not observe any gel-like structures. We can clarify this point during revision.

      Reviewer #3 (Public review):

      Summary:

      TDP-43 proteinopathy is broadly found in neurodegenerative diseases. This manuscript investigates how nuclear export influences the biophysical properties of TDP-43. The authors use a combination of chemical screening and genome-wide siRNA screening to identify pathways that modulate TDP-43 liquid-to-solid transitions. Overall, the study employs a broad array of approaches and addresses an important question in TDP-43 pathobiology. The identification of nuclear export as a central regulator is compelling and conceptually aligns with the emerging view that TDP-43 nucleocytoplasmic trafficking is a major defect in neurodegeneration.

      Strengths:

      This work integrates chemical and genetic screening to identify novel modifiers. The candidates were validated in both reporter cell lines and iPS-differentiated organoids. The findings support that nucleocytoplasmic transport is important for the biophysical properties of TDP-43.

      We thank the reviewer for acknowledging the significance and strength of our study.

      Weaknesses:

      The mechanisms underlying the connection between nuclear export and phase transition need further clarification. Broader consequences of XPO1 inhibition are not addressed.

      We agree that our study does not address how nuclear export inhibition affects TDP-43 phase behavior. As discussed in the paper, we propose that the effect of nuclear export inhibition on TDP-43 phase separation is likely indirect. The most likely scenario is that inhibition of nuclear export changes the nuclear environment over time, which in turn affects TDP-43 phase separation. We attempted to address this by isolating nuclear extracts from control and LMB-treated cells and using mass spectrometry to identify proteins that are differentially present in the nucleus. However, knockdown of the top candidates identified did not abolish the LMB-induced phase alteration. Considering our observation that RNA splicing is another modulator of TDP-43 phase behavior, it is possible that the combined change in RNA and protein composition in the nucleus alters TDP-43 phase behavior. However, defining the mechanism would require substantial work that is beyond the scope of the current study.

    1. Organizing Ideas

       When you write, it is helpful when your ideas are presented in an order that makes sense. The writing you complete in all your courses exposes how analytically and critically your mind works. In some courses, the only direct contact you may have with your instructor is through the assignments you write for the course. You can make a good impression by spending time ordering your ideas.

       Order refers to your choice of what to present first, second, third, and so on in your writing. The order you pick closely relates to your purpose for writing that particular assignment. For example, when telling a story, it may be important to first describe the backstory. Or you may need to first describe a 3-D movie projector or a television studio to help readers visualize the setting and scene. You may want to group your support effectively to convince readers that your point of view on an issue is well reasoned and worthy of belief.

       In longer pieces of writing, you may organize parts in different ways so that your purpose stands out clearly and all parts of the paper work to consistently develop your main point.

       Methods of Organizing Writing

       An outline is a written plan that serves as a skeleton for the paragraphs you write. Later, when you draft paragraphs in the next stage of the writing process, you will add support to create “flesh” and “muscle” for your assignment. The outline will utilize the ideas you developed during the prewriting process.

       When you write, your goal is not only to complete an assignment but also to write for a specific purpose: perhaps to inform, to explain, to persuade, or to achieve a combination of these purposes. Your purpose for writing should always be in the back of your mind, because it will help you decide which pieces of information belong together and how you will order them. Three common ways to structure a paper are chronological order, spatial order, and order of importance. Choose the order that will most effectively fit your purpose and support your main point.

       | Order | Purpose |
       | --- | --- |
       | Chronological | To tell a story or relate an experience; to explain the history of an event or a topic; to introduce the steps in a process |
       | Spatial | To help readers visualize something as you want them to see it; to create a main impression using the senses (sight, touch, taste, smell, and sound) |
       | Order of Importance | To persuade or convince; to rank items by their importance, benefit, or significance |

       Organizing Your Writing

       Descriptive writing is most effective when it is organized well. Use the following information to decide what organization best fits your goals.

       • Chronological order → best for describing events
       • Spatial order → best for describing places
       • Order of importance → best for describing objects and people

       Types of Outlines

       A formal outline is a detailed guide that shows how all your supporting ideas relate to each other. This outline helps you distinguish between ideas that are equally important and ones that are less important. You can build your paper based on the framework you created in the outline.

       There are two types of formal outlines: the topic outline and the sentence outline. Format both types of formal outlines similarly.

       • Place your introduction and thesis statement at the beginning, under roman numeral I.
       • Use roman numerals (II, III, IV, V, etc.) to identify main points that develop the thesis statement.
       • Use capital letters (A, B, C, D, etc.) to divide your main points into parts.
       • Use arabic numerals (1, 2, 3, 4, 5, etc.) if you need to subdivide any As, Bs, or Cs into smaller parts.
       • End with the final roman numeral expressing your idea for your conclusion.

       Here is what the skeleton of a traditional formal outline looks like. The indentation helps clarify how the ideas are related.

       Outlining a Paper

      ways of organizing my ideas.

    2. Place your introduction and thesis statement at the beginning, under roman numeral I. Use roman numerals (II, III, IV, V, etc.) to identify main points that develop the thesis statement. Use capital letters (A, B, C, D, etc.) to divide your main points into parts. Use arabic numerals (1, 2, 3, 4, 5, etc.) if you need to subdivide any As, Bs, or Cs into smaller parts. End with the final roman numeral expressing your idea for your conclusion. Here is what the skeleton of a traditional formal outline looks like. The indentation helps clarify how the ideas are related.

      • Roman Numerals = Identify Main Points
      • Capital Letters = Divide Main Points
      • Arabic Numerals = Subdivide Points Within Main Points

    1. Partnering with Children to Think About the Future: The Conf'kids Approach

      Executive Summary

      This document summarizes reflections from a discussion on the need to include younger generations in conversations about transition issues (ecology, economy, society).

      The Conf'kids project, launched by Déborah (a former sales director in the media industry), offers one-hour talk-and-discussion sessions for children, mainly aged 8 to 12.

      The aim is to turn eco-anxiety into a capacity to act by addressing the world's complexity without simplifying it, while adapting vocabulary and references.

      The central observation is that children are naturally "calibrated for complexity" and able to grasp the interconnections between the pillars of sustainable development.

      By equipping them to verify information and by valuing their local actions, the project aims to build a critical mass capable of positively influencing the future.

      --------------------------------------------------------------------------------

      1. Origins and Mission of the Conf'kids Project

      Origin of the concept

      The project grew out of a paradox: discussions about ecological and societal transitions are led mostly by decision-makers over 60, even though these issues primarily concern the generations that will live longest into the future.

      Core objectives

      Moving beyond passivity: Faced with a complex situation, an individual can either "hunker down" or act. Conf'kids chooses the dynamic of action.

      Building with confidence: Presenting transitions in a positive, constructive way rather than an anxiety-inducing one.

      Generational inclusion: Giving children a voice not as "future citizens" but as present-day contributors able to share their practices and ways of thinking.

      --------------------------------------------------------------------------------

      2. Methodology and Teaching Approach

      The Conf'kids format is a structured one-hour session that can be run in person or online.

      Adapting without simplifying

      Unlike traditional approaches, Conf'kids does not simplify the complexity of its subjects. The approach relies on:

      Vocabulary and tone: Mediation work to make the content accessible.

      Imagery and metaphors: Using frames of reference from childhood (e.g., explaining overconsumption through the desire for a new games console rather than through abstract macroeconomic concepts).

      Preparing the experts: Bringing in specialists who are not used to speaking to children and helping them translate their expertise.

      Interactivity and horizontality

      The process gives priority to the child's thinking:

      • Questions are allowed at any time so as not to interrupt the child's train of thought.

      • The speaker and the facilitator learn as much from the children as the children learn from them (the children's concrete proposals are valued).

      --------------------------------------------------------------------------------

      3. Themes and Concerns of Young People

      Feedback from the sessions shows that children spontaneously identify the four pillars of sustainable development.

      | Pillar | Recurring themes identified by children |
      | --- | --- |
      | Social | Inequality, precarity, the situation of migrants, inclusion of people with disabilities. |
      | Environment | Collapse of biodiversity (insects, oceans), plastic pollution, fossil vs. renewable energy. |
      | Economy | Companies' environmental responsibility (often represented by the factory pictogram). |
      | Partnership | Collective action, the role of the family, waste sorting, composting. |

      Key observations on how children perceive these issues

      Initial social sensitivity: Urban children are struck first by visible inequalities (homelessness, access to transport).

      Systemic understanding: They readily grasp chains of dependency (e.g., the link between pollination, food, and medicines).

      Pragmatism and common sense: Faced with complex subjects such as the Constitution, children often advocate applying existing rules directly ("just do what is written") rather than adding new layers of complexity.

      --------------------------------------------------------------------------------

      4. The Challenges of Information and Anxiety

      The paradox of hyper-connection

      Today's children were born into a world of hyperlinks (social networks).

      Although they are used to information being interconnected, they have specific vulnerabilities:

      Information overload: Access to the same anxiety-inducing media as adults.

      Verification gap: Difficulty identifying editorial angles and checking sources.

      Algorithms: They are exposed to content that sustains a climate of distrust or fear.

      De-escalation strategies

      To counter anxiety (particularly about serious subjects such as the war in Ukraine), Conf'kids proposes:

      1. Solutions journalism: Showing that there are adults and groups actively working on the problems.

      2. Focusing on the circle of action: Inviting children to act where they have control (in their class, their family) to avoid a sense of powerlessness.

      --------------------------------------------------------------------------------

      5. The Child as an Agent of Change

      The document stresses that social change often follows a "critical mass" model:

      The role of the positive deviant: One person who starts acting differently can draw in an active minority, which eventually becomes a majority.

      The domino effect: A simple action at age 8 (asking parents to sort waste or to compost) can change the family dynamic and, by extension, the social one.

      Co-designing solutions: Solutions for the future cannot be designed without incorporating the tools and platforms that the younger generation considers valid.

      --------------------------------------------------------------------------------

      6. Integration into the School Setting

      The project is also aimed at teachers, who are often left short of time and resources.

      Teaching opportunities

      Cross-disciplinary work: Using transition themes as material for core subjects (e.g., mathematics based on energy data, verb conjugation with ecology-related verbs).

      Moral and Civic Education (EMC): Working on democracy and collective projects (e.g., the "Enfants 2022" project, which enabled a mock vote for the presidential election).

      Project-based learning: Developing tree-structured thinking in which a talk topic becomes the starting point for many strands of learning.

      Resources and partnerships

      Conf'kids works with a range of partners (edtech companies such as Ode, media outlets such as Mon Quotidien) to reach a wide variety of children, from privileged backgrounds to young people in the youth justice system (PJJ), ensuring the plurality of perspectives that thinking about the future requires.

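    1. # Have you used ZSET?

      To build a "daily" leaderboard that returns, in real time, the most-viewed items of the past day, the core problem is resolving the tension between the data's timeliness and its persistence.

      The point you raised, that data is stored permanently once written, is indeed something to account for when using a Redis ZSET. For a sliding "past day" window, two classic design patterns are in common use, and you can choose between them based on how much freshness and precision the business requires.

      Option 1: Time-sliced keys + on-demand aggregation (sliding window)

      The idea is to split the data into fixed time slices (for example, hours), store each slice separately, and merge the last 24 hours of slices at query time. This tracks the moving "24 hours back from now" window closely, which suits scenarios that need high freshness and precision.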

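      • Storage layout: Create a separate ZSET key for each hour, for example views:article:20250401:14. Including the date and hour in the key keeps it readable and easy to manage.
      • Write path: When a view occurs, use ZINCRBY to add 1 to the article's score in the current hour's key. You can also set an expiry on that key (for example, 48 or 72 hours) so that Redis removes old data automatically and memory does not grow without bound.
      • Read path (real-time last-24-hours ranking):
        1. Determine the time range: Work out which hourly slices fall within the 24 hours before the current time (for example, at most 24 keys).
        2. Merge the data: Use ZUNIONSTORE to aggregate (SUM) the scores of those hourly keys into a temporary ZSET.
        3. Read the ranking: Use ZREVRANGE on the temporary ZSET to get the IDs of the top N most-viewed articles.
        4. Clean up the temporary key: Delete the temporary ZSET once the query completes.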
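
      The write and read paths above can be sketched as follows. This is a minimal sketch, assuming a redis-py style client object is passed in as `client` (only its `zincrby`, `expire`, `zunionstore`, `zrevrange`, and `delete` methods are used). The function names, the `views:article:tmp:` temporary-key prefix, and the 48-hour TTL constant are illustrative assumptions, not part of the original answer.

```python
import uuid
from datetime import datetime, timedelta

KEY_PREFIX = "views:article"      # key layout from the text: views:article:YYYYMMDD:HH
SLICE_TTL_SECONDS = 48 * 3600     # assumption: 48 h, one of the TTLs suggested in the text


def hour_key(ts: datetime) -> str:
    """Return the per-hour ZSET key for the hour that contains `ts`."""
    return f"{KEY_PREFIX}:{ts.strftime('%Y%m%d:%H')}"


def last_24_hour_keys(now: datetime) -> list[str]:
    """Keys of the 24 hourly slices ending with the current hour (newest first)."""
    return [hour_key(now - timedelta(hours=h)) for h in range(24)]


def record_view(client, article_id: str, now: datetime) -> None:
    """Add one view to the current hour's slice and refresh the slice's TTL."""
    key = hour_key(now)
    client.zincrby(key, 1, article_id)
    client.expire(key, SLICE_TTL_SECONDS)


def top_last_24h(client, n: int, now: datetime) -> list:
    """Merge the hourly slices into a temporary ZSET, read the top n, clean up."""
    tmp_key = f"{KEY_PREFIX}:tmp:{uuid.uuid4().hex}"  # hypothetical temp-key naming
    try:
        # Missing hourly keys are treated as empty sets by ZUNIONSTORE.
        client.zunionstore(tmp_key, last_24_hour_keys(now), aggregate="SUM")
        # Returns (member, score) pairs, highest score first.
        return client.zrevrange(tmp_key, 0, n - 1, withscores=True)
    finally:
        client.delete(tmp_key)
```

      Because whole hourly slices are merged, the window in this sketch is only accurate to the hour. With a real redis-py client, members come back as bytes unless the client was created with `decode_responses=True`.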

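      The advantage of this approach is precise data. However, because several keys have to be aggregated on demand, running ZUNIONSTORE frequently under very high concurrency can put some load on Redis. One optimization is to cache the ranking result for a few seconds to a few tens of seconds.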
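
      The short-lived cache mentioned here could look roughly like the sketch below. It is a single-process, in-memory illustration under stated assumptions: `TTLCache` and its parameters are hypothetical names, and `compute` stands for whatever zero-argument function produces the ranking (for example, the aggregation query).

```python
import time
from typing import Any, Callable


class TTLCache:
    """Hold one computed result for `ttl_seconds`, then recompute on the next read."""

    def __init__(self, compute: Callable[[], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute        # zero-argument function that builds the ranking
        self._ttl = ttl_seconds
        self._clock = clock            # injectable clock, which makes testing easy
        self._has_value = False
        self._value: Any = None
        self._expires_at = 0.0

    def get(self) -> Any:
        """Return the cached result, recomputing it if it is missing or expired."""
        now = self._clock()
        if not self._has_value or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
            self._has_value = True
        return self._value
```

      With a TTL of, say, 10 seconds, at most one aggregation runs per 10-second window per process, at the cost of readers seeing a ranking that may be up to 10 seconds old. The sketch is not thread-safe; a lock around `get` would be needed if several threads share it.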

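      Option 2: Single key + key expiry (daily rotation)

      This option is simpler and more direct: it partitions the data by day. The logic is intuitive: "today's ranking" lives in its own key.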

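      • Storage layout: Use one key, for example ranking:article:20250401, to represent the ranking for a given day.
      • Write path: When a view occurs, call ZINCRBY ranking:article:20250401 {articleId} 1 to increment that day's score.
      • Read path: To get the "past day" ranking, read the top N directly from today's key, since yesterday's data no longer counts as "the past day".
      • Data lifecycle management:
        • Set a TTL: When the first entry is written to the day's key, give the key an expiry, for example 72 hours. That way, yesterday's data is cleaned up automatically once tomorrow has passed.
        • Scheduled job: You can also run a scheduled job (for example, every day in the early morning) that persists yesterday's ranking to a database such as MySQL as a historical archive, then deletes or renames the old key in Redis to make way for the new day's key.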
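
      The daily-key variant can be sketched in the same spirit. This is a minimal sketch, assuming a redis-py style client passed in as `client` (it uses `zincrby`, `ttl`, `expire`, and `zrevrange`); the function names and the 72-hour constant are illustrative assumptions. The archive job is left out.

```python
from datetime import datetime

DAY_TTL_SECONDS = 72 * 3600   # assumption: 72 h, the example TTL given in the text


def day_key(ts: datetime) -> str:
    """Return the per-day ZSET key, e.g. ranking:article:20250401."""
    return f"ranking:article:{ts.strftime('%Y%m%d')}"


def record_daily_view(client, article_id: str, now: datetime) -> None:
    """Add one view to today's ranking and set the TTL if the key has none yet."""
    key = day_key(now)
    client.zincrby(key, 1, article_id)
    # TTL returns -1 for a key that exists but has no expiry, which is the
    # state right after the first write of the day.
    if client.ttl(key) == -1:
        client.expire(key, DAY_TTL_SECONDS)


def top_today(client, n: int, now: datetime) -> list:
    """Read the top n (member, score) pairs straight from today's key."""
    return client.zrevrange(day_key(now), 0, n - 1, withscores=True)
```

      Two writers may both see `-1` and both call `expire`; that race is harmless here because they set the same TTL. No aggregation is needed on the read path, which is why this option is fast, and also why the ranking starts empty again at midnight.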

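      The main advantages of this option are that it is simple to implement and very fast to query (no aggregation). But it cannot provide a true real-time sliding window, because each day's ranking resets at midnight. If by "past day" you strictly mean "the past 24 hours" (for example, at 10 a.m. on Monday you see data from 10 a.m. on Sunday until now), this option is not precise enough.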

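      Comparison and Summary

      To make the choice clearer, the key differences between the two options are summarized below:

      | Dimension | Option 1: Time-sliced keys + on-demand aggregation | Option 2: Single key + key expiry (daily rotation) |
      | :--- | :--- | :--- |
      | Time-window precision | A true sliding "past 24 hours" window, accurate to the slice size (one hour with hourly keys). | A fixed calendar-day window that resets at midnight every day. |
      | Query performance | Moderate. Several keys must be aggregated on demand, which can add overhead. | Very high. A single key is read directly. |
      | Implementation complexity | Higher. Multiple keys must be maintained and aggregation logic written. | Very simple. The logic is clear and easy to follow. |
      | Memory efficiency | Each hourly key can carry a short TTL, so memory is released promptly. | Average. A single key holds a lot of data, and it is released only on expiry. |
      | Typical use cases | Scenarios with very demanding freshness requirements, such as real-time trending-news or real-time best-seller rankings. | Scenarios whose business meaning aligns with the calendar day, such as daily best-seller or daily user-points rankings. |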

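      Recommendations

      • If your business requires strict real-time behavior and defines "past day" as exactly "the past 24 hours" (as with Weibo trending topics), Option 1 is the better fit.
      • If "daily ranking" in your business means a calendar-day ranking (such as a daily sales leaderboard) and you want the simplest implementation, Option 2 meets the need well.

      In large production systems the two patterns are often combined. For example, you can store raw data at hourly granularity (Option 1) while a scheduled job pre-aggregates "today's ranking" (the Option 2 result) to speed up queries, balancing precision and performance.

      I hope this analysis helps you design a suitable leaderboard system. If you are interested in the implementation details of either option, or in how to further optimize the aggregation query, we can go deeper.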

    1. Summary of the Priorities and Challenges of the High Commissioner for Children

      • The High Commissioner for Children presents her roadmap, emphasizing the fundamental mission of her High Commission: coordinating public policy so that the child is at the center of deliberations, breaking out of the usual administrative "silos".

      The High Commissioner stresses the importance of interministerial coordination and multidisciplinary professional work as levers for meeting the complex challenges of childhood.

      1. Mission and Definition of Childhood

      • Role of the High Commission: Created in February, the High Commission for Children aims to remedy "weaknesses in our protection and prevention policies" by strengthening interministerial coordination.

      The aim is to "think about these public policies differently and therefore to coordinate, acting in a sense as the guarantor that the child is at the heart, and so at the center, of these deliberations".

      It brings together all stakeholders (nonprofit organizations, government departments, educators).

      • Definition of the child: The definition used is that of Article 1 of the 1989 Convention on the Rights of the Child, covering "from birth to, ultimately, the age of majority", including early childhood (0-3 years) and adolescence.

      2. Thematic Priorities and Actions Under Way

      The High Commissioner addresses several priority areas, which are often interdependent:

      Early Childhood Public Service (0-3 years):

      • Issues: A shortage of places and a need to improve quality and skills.

      Professionals are strongly demanding recognition.

      • Actions: Work on the attractiveness of the professions ("VAE inversée", support for training), aid to municipalities for building up skills, collaboration with the federations on the next COG (Convention d'Objectifs et de Gestion, the objectives and management agreement), and trials of "hybrid solutions" such as family or school-based crèches.

      • Screens in early childhood: Publication of a "quality framework" banning screens in childcare settings for 0-3-year-olds, setting clear rules and supporting the PMI (maternal and child health services).

      Parenting Support:

      • Regarded as "one of the (...) most powerful tools in terms of prevention and support".

      • National Parenting Support Plan: Currently being finalized, it aims to "re-establish (...) reference points, support, and spaces for dialogue with parents" in the face of new challenges, particularly digital ones.

      Overhaul of the "Je protège mon enfant" website.

      Adoption and Access to Origins:

      • Adoption: Improve professional practices to speed up procedures and the rollout of the "national register of adoptive families".

      Disseminate existing legislative tools (the Loi Limon on simple adoption).

      • Long-term volunteer foster care / trusted third party: Study of the various options, stressing the importance of specialized support for adoptive families that are considering long-term care, and the need to "strengthen every possible form" of care.

      Search for "families of the heart" to provide stability and encourage deinstitutionalization.

      • Access and right to origins: Reflection on the place of DNA tests (currently not permitted in France without a court order), prompted by requests from children's associations.

      The question is being reopened in light of new realities and legislative developments in Europe.

      Combating Violence Against Children:

      • Urgency: A reported rise in alerts about violence, particularly against infants.

      • Actions: A national questionnaire will soon be circulated to "objectively document this trend and specify the types of violence" (sexual, intrafamilial, institutional, physical, psychological).

      Continued work on prevention, detection of "weak signals", and the ability to "better receive what children say".

      • Care: Rollout of "specialized reception units for children" (UEJ) in all court jurisdictions, coordinating medical, psychological, social, and judicial care.

      Strengthening of the 119 helpline (information campaign, improved channels including chat, handling of callbacks, and prioritization).

      • Loi Votrein: A bill being prepared by Minister Votrein that proposes concrete measures: permission to combine foster care with other employment, a right to respite, a review of compensation arrangements (including for long-term volunteer care), and recognition of the trusted third party.

      Digital Issues and Screens:

      • Findings: Growing presence of screens (70% of 8-10-year-olds are on social networks), a surge in cyberbullying, and documented consequences (addiction, mental health, obesity, learning disorders).

      • Responses: Digital education: Essential, with work under way to align the messages delivered by the Éducation Nationale (the national education system) and by community education (éducation populaire).

      • School: The Inspectorate of National Education has been asked for a report on ENTs (Environnements Numériques de Travail, digital workspaces), aiming for a "right to disconnect for parents and for children" (no updated information between 8 p.m. and 7 a.m., closure at weekends).

      Generalization of the "digital break" (no mobile phones in lower secondary school).

      • Social networks: Implementation of the loi Marcangelie banning social networks for under-15s.

      European negotiations (Digital Services Act), with a "new coalition" to move in this direction.

      • Identity and age verification: The technical tool has been stabilized thanks to ARCOM and organizations such as Docapost, enabling reliable age checks.

      This has led some pornographic platforms that did not want to use these tools to withdraw. There is also a "European mini wallet" project.

      • The child's place in public space: The High Commission's commitment to respect for children's rights and to their place in public debate.

      Follow-up of the work of the Citizens' Convention on children's time, with children taking part.

      3. Challenges and Criticisms Addressed to the High Commissioner

      Several members of parliament voice major concerns, questioning the High Commission's action and resources:

      • Childhood crisis in France: Caroline Parmentier denounces the "catastrophic state" of childhood, the absence of a dedicated minister and the growing number of tragedies (day-care centres, child poverty, failures of the ASE, the child welfare service).

      She questions whether the High Commissioner, who is involved in the municipal election campaign, is fully invested.

      • Lack of resolve and resources: Arnaud Bonet considers the High Commissioner's proposals to go "in the right direction" but remains "sceptical" for lack of a "mobilization of our whole society" and of "real resources".

      He speaks of a "collective wilful blindness" to violence against children.

      • Specific issues: Cyberbullying linked to first names: Ms Dubré warns about derogatory online posts and asks for measures.

      • Unaccompanied minors (MNA, Mineurs Non Accompagnés): Ms Dubré points to the lack of reliable data, the absence of a presumption of minority and the complexity of their care.

      The High Commissioner reiterates France's position of taking in children "whatever their situation".

      • Mental health of young people in care: Ms Dubré reports insufficient psychological follow-up (40% have never received any) and proposes better training for professionals, genuine access to care and stable relationships.

      The High Commissioner mentions the roll-out of "Santé Protégée Péas" and the role of coordinators.

      • Failures of child protection (ASE): Ms Hamdane denounces a "childhood policy that is symbolic and media-driven but disconnected from the emergency", recalling that 400,000 children are at risk and that France honours neither its international commitments nor its own laws (the Taquet law remains unapplied).

      She cites the damning report of the commission of inquiry on the "lack of national steering, disrupted care pathways, abusive use of hotel placements".

      • Ms Maximie expresses her "anger" at the death of a child in care (Aiden, aged 7) and at the public silence of the High Commissioner and Minister Vautrin.

      She denounces inaction despite repeated findings.

      The High Commissioner replies that she works with the stakeholders concerned (ADF, departments, associations) but does not systematically communicate publicly.

      She stresses that everyone bears responsibility.

      • Schooling of vulnerable children: Ms Piron warns of "abnormally long school enrolment and placement delays" for children in emergency accommodation or precarious housing, and of the "dramatic" situation in Mayotte (5,000 children deprived of school).

      She asks what levers exist to guarantee effective schooling and whether there is a will to intervene in Mayotte.

      The High Commissioner takes note of the warnings and mentions work with the Minister for Overseas Territories to strengthen teams and the priority given to these territories.

      • Resources of the High Commission: Ms Met asks about the High Commission's "human and financial resources".

      The High Commissioner states that she has six direct advisers, support from the Éducation Nationale, and forthcoming reinforcement by representatives of the Overseas Territories and Justice ministries, to foster interministerial work.

      4. The High Commissioner's Response to the Criticisms

      The High Commissioner rejects the accusations of insufficient commitment or action, stating that she is fully engaged in her coordination mission.

      She emphasizes:

      • Coordination of stakeholders: Her mission is to "bring all these people together and remind each of them of their responsibilities and missions".

      • Evaluation and follow-up: Triggering the necessary inspections, evaluating policies, and monitoring the implementation of commitments (including the Taquet law).

      • Collective mobilization: The need for a "general awakening" and for "everyone to fully take their place and their share".

      • Ministerial support: She highlights the support of Minister Vautrin and the organization of an "interministerial committee on childhood" to evaluate public policies with monitoring indicators.

      • Complementarity of actions: She defends the complementarity between combating the "no kids" trend or regulating screens and protecting the most vulnerable children. She specifies that her action is not limited to public communication.

      • Obstacles to supporting young adults: Problems identified include access to identity documents, access to housing, lack of preparation for leaving care, and unequal treatment between departments.

      She mentions work under way to improve this support (support for associations, sponsorship).

      In conclusion, the High Commissioner for Childhood positions herself as a figure of interministerial coordination, seeking to break down silos between public policies in favour of a child-centred approach.

      She highlights concrete projects on early childhood, parenting, adoption, combating violence and digital issues.

      Nevertheless, she faces fierce criticism from members of parliament who stress the urgency of a childhood crisis, the lack of concrete resources and persistent institutional failures, particularly in child protection, which call the effectiveness of her action into question.

    1. Play as a Laboratory of Thought: Summary of Pauline Stavaux's Talk

      This summary document analyses the contributions of Pauline Stavaux, a philosopher at Pôle Philo (Belgium), on how the practice of play connects with philosophical inquiry.

      It explores the theoretical foundations, pedagogical stakes and practical applications of game-based pedagogy (ludo-pédagogie) in developing critical thinking.

      --------------------------------------------------------------------------------

      Executive Summary

      Integrating play into philosophical practice is not mere entertainment; it is a genuine mediation strategy for making abstraction accessible.

      The analysis stresses that play offers a safe "space-time" in which error is de-dramatized, allowing participants, particularly adolescents and audiences unfamiliar with the discipline, to take hold of complex concepts through lived experience.

      The core of this approach lies in the tension between the playful drive and the pedagogical objective. For philosophy to do its work, the game must not be a mere "coating" (the chocolate-covered broccoli myth) but an engine of reflection.

      The debriefing phase is identified as the most critical step: it is where lived experience is turned into structured knowledge and thinking skills.

      --------------------------------------------------------------------------------

      1. Institutional Framework and Objectives

      The approach presented is part of the work of Pôle Philo, a service of the non-profit association (ASBL) Laïcité Brabant Wallon (Wavre, Belgium).

      Mission: Make philosophy accessible to as many people as possible.

      Audiences: Children (through the magazine Philéas & Autobule, ages 8-13), adolescents and adults (notably in prisons).

      Purpose: Develop critical thinking and thinking skills (problematization, conceptualization, argumentation).

      --------------------------------------------------------------------------------

      2. Ontology of Play: Definitions and Philosophical Conceptions

      Play is a complex conceptual object, often hard to define. Historical and philosophical analysis reveals a change in its status: from a trivial activity to a paradigmatic dimension of humanity.

      Evolution of the status of play

      The reversal of perspective in the 18th century (notably with Schiller) posits that "man is only fully human when he plays". Three factors drive this change:

      1. Social: Play becomes a social fact to be regulated (gambling, addiction).

      2. Epistemological: Play is recognized as a place where intelligence unfolds (mathematics, strategy).

      3. Anthropological: The child is seen as a full individual for whom play is the natural vehicle of learning (Rousseau, Fröbel).

      Typology of theoretical definitions

      | Author | Key Concepts | Definition of Play |
      | --- | --- | --- |
      | Huizinga | Homo Ludens | A voluntary activity, limited in space and time, with rules that are freely accepted but absolutely binding. |
      | Caillois | Criteria of play | A free, separate, uncertain, unproductive, rule-governed and fictive (make-believe) activity. |
      | Wittgenstein | Language games | No single definition is possible; there are "family resemblances" and contextual uses. |
      | Bernard Suits | Lusory attitude | A voluntary effort to overcome unnecessary obstacles (superficial constraints) for pleasure. |
      | Gilles Brougère | Ludic continuum | Play is defined by simulation, decision, indeterminacy and the minimization of consequences. |
      | Silva | Four levels | Play material, structure (rules), context (psychological state) and playful attitude. |

      --------------------------------------------------------------------------------

      3. Game-Based Pedagogy: Stakes and Tensions

      Using play for pedagogical purposes creates an intrinsic tension between the freedom that play requires and the obligation tied to the educational setting.

      Myths and Precautions

      Chocolate-covered broccoli: The risk of a game-based pedagogy of "trickery", where the game only masks a tedious task, with no coherence between content and form.

      Superficial gamification: Adding points or medals is not enough to create a "playful attitude". The adult's intention does not guarantee the learner's buy-in.

      The facilitator's role: Close to that of a "game master", the facilitator must guide without steering and support the effort of thinking without taking away the players' freedom.

      What play brings to philosophy

      Levelling: All participants live the same experience in real time, regardless of their cultural capital.

      Laboratory of thought: Play allows participants to test ideas, change their minds and use error as building material.

      De-dramatization: Play breaks philosophy's elitist, jargon-laden image.

      Bodily anchoring: Play brings the body and movement back into a discipline often perceived as purely cerebral.

      --------------------------------------------------------------------------------

      4. Tools and Practical Setups

      The document lists several types of tools used by Pôle Philo to prompt reflection.

      Dedicated games (original creations)

      L’enquête philo: A group of "experts" (consequences, examples, images) must get "investigators" to guess a concept through clues based on thinking skills.

      Memory Philo: Work on memory and the past (finding matching pairs triggers dilemmas).

      Les épis de blé: A Game of the Goose variant in which participants play mice with unequal abilities, to experience the concepts of inequality and equity.

      Le Chamboule-tout: Participants turn statements written on tin cans into questions in order to knock the cans down.

      Repurposing commercial games

      Some games not labelled "philosophy" have mechanics or themes that strongly invite problematization:

      | Game | Philosophical Theme | Game Mechanic |
      | --- | --- | --- |
      | Dixit | Interpretation / Concept | Dreamlike images are used to embody abstract ideas. |
      | Mito | Morality / Transgression | Cheating is mandatory and regulated by the rules of the game. |
      | Hanabi | Trust / Otherness | Players cannot see their own hand and depend entirely on how others see it. |
      | Happy City | Happiness / Utilitarianism | Players build a city by maximizing pleasure indicators. |
      | Imagine | Conceptualization | Transparent pictogram cards are overlaid to create visual concepts. |

      --------------------------------------------------------------------------------

      5. Summary of Practical Recommendations

      To run a successful philosophy-through-play session, the facilitator should follow several methodological principles:

      1. Priority to debriefing: Participants learn less from playing than from analysing what they experienced. Discussion time must be proportionate to the play phase.

      2. Rigorous selection: Choose games with simple rules and short rounds to leave more time for reflection.

      3. Handling the unexpected (Kairos): Seize game incidents (conflicts, cheating, frustration) as opportunities for immediate philosophical questioning.

      4. Freedom to join: Respect the right not to play. Observing can be a form of playful involvement and a support for the group's reflection.

      5. Content/form coherence: The game mechanic (e.g. chance, cooperation, competition) must echo the concept addressed (e.g. fate, solidarity, justice).

    1. Summary Document: The ADER Workshops (Ateliers Démocratiques pour une Expression Réflexive)

      This document presents a detailed analysis of the ADER programme, designed by Dominique Mulner.

      It summarizes the theoretical foundations, pedagogical objectives, implementation protocol and support arrangements of this programme, which aims to develop critical thinking and improve school climate.

      --------------------------------------------------------------------------------

      Executive Summary

      The Ateliers Démocratiques pour une Expression Réflexive (ADER, Democratic Workshops for Reflective Expression) are a structured pedagogical programme, the product of twenty years of development and experimentation, notably with pupils who are "hindered from thinking".

      Designed by Dominique Mulner, the programme addresses a wide audience, from the second year of nursery school (moyenne section) to the final year of secondary school (terminale), including priority education networks (REP+) and special education (SEGPA, ULIS).

      The essence of ADER lies in creating a safe space-time in which each pupil's word is recognized as that of a "valid interlocutor".

      Relying on a ritualized protocol and a distribution of concrete responsibilities, the programme pursues four main objectives: developing critical thinking, improving school climate, strengthening psychosocial skills and educating for citizenship.

      Unlike other approaches, ADER favours a systemic approach involving the whole school institution and rejects summative assessment in favour of close observation of each pupil's progress.

      --------------------------------------------------------------------------------

      1. Origins and Theoretical Foundations

      The ADER programme is inseparable from the career of its designer, Dominique Mulner, a former special-education teacher in hospital settings and member of mobile security teams (équipes mobiles de sécurité).

      A multidisciplinary heritage

      The programme draws together contributions from several currents of pedagogy and psychology:

      Jacques Lévine: For the concept of the child as a "valid interlocutor" and the importance of putting experience into inner thought.

      Michel Tozzi: For tools of argumentation, problematization and conceptualization.

      Célestin Freinet: For pupil-centred pedagogy and cooperation.

      Carl Rogers: For the principles of congruence, empathy and benevolence.

      Lev Vygotsky: For the role of social interaction in learning.

      Jonathan Cohen & Éric Debarbieux: For research on systemic school climate.

      Countering "hindrance" (l'empêchement)

      ADER was initially designed for pupils with behavioural disorders, who are often "hindered from being, speaking and thinking".

      The aim is to restore self-esteem through unconditional recognition of the individual's thinking within a group.

      --------------------------------------------------------------------------------

      2. The Four Pillars of the Programme

      The ADER programme is built around four interdependent axes that structure the intervention:

      | Pillar | Key Objectives |
      | --- | --- |
      | Critical Thinking | Distinguish believing from knowing; reason autonomously; counter dogmatism and one-track thinking (pensée unique). |
      | School Climate | Create a calm and secure environment; prevent bullying; foster group cohesion. |
      | Psychosocial Skills | Develop empathy, emotion management and the taking of responsibility. |
      | Citizenship | Practise democracy in action; respect shared rules; recognize others in their difference. |

      --------------------------------------------------------------------------------

      3. The Workshop's Operating Protocol

      ADER's strength rests on a solid, strict and ritualized framework that guarantees participants' psychological safety.

      Arranging space and time

      The Circle: Tables are removed. Chairs are arranged in a circle or oval to establish complete horizontality between pupils and adults.

      Order of Arrival: Pupils enter one by one and sit in their order of arrival, to break up pre-existing alliances and prevent scapegoating.

      Timing: Sessions generally last one hour (shorter in nursery school).

      They must not be one-off interventions ("one shot") but be held regularly (a cycle of 12 sessions).

      Phases of the session

      1. Ritualized welcome: Welcome outside the room, belongings set down, calm entry.

      2. Opening: Reading of a ritual text recalling the rules (no mockery, no judgement, no right or wrong answer).

      3. Distribution of name cards ("cavaliers"): Each participant (pupils and adults) has a card bearing their first name, placed on the floor.

      4. Assignment of responsibilities: Pupils volunteer for the session's functions.

      5. Reflective discussion: Debate around a universal question (e.g. "Can we live in a world without violence?").

      6. Final round of the circle: Each pupil sums up their thinking or says "I'm thinking".

      7. Reading of notes and closing: The note-takers read out the ideas recorded, followed by the exit ritual.

      --------------------------------------------------------------------------------

      4. Giving Pupils Responsibility

      ADER replaces the notion of "roles" (often perceived as play-acting) with that of responsibilities. These functions are essential to the group's self-regulation:

      Speaking-turn managers: Give the floor first to those who have not yet spoken.

      Note-takers: Build the workshop's collective memory by recording ideas (not names).

      Rule and time keepers: Ensure the communication rules are respected and manage the duration with a stopwatch or clock.

      Note on inclusion: The programme allows pupils with difficulties (dyslexic pupils, pupils from ITEPs) to get involved.

      The example of a dyspraxic pupil who took up handwriting again to fulfil his responsibility as note-taker illustrates the impact of this recognition.

      --------------------------------------------------------------------------------

      5. The Adult's Stance and Training

      The shift from the stance of the "knowing" teacher to that of the "accompanying" facilitator is central to the programme's success.

      A specific stance

      The adult does not judge, does not validate answers ("that's good", "that's right") and does not interrupt. The adult supports the effort of reflection, prompts through Socratic maieutics and upholds the framework.

      Long-term support

      Adult training is not limited to a theoretical presentation:

      6 hours of initial theory.

      6 hours of supported practice in class with a trainer.

      In-depth practice analysis throughout the year.

      Systemic approach: Involvement of the head teacher and the whole teaching team, so that the workshop becomes part of the school's development plan (projet d'établissement).

      --------------------------------------------------------------------------------

      6. Analysis and Evaluation of the Programme

      ADER rejects numerical grading but includes tools for qualitative evaluation and observation:

      Observation grids: Tracking the number of speaking turns and changes in the quality of argumentation.

      Practice analysis: Time set aside after each session for facilitators to assess the class climate and the development of thinking.

      Observed impacts: Less violence and bullying, better conflict management at break time, and re-engagement of pupils who are non-native speakers (allophones) or in serious academic difficulty.

      Conclusion on Reflective Expression

      As Dominique Mulner emphasizes, reflective expression means that one person's words engage another's reflection, thereby creating a "community of inquiry".

      This process makes it possible to move from personal belief to shared knowledge, turning the class group into a genuine thinking, democratic entity.

  2. Feb 2026
    1. Adolescents in Psychiatric Distress: Analysis of Hospital Care

      Executive Summary

      This document offers a detailed analysis of daily life and care protocols in a child and adolescent psychiatry unit, based on the observation of young patients at the CHU de Besançon (Besançon University Hospital).

      The unit admits adolescents aged 12 to 18 with a range of conditions, from severe depression and anorexia to complex psychotic disorders.

      The therapeutic strategy rests on a fragile balance between medication (often heavy), strict regulation of interactions with the outside world, and social re-education through supervised activities and in-unit schooling.

      The central aim is to stabilize minds that are still developing while avoiding premature definitive diagnoses. The success of care depends largely on family involvement, although many patients come from precarious social or medical backgrounds.

      --------------------------------------------------------------------------------

      1. Institutional Framework and Clinical Profiles

      The child and adolescent psychiatry unit of the CHU de Besançon is housed in historic 17th-century buildings.

      Despite physical constraints (bars on the windows), it serves as a refuge for young people for whom earlier care arrangements have failed.

      Characteristics of the patient population

      Age range: 12 to 18.

      Average length of stay: More than 4 months.

      Conditions observed:

      ◦ Behavioural disorders and episodes of serious violence (Maxence).

      ◦ Severe depression and school phobia (Anna).

      ◦ Eating disorders, notably anorexia nervosa (Laura).

      ◦ Mood disorders, paranoia and emotional instability (Valentin).

      ◦ Symptoms of psychosis and bipolar disorder (Sarah).

      The problem of diagnosis in adolescence

      The medical team is cautious about definitive diagnosis.

      Dr Noël Dark explains that the adolescent mind changes quickly, which justifies the refusal to "put children in boxes".

      Continuous clinical observation takes precedence over premature pathological labelling.

      --------------------------------------------------------------------------------

      2. The Therapeutic Arsenal: Medication and Regulation

      Treatment in the unit combines pharmacological and behavioural approaches to control the most disabling symptoms.

      Pharmacology and side effects

      Most patients receive medication, sometimes in injectable form (neuroleptics).

      | Type of Medication | Use Mentioned | Observations |
      | --- | --- | --- |
      | Antipsychotics | Treatment of schizophrenia or psychosis. | Used for Maxence and Sarah. Effective on crises but cause heavy side effects (tremors, fatigue, slurred speech). |
      | Anxiolytics | Management of acute anxiety attacks. | Used for Laura during episodes of apparent paralysis. |
      | Antidepressants / Mood stabilizers | Emotional stabilization. | Subject to special authorization for minors. |

      The regulatory framework as a care tool

      Isolation from the outside world is considered an essential therapeutic component. The unit's rules impose:

      Regulation of communications: Access to phones and the Internet is strictly limited to reduce anxiety linked to the outside world.

      Control of personal belongings: Tablets and phones may be confiscated if rules are broken or inappropriate use is suspected.

      Leave permissions: These require medical authorization and serve as a test of patients' stability.

      --------------------------------------------------------------------------------

      3. Therapeutic Activities and Return to Schooling

      To draw adolescents' attention away from their illness, the unit offers a structured programme aimed at restoring self-confidence and social ties.

      Physical activities (e.g. climbing): These sessions allow the young people to reclaim their bodies and build physical confidence.

      For patients such as Valentin, this is a crucial "therapeutic aid".

      In-unit schooling: Teachers from the Éducation nationale give 4 to 5 hours of lessons per week, in small groups (maximum 4 pupils).

      This arrangement aims to break the isolation created by school phobia and prepare a gradual return to mainstream schooling.

      --------------------------------------------------------------------------------

      4. Crisis Management and Emergency Protocols

      Faced with suicidal or self-harming behaviour or extreme violence, care staff must sometimes resort to coercive measures.

      Restraint: Used as a last resort to protect patients from themselves (suicide attempt or running away).

      It is a prescribed medical act that consists of strapping the patient to the bed.

      Dose increases: In an acute crisis (e.g. Sarah), neuroleptics are administered to bring about rapid chemical calming.

      --------------------------------------------------------------------------------

      5. The Family Sphere: Between Support and Determinism

      The family environment plays an ambivalent role in adolescents' illness and recovery.

      Stress factors: Parental separation and a family medical history are common.

      Valentin's mother, for example, is herself hospitalized in adult psychiatry, which limits visits and complicates his emotional balance.

      Necessary involvement: Doctors hold meetings every three weeks to involve parents.

      For Jessica, Anna's mother, hospitalization is seen as "a blessing in disguise", an acknowledgement that parents cannot manage such deep distress alone.

      Relational ambivalence: Relations between hospitalized adolescents and their families are often marked by coldness or aggression, direct symptoms of the illness or of an adolescent crisis exacerbated by the hospital setting.

      --------------------------------------------------------------------------------

      6. Prospects for Discharge

      Returning to a "normal" life is a slow, non-linear process. Anna's case illustrates this transition: after three months in hospital, a first hour of class in a mainstream collège (lower secondary school) is a major, if fragile, victory.

      Success depends on the young person's ability to bear once again the gaze of peers and the demands of the outside world without tipping into anxiety or violence.

    1. Human Intelligence: Scientific Analysis, Myths and Realities

      This summary document draws on the work and talks of Franck Ramus, research director at the CNRS, to clarify the notion of intelligence, its measurement and the empirical data accumulated over more than a century.

      Executive Summary

      Intelligence, although understood intuitively in everyday language, has a rigorous scientific definition centred on the g factor (general intelligence).

      Contrary to common belief, intelligence quotient (IQ) tests are not arbitrary measures but statistically robust tools that are stable over time and have strong predictive value for academic success, health and longevity.

      Current research shows that intelligence results from a complex interaction between genetic factors (about 50%) and environmental factors (education, socio-economic background, prenatal factors).

      Finally, the data refute the myth that people with high intellectual potential (HPI, haut potentiel intellectuel) are more vulnerable to psychological disorders or academic failure.

      --------------------------------------------------------------------------------

      1. Definitions and Measures of Intelligence

      1.1. Towards a scientific definition

      Intelligence is often defined intuitively as the capacity for reasoning, adaptation or general culture. Scientifically, it is operationalized in two main ways:

      General mental ability: According to Linda Gottfredson (1997), it involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas and learn quickly from experience.

      The g factor (general intelligence): Discovered by Charles Spearman in 1904, it rests on the statistical observation that an individual's performances in different school subjects or cognitive tests are positively correlated.

      The g factor represents the "common share" of variability between individuals across all cognitive domains.

      1.2. La mesure par les batteries de tests

      L'intelligence ne se mesure pas par un test unique, mais par une batterie (comme l'échelle de Wechsler : WPPSI, WISC, WAIS) qui explore diverses fonctions :

      Intelligence fluide : Raisonnement logique et abstrait.

      Intelligence cristallisée : Compétences verbales et vocabulaire.

      Capacités visio-spatiales.

      Mémoire de travail et vitesse de traitement.

      1.3. La structure du Quotient Intellectuel (QI)

      Le QI n'est pas un chiffre absolu mais un score relatif à une population de même âge et de même pays.

      Moyenne : Fixée par convention à 100.

      Écart-type : Fixé à 15.

      Distribution : 68 % de la population se situe entre 85 et 115. Un score de 130 (deux écarts-types au-dessus de la moyenne) définit le seuil du "haut potentiel".
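
      The figures above can be checked by treating IQ scores as normally distributed with mean 100 and standard deviation 15. Normality is an assumption of this sketch (the text gives only the mean, the standard deviation, and the 68% figure), and the helper names are purely illustrative.

```python
from statistics import NormalDist

# Mean 100 and standard deviation 15 come from the text;
# the normal shape of the distribution is an assumption of this sketch.
iq = NormalDist(mu=100, sigma=15)

def share_between(low, high):
    """Fraction of the population scoring between low and high."""
    return iq.cdf(high) - iq.cdf(low)

def share_above(threshold):
    """Fraction of the population scoring above threshold."""
    return 1.0 - iq.cdf(threshold)

within_one_sd = share_between(85, 115)  # one standard deviation either side of the mean
above_130 = share_above(130)            # beyond two standard deviations above the mean
print(f"{within_one_sd:.3f} {above_130:.4f}")
```

      Under this assumption, roughly 68% of scores fall between 85 and 115, and a little over 2% of the population scores above the 130 threshold.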

      --------------------------------------------------------------------------------

      2. Reliability and Validity of the Tests

      2.1. Stability over time

      IQ is one of the most stable human characteristics.

      Studies show that scores at age 6 already correlate at 0.6 with adult scores.

      A Scottish study even found a strong correlation between tests taken at age 11 and tests taken at age 80, an interval of 69 years.

      2.2. Predictive validity (real life)

      The IQ score is a robust predictor of several life outcomes:

      | Domain | Observations and Correlations |
      | --- | --- |
      | Academic success | Strong correlation (about 0.5). Predicts success better than motivation alone. |
      | Income | Positive but weaker correlation, mediated mainly by level of education. |
      | Health and longevity | Linear relationship: a higher IQ is associated with lower mortality (fewer accidents, better management of health risks). |
      | Professional success | IQ is more predictive than the emotional quotient (EQ), contrary to the unsupported claims of some authors. |

      --------------------------------------------------------------------------------

      3. Analysis of Criticisms and Myths

      3.1. Multiple Intelligences (Howard Gardner)

      This theory posits eight independent intelligences (musical, motor, etc.). However:

      • It is often a semantic relabeling of already known "cognitive abilities".

      • Tests measuring these abilities show that they are in fact correlated with one another, which confirms the existence of the g factor.

      • The theory's success is judged "demagogic": it flatters the idea that "everyone is intelligent in their own way" without a solid empirical basis.

      3.2. The "Unhappiness" of High-Potential Individuals (HPI)

      The media often spread the idea that 70% of gifted people fail at school or suffer from psychiatric disorders.

      The scientific data flatly contradict these claims:

      Sampling bias: Psychologists in private practice see only the HPI individuals who consult because of problems.

      Most HPI individuals never consult and do better than average.

      Mental health: Cohort studies (e.g. UK Biobank, covering 16,000 HPI individuals) show that people with a high IQ are at lower risk of anxiety, depression, schizophrenia, or social isolation.

      Academic success: HPI pupils obtain much better grades on average and do better in their exams (e.g. the brevet des collèges) than pupils with an average IQ.

      --------------------------------------------------------------------------------

      4. Origins of Intelligence: Genes and Environment

      The opposition between nature and nurture is considered binary and outdated.

      Research shows that the two factors are inseparable.

      4.1. Genetic Factors

      Heritability: Estimated at about 50%. This means that 50% of the differences in intelligence between individuals in a population are due to genetic differences.

      Evidence: Twin studies (monozygotic twins are more similar than dizygotic twins) and adoption studies.

      Genomics: Thousands of genetic variants with tiny effects contribute to intelligence. There is no single "intelligence gene".

      4.2. Environmental Factors

      Biological environment: Prenatal exposure to drugs, certain medications, infections (toxoplasmosis, etc.), or oxygen deprivation at birth can affect IQ.

      Schooling: Each additional year of education raises IQ by about 1 to 5 points.

      Adoption: A radical change of environment (moving from a very disadvantaged background to an advantaged one) can yield a gain of up to 15 to 25 IQ points.

      4.3. The entanglement of factors

      Parents' socioeconomic status (SES) influences the child's IQ, but SES is itself partly correlated with the parents' genes (passed on to the child). Thus about half of the correlation between social background and the child's IQ is genetic in origin, the other half being purely environmental.

      --------------------------------------------------------------------------------

      5. Conclusions

      General intelligence is a measurable biological and statistical reality.

      Although the tests do not capture the whole of the human mind (they leave out pure creativity, humor, and artistic talent), they are the most reliable and most predictive tool available to psychology for understanding individual differences.

      Intelligence is a protective resource: statistically, a high IQ is an advantage for health, success, and social well-being.

    1. School Inclusion of Children with Disabilities: Challenges, Obstacles, and Models of Hope

      Executive Summary

      This document examines the critical state of schooling for children with disabilities in France, as presented in the documentary "L'école de tous les possibles".

      The starting point is alarming: more than a quarter of children with disabilities receive no teaching at all, and more than half receive only minimal schooling (less than 6 hours per week).

      Faced with a system perceived as failing or purely "occupational", private initiatives such as the Centre des possibles, founded by the teacher Laura Cobigo, show that adapted teaching combined with academic expectations can reveal the potential of children with severe motor and cognitive delays.

      Successful inclusion rests on a complex "alchemy" of alternative communication tools, trained support assistants for pupils with disabilities (AESH, Accompagnants des Élèves en Situation de Handicap), and political will.

      However, the exhaustion of parents and professionals, combined with administrative rigidity, means that school inclusion remains a largely unmet goal.

      --------------------------------------------------------------------------------

      1. An Alarming Picture of Schooling

      The current education system appears to put systematic obstacles in the way of families.

      The figures and testimonies point to a gap between the fundamental right to education and the reality on the ground.

      Key Exclusion Statistics

      | Category of children | Level of schooling |
      | --- | --- |
      | Children with disabilities | More than 25% receive no teaching at all. |
      | Children in school | More than 50% have less than 6 hours of class per week. |
      | Children in mainstream settings | Often limited to the legal minimum, with a sense of being "invisible" in the classroom. |

      Failures of the Institutional System

      The "occupational" approach: In some medical-educational institutes (IME, Instituts Médico-Éducatifs), the culture of "caring for" (hygiene, feeding) takes precedence over cognitive learning.

      Some children receive only 30 minutes to 3 hours of teaching per week, and it often stops for good at age 16.

      Lack of training: Teachers and AESH in mainstream settings often feel helpless, lacking the tools to adapt their activities to severe disabilities.

      Disguised segregation: Several parents denounce a token inclusion that in practice amounts to being set aside, for lack of resources and pedagogical will.

      --------------------------------------------------------------------------------

      2. The Centre des possibles: A Pedagogical Alternative

      Located in Guidel (Morbihan), the Centre des possibles offers a model that breaks with the norm.

      Founded by Laura Cobigo, herself the mother of a child with severe multiple disabilities, the center shows that severe disability does not rule out access to knowledge.

      The Pillars of the Method

      Belief in potential: In contrast to sometimes pessimistic medical diagnoses, the school bets on each child's capacity to learn.

      One-to-one support: Each child is guided individually by an educator or a volunteer, which avoids any "downtime" and respects the child's biological rhythm.

      Augmentative and Alternative Communication (AAC): Intensive use of pictograms, speech-synthesis software, and PODD (Pragmatic Organisation Dynamic Display) communication books lets non-verbal children express their needs and engage in school learning.

      Multidisciplinarity: Schoolwork is paired with a therapeutic approach (occupational therapy, psychomotor therapy) to free up the motor abilities needed for school tools (holding a pencil, using a switch).

      --------------------------------------------------------------------------------

      3. Structural Obstacles and Families' Journeys

      The document highlights the "obstacle course" that parents face, marked by a protracted search for a diagnosis and a constant struggle with the administration.

      Barriers to Successful Inclusion

      1. The status of AESH: Their role is pivotal, but their precarious position (derisory pay, lack of training and recognition) weakens the whole structure of inclusion.

      2. Administrative rigidity: The timescales of the Éducation Nationale (cycles of 5 to 10 years) are considered incompatible with the urgency of a child's development.

      3. Refusal of partnership: The case of Maywen illustrates how inclusion fails when a mainstream school refuses help from specialized organizations to train its staff.

      4. Lack of care places: Endless waiting lists for services such as SESSAD force parents to arrange rehabilitation themselves through private practitioners, at the cost of their working lives.

      Impact on Caregivers

      Mental and physical exhaustion is everywhere. Laura Cobigo herself went through professional burnout, which illustrates the fragility of a system that relies on individual goodwill rather than on a lasting structure.

      Parents are often labeled as "refusing to accept the disability" when they ask for more stimulation or care.

      --------------------------------------------------------------------------------

      4. Toward Real Inclusion: Conditions for Success

      Despite the difficulties, individual successes such as those of Gabin or William show the way forward.

      Successful inclusion is described as an "alchemy" that requires:

      • Adapted teaching: Individualized materials and state-of-the-art technological tools.

      • Shared expertise: Specialist teachers supporting mainstream classroom teachers.

      • Human commitment: AESH who are invested and specifically trained in multiple disabilities.

      • Socialization with peers: Interaction with non-disabled children is a powerful driver for acquiring skills (e.g. walking, social codes).

      Conclusion

      The Centre des possibles shows that schooling for the most severely disabled children is not only possible but essential to their citizenship.

      However, this success remains fragile because it still depends too heavily on chance encounters and on the self-sacrifice of families.

      For inclusion to stop being a "mirage", a profound transformation of medico-social culture and a massive allocation of human and technical resources to the Éducation Nationale are indispensable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public review:

      Weaknesses:

      (1) Controls for the genetic background are incomplete, leaving open the possibility that the observed oviposition timing defects may be due not to targeted knockdown of the period (per) gene but to the GAL4, Gal80, and UAS transgenes themselves. To resolve this issue the authors should determine the egg-laying rhythms of the relevant controls (GAL4/+, UAS-RNAi/+, etc); this only needs to be done for those genotypes that produced an arrhythmic egg-laying rhythm.

      (2) Reliance on a single genetic tool to generate targeted disruption of clock function leaves the study vulnerable to associated false positive and false negative effects: a) The per RNAi transgene used may only cause partial knockdown of gene function, as suggested by the persistent rhythmicity observed when per RNAi was targeted to all clock neurons. This could indicate that the results in Fig 2C-H underestimate the phenotypes of targeted disruption of clock function. b) Use of a single per RNAi transgene makes it difficult to rule out that off-target effects contributed significantly to the observed phenotypes. We suggest that the authors repeat the critical experiments using a separate UAS-RNAi line (for period or for a different clock gene), or, better yet, use the dominant negative UAS-cycle transgene produced by the Hardin lab (https://doi.org/10.1038/22566).

      We have followed the referee's advice, repeating the experiments with the dominant negative UAS-cyc<sup>DN</sup>. The results confirm our conclusions: abolishing the cellular clock in LNd neurons abolishes the rhythmicity of oviposition. The results are presented in Fig. 3 of the new manuscript, panels H to N. We thank the reviewer for this suggestion, which has definitely improved our paper, since it allows us to confirm our result using both a different driver and a different UAS sequence. In addition, we included the required GAL4 controls, which can be found in panels E and L of the figure, as well as average egg-laying profiles for all genotypes involved (panels B, D, F, I, K and M). Regarding the MB122Bsplit-Gal4>UAS-per<sup>RNAi</sup> experiment, we moved it to a supplementary figure (Figure 3S1). The paragraph where the new Figure 3 is discussed has been modified accordingly.

      (3) The egg-laying profiles obtained show clear damping/decaying trends which necessitates careful trend removal from the data to make any sense of the rhythm. Further, the detrending approach used by the authors is not tested for artifacts introduced by the 24h moving average used.

      The method used for the assessment of rhythmicity is now more fully explained and tested in the supplementary material. In particular, the issue of trend removal is treated in the second section of the SM, and the absence of "artifacts" (interpreted as the possibility of deciding that a signal is rhythmic when it is not, or vice versa) is shown in Figs. S3 to S5.
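
      As a rough illustration of trend removal with a 24 h moving average, the sketch below subtracts a centered one-day average from a record sampled every 4 h. It is a minimal sketch under assumed conventions (half-weighted end points, edge samples left undefined, a hypothetical synthetic record), not the authors' implementation, whose details are in their supplementary material.

```python
import math

def centered_daily_average(counts, samples_per_day=6):
    """Centered 24 h moving average for an even number of samples per day.

    The two end points of each window get half weight, so the window spans
    exactly one day and stays centered on the sample being smoothed.
    Edge samples without a full window are returned as None.
    """
    half = samples_per_day // 2
    trend = [None] * len(counts)
    for i in range(half, len(counts) - half):
        window = counts[i - half:i + half + 1]
        total = sum(window) - 0.5 * (window[0] + window[-1])
        trend[i] = total / samples_per_day
    return trend

def detrend(counts, samples_per_day=6):
    """Subtract the centered daily average; edges stay None."""
    trend = centered_daily_average(counts, samples_per_day)
    return [c - t if t is not None else None for c, t in zip(counts, trend)]

# Hypothetical record: linearly declining baseline plus a 24 h oscillation,
# sampled every 4 h for 5 days (values are illustrative, not real data).
hours = [4 * k for k in range(30)]
signal = [3.0 * math.cos(2 * math.pi * t / 24) for t in hours]
record = [20.0 - 0.1 * t + s for t, s in zip(hours, signal)]
residual = detrend(record)
```

      For this idealized record the residual reproduces the 24 h oscillation away from the edges, because a one-day window averages a 24 h component to zero and passes a linear trend unchanged. Real egg counts are discrete and noisy, so the same check says nothing about how the filter behaves on them.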

      (4) According to the authors the oviposition device cannot sample at a resolution finer than 4 hours, which will compel any experimenter to record egg laying for longer durations to have a suitably long time series which could be useful for circadian analyses.

      The choice of sampling every 4 hours is not due to a limitation imposed by the device used. In fact, the device can be programmed to move at whatever times are desired. As mentioned in the Material and Methods section, "more frequent sampling gives rise to less consistent rhythmic patterns", because the number of eggs sampled at each time slot becomes too small. In particular, we have tested sampling at intervals of 2 hours, and we have observed that this doubles the work performed by the experimenter but does not lead to an improvement in the assessment of rhythmicity.
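
      The trade-off between sampling interval and count size can be illustrated with a simple counting argument. The sketch below assumes Poisson-distributed egg counts and a uniform laying rate, neither of which is stated in the response, and uses a hypothetical fly laying 48 eggs per day.

```python
import math

def relative_counting_noise(eggs_per_day, interval_hours):
    """Relative fluctuation (std / mean) of the egg count in one time bin,
    assuming Poisson-distributed counts and a uniform laying rate."""
    mean_per_bin = eggs_per_day * interval_hours / 24
    return 1 / math.sqrt(mean_per_bin)

# Hypothetical fly laying 48 eggs per day (illustrative value only).
noise_4h = relative_counting_noise(48, 4)  # 8 eggs expected per bin
noise_2h = relative_counting_noise(48, 2)  # 4 eggs expected per bin
print(f"{noise_4h:.3f} {noise_2h:.3f}")
```

      Under these assumptions, halving the interval halves the expected count per bin and raises the relative counting noise by a factor of the square root of 2, which is consistent with the authors' remark that finer sampling did not improve the assessment.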

      (5) Despite reducing the interference caused by manually measuring egg-laying, the rhythm does not improve the signal quality such that enough individual rhythmic flies could be included in the analysis methods used. The authors devise a workaround by combining both strongly and weakly rhythmic (LSpower > 0.2 but less than LSpower at p < 0.05) data series into an averaged time series, which is then tested for the presence of a 16-32h "circadian" rhythm. This approach loses valuable information about the phase and period present in the individual mated females, and instead assumes that all flies have a similar period and phase in their "signal" component while the distribution of the "noise" component varies amongst them. This assumption has not yet been tested rigorously and the evidence suggests a lot more variability in the inter-fly period for the egg-laying rhythm.

      As stressed in the paper, and in the new Supplementary Material, the individual egg records are very noisy, which in general precludes the extraction of any information about the underlying period and phase. The workaround we (and others, e.g. Howlader et al. 2006) have used is to analyze average egg records for each genotype. Even though this implies assuming the same period and phase for all individuals, we have observed, using experiments with synthetic data, that small variations in individual periods (of the same magnitude as those present in real experiments where the period of some flies can be assessed individually) still allow us to use our method to decide if the genotype is rhythmic or not. This issue is discussed at length in the new Supplementary Material. There we also discuss an experiment with real flies, showing the individual records, and the corresponding periodograms, for each fly, for a rhythmic (Fig. S14) and an arrhythmic genotype (Fig. S17).
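
      The averaging approach described above can be sketched with synthetic data: several noisy records with slightly different periods are averaged, and a Lomb-Scargle periodogram of the average is scanned over the 16 to 32 h window. Everything numerical here (20 flies, periods between 23.5 and 24.5 h, Gaussian noise, the laying rate) is a hypothetical choice for illustration, the periodogram is the classic textbook formula rather than the authors' code, and no significance test is attempted.

```python
import math
import random

def lomb_scargle_power(times, values, period):
    """Classic normalized Lomb-Scargle power at one trial period."""
    n = len(values)
    mean = sum(values) / n
    y = [v - mean for v in values]
    variance = sum(v * v for v in y) / (n - 1)
    w = 2 * math.pi / period
    # Time offset that makes the sine and cosine terms orthogonal.
    tau = math.atan2(sum(math.sin(2 * w * t) for t in times),
                     sum(math.cos(2 * w * t) for t in times)) / (2 * w)
    c = [math.cos(w * (t - tau)) for t in times]
    s = [math.sin(w * (t - tau)) for t in times]
    yc = sum(a * b for a, b in zip(y, c))
    ys = sum(a * b for a, b in zip(y, s))
    return (yc * yc / sum(a * a for a in c)
            + ys * ys / sum(a * a for a in s)) / (2 * variance)

rng = random.Random(0)
times = [4 * k for k in range(30)]  # 5 days, one sample every 4 h

def synthetic_fly():
    """Hypothetical noisy record with an individual period near 24 h."""
    period = rng.uniform(23.5, 24.5)
    return [max(0.0, 6 + 3 * math.cos(2 * math.pi * t / period) + rng.gauss(0, 2))
            for t in times]

flies = [synthetic_fly() for _ in range(20)]
average = [sum(column) / len(flies) for column in zip(*flies)]

trial_periods = [16 + 0.25 * k for k in range(65)]  # 16 h to 32 h
powers = [lomb_scargle_power(times, average, p) for p in trial_periods]
best_period = trial_periods[powers.index(max(powers))]
print(best_period)
```

      Averaging reduces the independent noise of each record while the shared oscillation survives as long as individual periods stay close, which is the property the authors report testing with their own synthetic experiments.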

      (6) This variability could also depend on the genotype being tested, as the authors themselves observe between their Canton-S and YW wild-type controls for which their egg-laying profiles show clearly different dynamics. Interestingly, the averaged records for these genotypes are not distinguishable but are reflected in the different proportions of rhythmic flies observed. Unfortunately, the authors also do not provide further data on these averaged profiles, as they did for the wild-type controls in Figure 1, when they discuss their clock circuit manipulations using perRNAi. These profiles could have been included in Supplementary figures, where they would have helped the reader decide for themselves what might have been the reason for the loss of power in the LS periodogram for some of these experimental lines.

      We have added the individual periodograms of the arrhythmic lines to the Supplementary material (Figs. 3S2, 3S5 and panel G of Fig. 3S1), where they can be compared with their respective controls (Figs 3S3, 3S4, 3S6, 3S7 and panel F of Fig. 3S1).

      (7) By selecting 'the best egg layers' for inclusion in the oviposition analyses an inadvertent bias may be introduced and the results of the assays may not be representative of the whole population.

      We agree that the results may be biased for 'the best egg layers'. We remark, however, that the flies that have been left out lay very few eggs, some of them even laying no eggs in a whole day. For these flies it is difficult to understand how one can even speak of egg laying rhythmicity (let alone how one can experimentally assess it). Thus, we think it might be misleading to speak of results as "representative of the whole population". Furthermore, it is even possible that the very concept of egg laying rhythmicity makes little sense if flies do not lay enough eggs.

      (8) An approach that measures rhythmicity for groups of individual records rather than separate individual records is vulnerable to outliers in the data, such as the inclusion of a single anomalous individual record. Additionally, the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity. Therefore, the experimental data used to map the clock neurons responsible for oviposition rhythms would be more convincing if presented alongside individual fly statistics, in the same format as used for Figure 1.

      In general, we have checked that there are no "outliers", in the sense of flies that lay many more eggs than the others in the experiment. But maybe the reviewer is referring to the possibility that a few rhythmic flies make the average rhythmic. This issue is addressed in the supplementary material, at the end of section "Example of rhythmicity assessment for a synthetic experiment". In short, we found that eliminating some of the most rhythmic flies from a rhythmic population makes the average a bit less rhythmic, but still significantly so. Conversely, if these flies are transferred to an arrhythmic population, the average is still non rhythmic.

      Regarding "the number of individual records that are included in a group may become a somewhat arbitrary determinant for the observed level of rhythmicity", we stress that we have not performed a selection of flies for the averages. All of the flies tested are included in the average, independently of their individual rhythmicity, provided only that they lay enough eggs.

      (9) The features in the experimental periodogram data in Figures 3B and D are consistent with weakened complex rhythmicity rather than arrhythmicity. The inclusion of more individual records in the groups might have provided the added statistical power to demonstrate this. Graphs similar to those in 1G and 1I, might have better illustrated qualitative and quantitative aspects of the oviposition rhythms upon per knockdown via MB122B and Mai179; Pdf-Gal80.

      We are aware that in studies of the rhythmicity of locomotor activity the presence of two significant peaks is usually interpreted as a “complex rhythm”, i.e. as evidence of the existence of two different mechanisms producing two different rhythms in the same individual. In our case, since the periodograms we show assess the rhythmicity of the average time series of several individuals, the two non-significant peaks could also correspond to the periods of two different subpopulations of individuals. However, a close examination of the individual periodograms, now provided as Supplementary Figures 3S2 to 3S9, does not show any convincing evidence of either of these two possibilities.

      Another possibility could be that such peaks are simply an artifact of the method in the analysis of time series that consist of very few cycles and also few points per cycle. In the supplementary material we show that this can indeed happen. Consider, for example, periodograms 2 and 4 in Fig. S12 of the SM. Even though both of them display two non-significant peaks, these periodograms correspond to two synthetic time series that are completely arrhythmic.

      We have added to the manuscript a paragraph discussing the issue of possible bimodality (next to last paragraph in subsection "The molecular clock in Cry+ LNd neurons is necessary for rhythmic egg-laying").

      Wider context:

      The study of the neural basis of oviposition rhythms in Drosophila melanogaster can serve as a model for the analogous mechanisms in other animals. In particular, research in this area can have wider implications for the management of insects with societal impact such as pests, disease vectors, and pollinators. One key aspect of D. melanogaster oviposition that is not addressed here is its strong social modulation (see Bailly et al., Curr Biol 33:2865-2877.e4. doi:10.1016/j.cub.2023.05.074). It is plausible that most natural oviposition events do not involve isolated individuals, but rather groups of flies. As oviposition is encouraged by aggregation pheromones (e.g., Dumenil et al., J Chem Ecol 2016 https://link.springer.com/article/10.1007/s10886-016-0681-3) its propensity changes upon the pre-conditioning of the oviposition substrates, which is a complication in assays of oviposition rhythms that periodically move the flies to fresh substrate.

      We agree that social modulation can be important for oviposition, as has been shown in the paper cited by the reviewer. But we think that, in order to understand the contribution of social modulation to oviposition, it is important to know, as a reference for comparisons, what the flies do when they are isolated. Our aim in this work has been to provide such a reference.

      Recommendations for the authors:

      (1) The weaknesses identified in the Public review could be addressed as follows: etc.

      We have followed the suggestions of the editor and addressed each of the weaknesses mentioned (see details above).

      (2) Could the authors comment on their choice of using individual flies for their assay rather than (small) groups of flies? Is it possible that their assay would produce less noisy results with the latter?

      First, we want to emphasize that our aim here was to assess the presence of individual rhythmicity, free from any external influences, whether arising from environmental cues (such as light or temperature changes) or from social interactions (with other females or males). However, we were also curious about the behavior when males were put in the same chamber with each female. We performed a few tests and the results were very similar to what we obtained with single females.

      (3) Minor points:

      (a) Line 57-58 - "around 24 h and a peak near night onset (Manjunatha et al., 2008). Egglaying rhythmicity is temperature-compensated and remains invariant despite the nutritional state": Rephrase to something simpler like temperature and nutrition compensated.

      Corrected.

      (b) Line 56-57 - "The circadian nature of this behavior was revealed by its persistence under DD with a period around 24 h and a peak near night onset (Manjunatha et al., 2008)." A better reference here would be to Sheeba et al, 2001 for preliminary investigations into the egg-laying rhythms of individual flies and McCabe and Birley, 1998 for groups of flies under LD12:12 and DD.

      Suggestion accepted.

      (c) Line 65-67 - "We determined..... molecular clock in the entire clock network reduced the LNv did not." This suggests that it was unknown until now that LNv does not have a role, whereas Howlader et al 2006 already suggested that. The reader becomes aware of this at a later part of the manuscript. Please revise.

      This has been revised, and the citation to Howlader et al 2006 added to the new sentence.

      (d) Line 67 - "impairing the molecular clock in the entire clock network reduced the circadian rhythm of.."; saying "Reduced the power of the circadian rhythm" might be better phrasing.

      Suggestion accepted.

      (e) Line 72 - using the Janelia hemibrain dataset.

      Corrected.

      (f) Line 72 typo "ussing", should be 'using'.

      Corrected.

      (g) Line 94: why is the periodic signal the same for all on the first day of DD?

      It is well known that in LD conditions activity is driven by the environmental light-dark cycle, which entrains the endogenous circadian clock of all flies. Even after the transition to DD, the effects of this entrainment persist for a few days, allowing the individual rhythmic patterns set by the light-dark cycle to remain synchronized for at least a few cycles. We are assuming that the same happens with oviposition. A sentence has been added explaining this (beginning of third paragraph of subsection "Egg-laying is rhythmic when registered with a semiautomated egg collection device").

      (h) Figure 1A-D, Were all flies included or only rhythmic flies? Please make this clear. How do you distinguish rhythmic and arrhythmic flies in Figure 1E? Their representative individual plots of egg number graphs are required. Why was the number of flies under DD decreased from 20 to 18?

      Throughout the paper, the analysis of average rhythmicity has been performed including all flies, since we postulate that even flies that individually can be classified as non rhythmic have a rhythm that is corrupted by noise, and that this noise can be partially subtracted by performing an average. The explanation of the characterization of rhythmic and arrhythmic individuals is in the Methods section, under the Data Analysis subsection. This is now fully developed in the Supplementary material, where the individual plots for some of the genotypes are included.

      Regarding the question of the number of flies having "decreased from 20 to 18?", there is a misunderstanding here. The results depicted in Figure 1, and in particular in panel E, correspond to two different experiments: one performed only in LD (7 days, n=20), and a second one performed for 5 days in DD, with one previous day in LD (n=18).

      (i) Figure E and K, Are n=20, 18, and n=30, 22 the total numbers of flies including both rhythmic and nonrhythmic? If so, it would be better to put them in the column, not in the rhythmic column.

      The figure has been corrected.

      (j) Line 107-108, please provide a citation for this statement.

      We have added two references: Shindey et al. 2016, and Deppisch et al. 2022.

      (k) Figure 1, 2, etc., please write a peak value inside the periodogram graph. This makes comparison easier.

      The peak values have been added in all Figures.

      (l) Line 184-185, Figure 2F, tau appears shorter in Clk4.1>perRNAi flies than in control, which suggests that DNp1 may play a role?

      As explained in the Supplementary Material, the particularities of oviposition records (discrete values, noise, few samples per period, etc.) preclude an accurate determination of the period if the record is considered as rhythmic. In particular, Fig. S4 shows that differences of 1 hour between the real and the estimated periods are not unusual.

      (m) Figure 4. Why are 2 controls shown? Please explain. Are they the same strains?

      The two controls shown are the UAS control and the GAL4 control. This information has now been added to the figure.

      (n) Line 314 'that' should be 'than'?

      Corrected.

      (o) Line 73-74 - Phrasing is not clear in: "LNds and oviposition neurons, consisting with, the essential role of LNds neurons in the control of this behavior."

      Corrected.

      (p) Line 81-84 - "the experiments particularly demanding and labor-intensive. In this approach, eggs are typically collected every 4 hours (sometimes also every 2 hours), which usually implies transferring the fly to a new vial or extracting the food with the eggs and replacing it with fresh food in the same vial (McCabe and Birley, 1998; Menon et al., 2014)." McCabe and Birley had an automated egg collection device designed for groups of flies, which sampled eggs laid every hour for 6 days. Please remove this reference in this context.

      Reference removed.

      (q) Line 91-92 - "The assessment of oviposition rhythmicity is challenging because the decision of laying an egg relies on many different internal and external factors making this behavior very noisy." This sentence makes it appear that 'assessment' is the limitation. Even locomotor activity is governed by many internal and external factors, yet we can obtain very robust rhythms. The sentence that follows is also not easy to digest. Can the authors frame the idea better?

      We have rewritten the corresponding paragraph to make it clearer (second paragraph of the Results section). Additionally, the Supplementary Material now contains a more detailed explanation and analysis of the method used.

      (r) Line 104-107 - rhythmic (with a period close to 24 h, Figure 1F) although the average egg record is strongly rhythmic with a period around 24 h (Figure 1B). Under DD condition, individual rhythmicity percentages are the same as in LD (Figure 1E) and their average record is also very rhythmic with a period of 24 h (Figure 1D). 'Strongly rhythmic' and 'very rhythmic' are less indicative of what is happening with the oviposition rhythm and can be phrased as robust instead, with a focus on their power measured.

      We have accepted the suggestion.

      (s) Line 108-110 - "Thus, egg-laying displays a much larger variability than locomotor activity, compounding the difficulty of observing the influence of the circadian clock on this behavior." The section discussed here does not illustrate the variability in egg-laying as much as the lack of robustness of the rhythm. The variation in rhythmicity going from CS flies (~70% rhythmic) to yw flies (~50% rhythmic) showcases the variability in this rhythm and how it is difficult to observe when compared to locomotor rhythms, which are usually consistently >90% rhythmic across multiple genotypes. These lines can be placed after the discussion about yw and perS flies. Moreover, previous studies using individual flies have reported that the egg-laying rhythm is more variable than other rhythms (Figure 1, Sheeba et al., 2001).

      We have accepted the suggestion, replacing "Thus, egg-laying displays a much larger variability than locomotor activity..." by "This shows that, at the individual level, egg-laying is much less robust than locomotor activity ..."

      (t) Figure 1. Genotype notation within the figure panels is not consistent with the accepted / conventional notation or with the main text or legend notations throughout the manuscript.

      We are sorry for this mistake. We have corrected the genotype names in Figures and text in order to make notation consistent across the paper.

      (u) Supplementary Figure 1 Legend. Error in upper right corner? Not left corner? The photo does not clearly show the apparatus. The authors may wish to consider clearer images and more details about the apparatus including details of the 3D printing of the device and perhaps even include a short video where the motor moves the flies to a new chamber (This is only a suggestion to advertise the apparatus, not related to the review of the manuscript). They could also provide information about what fraction of females survived till the end of each trial when 21 flies were examined with 4-hour sampling across 4-5 cycles.

      In general, more than 80% of the females are alive at the end of a one-week oviposition experiment. We have added this information in the Methods section at the end of the corresponding subsection ("Automated egg collection device"). Regarding the egg-collection device, we have replaced the photographs in what is now Supplementary Figure 1S1, and added a short supplementary movie showing its operation.

      (v) The results depicted in Figure 2B are that of averaged time series. Hence the reader does not know 'the fact' that knocked-down animals are not completely rhythmic. Is the "not completely arrhythmic" in reference to flies with a power > 0.2 (weakly rhythmic) in their egg-laying rhythm or to the presence of ~40% of male flies (Supplementary Table 1) with a locomotor rhythm after perRNAi silencing of most of their clock neurons? This is confusing because no intermediate category of flies is discussed in Figure 2. Please edit for clarity.

      We were referring to the rhythmicity of the genotype, not of the individuals. We have rewritten the corresponding paragraph in order to make it clearer (last paragraph of the first subsection of the Results section).

      (w) Line 173 - ablation or electrically silencing all PDF+ neurons (Howlader et al., 2006). There were no experiments carried out using electrical silencing of PDF+ neurons in the referenced paper.

      We are sorry for this mistake. This has been corrected (we have deleted the mention of electrical silencing).

      (x) Line 173 - Shortening of period by nearly 3 hours cannot be considered minor.

      We agree, and we have deleted the word "minor".

      (y) Line 332-333 - "We also disrupted the molecular clock (or electrically silenced) in PDF-expressing neurons as well as in the DN1p group with no apparent effect on egg-laying rhythms". There was period shortening observed for pdf GAL4 > perRNAi manipulation so there was an effect on the egg-laying rhythm. Additionally, perRNAi based silencing does not electrically silence PDF neurons as the kir 2.1 was expressed only using Clk4.1 GAL4 in the DN1ps. This line should be rewritten.

      We have rewritten the paragraph mentioned (third paragraph of the Discussion) in order to make it more accurate.

      (4) Page 22 - Data Analysis

      "Since the number of eggs laid by a mated female tend to show a downward trend, we proceeded as follows, in order to detrend the data (see the Supplementary Material for further details). First, a moving average of the data is performed, with a 6 point window, and a new time series T is obtained. In principle, T is a good approximation to the trend of the data. Then, a new, detrended, time series D is generated by pointwise dividing the two series (i.e. D(i)=E(i)/T(i), where i indexes the points of each series)." Can the authors provide a reference for this method of detrending? Smoothing can frequently introduce artifacts in the data and give incorrect period estimates. Additionally, the trend visible in the data, especially in Figure 1, suggests a linear decay that can be easily subtracted. Also, there is no discussion of detrending in the Supplementary material attached.

      We are sorry for the confusion with the Supplementary materials. The method used for subtracting both noise and trend from the data is now fully explained in the new Supplementary Material. All the issues raised by the reviewer in this comment have been addressed there.
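
      The detrending procedure quoted in the comment above (a 6-point moving average T followed by the pointwise division D(i)=E(i)/T(i)) can be illustrated with a minimal sketch. The function names, the clipping of the window at the ends of the record, and the handling of a zero trend are assumptions made for illustration; they are not taken from the manuscript or its Supplementary Material.

```python
def moving_average(series, window=6):
    """Moving average T of the record E.

    Assumption: the window covers indices i-3 .. i+2 and is clipped at the
    ends of the series; the manuscript does not specify the edge handling.
    """
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i - half + window)
        chunk = series[lo:hi]
        trend.append(sum(chunk) / len(chunk))
    return trend


def detrend_by_division(eggs, window=6):
    """Detrended series D(i) = E(i) / T(i).

    Assumption: where the trend is zero (no eggs in the whole window),
    D is set to 0.0 to avoid dividing by zero.
    """
    trend = moving_average(eggs, window)
    return [e / t if t > 0 else 0.0 for e, t in zip(eggs, trend)]


# Hypothetical egg counts per 4-hour bin, declining over time.
eggs = [12, 3, 10, 2, 8, 2, 6, 1, 4, 1, 3, 0]
detrended = detrend_by_division(eggs)
```

      Because D is a ratio rather than a difference, the oscillation is expressed relative to the local mean, so a decaying amplitude does not by itself shrink the detrended signal.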

      (5) Figure by figure

      Page - Type (Figure or text) - Comment

      (a) Page 6 Figure 1C There is remarkable phase coherence seen in the average egg laying time series for CS flies 5 days into DD and as the authors note in Lines 94-95 in the text "Under light-dark (LD) conditions, or in the first days of DD, it can be that the periodic signal is the same for all flies". Since this observation is crucial to constructing the figures seen later in the paper, a note should be made about why this rhythm could persist across flies, so deep into DD.

      As mentioned above, we have added a couple of lines explaining why we think that the assumption of a synchronized periodic signal is reasonable, at least during the first cycles (second paragraph of the first subsection of section Results).

      (b) Figure 1 G The effect of period/phase decoherence seems to be showing up here in the average profile for yw flies as they seem to completely dampen out after 2 days in DD and yet have a 24-hour rhythm in the averaged periodogram. The authors should make a note here if the LS periodogram is over-representing the periodicity of the first few days in DD or if comparing the first 3 vs. the last 3 days in DD gives different results.

      The dampening observed in average oviposition records is a product of the dampening of the individual oviposition records, which is a well-known phenomenon, probably caused by the depletion of sperm in the female spermatheca. One of the aims of the method used in the paper was to avoid the bias introduced by this dampening, by means of a detrending procedure. This is explained in the Materials and Methods, and full details are now given in the new Supplementary Materials.

      (c) Figure 1E, K Is this data pooled across 2-3 experiments, as discussed in lines 500-01 under 'Statistical Analysis'? Also, what test is being performed to check for differences between proportions here, seeing as there are no error bars to denote error around a mean value and no other viable tests mentioned in Statistical Analysis?

      We are sorry for this omission. For the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.
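
      For reference, the 'N-1' Chi-squared test for two proportions is the Pearson chi-squared statistic of the 2x2 table multiplied by (N-1)/N, compared against a chi-squared distribution with one degree of freedom. A minimal sketch follows; the function name and the example counts are hypothetical and are not data from the paper.

```python
import math


def n_minus_1_chi_squared(rhythmic_a, total_a, rhythmic_b, total_b):
    """'N-1' chi-squared test comparing two proportions (2x2 table).

    Returns (statistic, p_value). The p-value uses the identity
    P(X > x) = erfc(sqrt(x / 2)) for a chi-squared variable with 1 df.
    """
    a, b = rhythmic_a, total_a - rhythmic_a
    c, d = rhythmic_b, total_b - rhythmic_b
    n = total_a + total_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    stat = (n - 1) * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 14 of 20 rhythmic flies in one genotype, 10 of 20 in another.
stat, p_value = n_minus_1_chi_squared(14, 20, 10, 20)
```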

      (d) Figure 1 F, L Can the total number of weakly and strongly rhythmic values be indicated in the scatter plot?

      Corrected.

      (e) Figure 1F, L (legend) Is the Chi-squared test being performed on the proportion values of Figure 1(E, K) or for Figure 1(F, L)?

      The chi-squared test mentioned was used for Fig1 F-L. As explained above, for the comparison of proportions we used the 'N-1' Chi-squared test. This has now been added to the legend of the figure.

      (f) Page 8 Figure 2B Seeing as individual flies with a LS periodogram power < 0.2 are considered weakly rhythmic in Figure 1 F, L can Clk856 > perRNAi flies on average also be considered weakly rhythmic, as the peak in the periodogram is above 0.3?

      We prefer to use the weakly rhythmic class only for individual flies. Nevertheless, we agree that this periodogram shows that the genotype analyzed is not completely arrhythmic, and that this might be due to some remaining individual rhythmicity. As mentioned above, we have rewritten the last paragraph of the first subsection of section Results in order to discuss this.

      (g) Figure 2D Can the authors comment on why there is a shorter period rhythm when PDF neurons have a dysfunctional clock, whereas previous evidence (Howlader et al., 2004) suggested that these neurons play no role in egg-laying rhythm? They should also refer to McCabe and Birley, 1998 to see if their results (where they observed a shorter period of ~19h with groups of per0 flies), might be of interest in their interpretations.

      We have added a line commenting on this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion of this in the third paragraph of the Discussion. In a nutshell, even though Howlader et al. did not find a shortening when PDF neurons are ablated, they did find it in pdf01 flies.

      (h) Figure 2 F, H As the authors mention in their Discussion on Page 16, lines 340-45, the manipulation of DN1p neurons might abolish the circadian rhythm in oogenesis as reported by Zhang et al, which is why they looked at this circuit driven by Clk4.1 neurons and comment that "The persistence of the rhythm of oviposition implies that it is not based on the availability of eggs but is instead an intrinsic property of the motor program". However, no change in fecundity is reported for either kir2.1 or perRNAi-based manipulations of these neurons, to help the reader understand if egg availability (at the level of egg formation) is playing any role in the downstream (and seemingly independent) act of egg laying. The authors should report if they see any change in total fecundity for either set of flies w.r.t their respective controls. Also, is the reduction in power seen with electrical silencing vs perRNAi expression of any relevance? Does the percentage of rhythmic flies change between these two manipulations?

      In the line mentioned by the reviewer, what we meant is that our results show that the rhythm of oviposition does not seem to be based on the rhythmic production of oocytes, which is not necessarily connected with the total number of eggs produced. We have modified the corresponding line in the paper to avoid this misunderstanding. Regarding the "reduction in power" mentioned, it must be stressed that, in general, the height of the peak is correlated with the fraction of rhythmic individuals. The problem is that this fraction is a much noisier output, and that is the reason why we have chosen to work with periodograms of averages.

      (i) Figure 2 E and G, a loss of rhythmicity could also be due to a decrease in fecundity in the experimental lines. Since the number of eggs laid for each genotype is already known, can the authors show statistically relevant comparisons between the experimental lines and their respective controls? In this vein, can the averaged time series profiles also be provided for all the genotypes tested (as seen previously in Figure 1 A, C, G, I), perhaps in the supplementary?

      We did not focus on fecundity in the present work. However, our observations do not seem to show any definite relationship with rhythmicity. We plan to address the issue of fecundity more systematically in a future work. The averaged time series profiles have now been added to the figure.

      (j) Scatter plots showing the average period and SEM as seen in Figure 1 (F, L) would help in understanding if these manipulations have any effect on variation in the period of the egg-laying rhythm across flies. Particularly for pdf GAL4 > perRNAi flies which have a net shorter period, (but this might vary across the 34 flies tested).

      We have added a Supplementary Figure (2S1) that shows that the shortening of the oviposition period can also be observed at the individual level. We have also added a line commenting on this in the corresponding subsection ("LNv and DN1 neurons are not necessary for egg-laying rhythmicity") of the Results, as well as a discussion of this in the third paragraph of the Discussion.

      (k) Page 11 Figure 3B Does the presence of two peaks in the LS periodogram at a power > 0.2 indicate the presence of weakly rhythmic flies with both a short(20h) and a long(~27h) period component or either one? The short-period peak is nearly at p < 0.05 level of significance. So then, do most of the flies in MB122B GAL4 > perRNAi line show a weakly rhythmic shorter period?

      (l) Figure 3D A similar peak is observed again at 20h (LS power > 0.2 and nearly at p < 0.05 significance level again) and a different longer one at (~30h) though this one is almost near 0.2 on the power scale. Given the consistency of this feature in both LNd manipulations, the authors should comment on whether this is driven by variation in periods detected or the presence of complex rhythms (splitting or change in period) in the oviposition time series for these lines.

      (m) Figure 3 General scatter plots showing average period ± SEM could help explain the bimodality seen in the periodograms. Additionally indicating just how many flies are weakly rhythmic vs. strongly rhythmic can also help to illustrate how important the CRY+ LNds are to the oviposition rhythm's stability.

      For these three comments (k, l and m), we note that the issue of bimodality has been addressed above, in our response to Weakness 9.

      (o) Figure 4B Same as comments under Figure 1, what is the statistical test done to compare the proportions for these three genotypes?

      As mentioned above, for the comparison of proportions we used the 'N-1' Chi-squared test. We have added a sentence detailing this at the end of the Statistical analysis section.

      (p) Figure 4C Are all flies significantly rhythmic? The authors should also provide an averaged LS periodogram measure for each genotype, to help illustrate the difference in power between activity-rest and egg-laying rhythms.

      Yes, the points represent periods of (significantly) rhythmic flies. This has been added to the caption, to avoid misunderstandings. The differences that arise when assessing rhythmicity in activity records vs. egg-laying records are addressed at length in the Supplementary Material (see e.g. Fig S1).

      (q) Page 15 Figure 5 - general As the authors discuss the possible contribution of DN1ps to evening activity and control over oogenesis rhythm, investigating the connections of the few that are characterized in the connectome (or lack thereof) with the Oviposition neurons, can help illustrate the distinct role they play in the female Drosophila's reproductive rhythm.

      This information was in the text and the Supplementary Tables. Lines 273-275 of the old manuscript read: "The full results are displayed in Supplementary Tables 2 and Table 3, but in short, we found that whereas there are no connections between LNv or DN1 neurons and oviposition neurons..."

      (r) Minor: The dark shading of the circles depicting some of the clusters makes it difficult to read. Consider changing the colors or moving the names outside the circles.

      Figure corrected.

      (s) Line 38: The estimated number of clock neurons has been revised recently (https://www.biorxiv.org/content/10.1101/2023.09.11.557222v2.article-info).

      Thank you for the reference. We have corrected the number of clock neurons in the Introduction of the new manuscript.

    1. Reviewer #2 (Public review):

      Summary:

      The authors introduce a generalised HGF featuring (1) volatility coupling (rate of change), value coupling (phasic or autoregressive drift) [and 'noise coupling', which is a volatility parent of an outcome state] (2) parameters: volatility coupling κ, tonic volatility ω, value coupling α, tonic drift ρ, ±auto-regressive drift λ (3) inputs at irregular intervals (but still discrete time steps, unlike continuous time belief evolution in predictive coding) (4) states with multiple parents or parents with multiple child states (5) value parents by default have a volatility parent, and volatility parents have a value parent (or none) (6) linear or non-linear (including ReLU) functions (7) also beliefs can be any exponential family distribution (incl binary, categorical), hence can also model POMDPs

      They describe the 3 steps involved in updating (for both value and volatility): (1) prediction (2) update posterior (entails passing both pwPE and prediction precision from lower to upper node - the latter is not found in other predictive coding schemes) (3) prediction error NB this makes the network modular, so nodes can be added/removed without recomputing all the update equations.
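
      To make the prediction / update / prediction-error cycle concrete, here is a heavily simplified sketch for a single Gaussian state with no parents, observed through a continuous input. It is not the generalised HGF's own set of update equations: the function names are hypothetical, only tonic volatility ω is modelled, and the prediction error is computed inside the update rather than as a separate message to parent nodes.

```python
import math


def predict(mu, pi, omega):
    """Prediction step: the mean is carried over (no drift) and the variance
    grows by exp(omega), the tonic volatility, so the precision drops."""
    pi_hat = 1.0 / (1.0 / pi + math.exp(omega))
    return mu, pi_hat


def update(mu_hat, pi_hat, u, pi_u):
    """Posterior step for an input u observed with precision pi_u.

    The mean moves by a precision-weighted prediction error, and the
    posterior precision is the sum of the predicted and input precisions.
    """
    delta = u - mu_hat                     # prediction error
    pi_post = pi_hat + pi_u                # precisions add
    mu_post = mu_hat + (pi_u / pi_post) * delta
    return mu_post, pi_post, delta


# One trial with arbitrary illustrative values.
mu_hat, pi_hat = predict(mu=0.0, pi=1.0, omega=-2.0)
mu_post, pi_post, delta = update(mu_hat, pi_hat, u=1.0, pi_u=1.0)
```

      The ratio pi_u / pi_post plays the role of a learning rate: the less precise the prediction relative to the input, the larger the share of the prediction error that is absorbed into the belief.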

      They give some examples of models working using simulated data: (1) sharing of parent nodes can generalise an update from one context to another (2) sharing of child nodes enables multisensory cue combination (e.g. auditory-visual, or interoceptive-exteroceptive).

      The authors further discuss a potential shortcoming of the HGF - its discretisation of timesteps - which is less naturalistic but nevertheless makes it very amenable to fitting trial-wise experimental data. They propose to extend the HGF to modelling within-step dynamics in future, which could make testable continuous time neuronal predictions.

      Strengths:

      Overall, I think the paper is excellent - it contributes an important extension to a popular modelling tool which substantially increases the number of potential applications. It is well written, and I have almost no criticisms to make.

      Weaknesses:

      The authors state that this generalised HGF will "make it easy to build large networks with considerable hierarchical depth", comparable to neural network architectures. The examples they give are extremely simple; however, it would be good to see a more complex one.

    1. Reviewer #2 (Public review):

      Summary:

      The manuscript by Foucault, Weber, and Hunt examines human learning behavior across change-point and continuously changing environments. The authors suggest that humans normatively adjust their learning dynamics to the current environmental dynamics. Moreover, they argue that humans not only track the means of the outcome-generating process, but also the variance, which extends recent work in this domain. The present results suggest that human learners are well able to distinguish the two moments and adjust their behavior accordingly.

      Strengths:

      (1) The paper is clearly written, and the figures demonstrate the results well. The authors clearly explain the two key results and their implications for the field.

      (2) The paper uses a common modeling framework for the two environments. This makes it less likely that differences in learning behavior between the two environments are driven by general model properties rather than the specific learning mechanisms.

      Weaknesses:

      (1) Interpretation in terms of normative learning

      (1.1) Perseveration and paddle movement

      The model presented in the main manuscript is equipped with a response-probability mechanism that controls whether the paddle is updated. Especially on smaller prediction errors, the paddle is often not updated (perseveration). I wonder whether this mechanism truly reflects normative updating behavior or rather a heuristic strategy. Not moving the paddle is non-normative. A fully Bayesian model would hardly ever show a learning rate of exactly zero (one could argue only when the error is itself zero or after a massive amount of trials). This is partly apparent in Supplementary Figure 1, where the lowest learning rates are around alpha = 0.2 (change-point environment) and 0.5 (random walk).

      Supplementary Figure 1 shows the learning rate for the normative model without the response-probability mechanism. Primarily in the random-walk environment, but to some extent also in the change-point condition, the shape of the learning rate changes quite dramatically compared to Figure 4. In the random-walk environment, the learning rate appears relatively stable, with a value slightly larger than 0.5. In the change-point case, the learning rate is somewhat higher in the range of smaller prediction errors. Doesn't this speak against the interpretation that the model in the main manuscript is really behaving in a purely normative fashion? The tendency to perseverate might reflect a simplified strategy, which is sometimes described as "satisficing". That is, in line with the authors' description of the mechanism, perseveration occurs when it seems "good enough" (Simon, 1956), which has been demonstrated in a belief updating context before (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021).

      Supplementary Figure 3 suggests that humans show quite a lot of this type of behavior. It indicates that in the change-point condition, in only 20% of the trials in the minimal prediction error range, participants update their prediction (i.e., in 80% of these trials, they perseverate on the previous prediction). This update probability increases as a function of the prediction error. In the random-walk condition, update probabilities are higher, starting at around 40% and also increasing as a function of the error.

      Indeed, Supplementary Figure 4 suggests that the shape of the learning rate for true update trials is much shallower for humans and the "perseverative" model compared to the model in Supplementary Figure 1. This suggests that the curve in Figure 4 (main manuscript), hinting at a continuous increase in the learning rate, could be the result of a mixture of perseveration (alpha = 0) and higher learning rates compared to the normative model without the response-probability mechanism.

      (1.2) Control models

      One might reply that the response-probability mechanism just adds noise, while the actual learning mechanism is still normative. However, a standard Rescorla-Wagner model with the same response-probability mechanism might also show increasing apparent learning rates as a function of prediction error (when perseveration trials and regular update trials are averaged as a function of the prediction error).

      Therefore, I suggest adding a control analysis with a Rescorla-Wagner model. One version with the same response mechanism yielding perseveration, and one standard Rescorla-Wagner model without this mechanism. This should help identify how well the present analyses can distinguish true learning-rate dynamics from averaging artifacts due to perseveration.

      (1.3) Discussion of the possibility of non-normative learning mechanisms

      Given the considerations above, I suggest a more balanced discussion of potential non-normative influences on learning, in particular, perseveration. Several previous papers have similarly shown that perseveration prominently characterizes human learning and decision-making (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021), and in my opinion, it would be relevant to discuss how normative and non-normative mechanisms might jointly shape learning.

      (2) Model description

      The Bayesian model is quite central to the paper. However, the mathematical details are sparse, and I did not fully understand the differences between the model variants and how they were implemented. In particular, what approximations were used to make the model tractable? And how does the variance inference work? Is the learning rate directly computed, similar to the Nassar model, or is it derived from updates and prediction errors?

      (3) Apparent learning rates in humans

      The main learning-rate analyses compute the fraction of updates and prediction errors. For quality assurance, it would be useful to see a few supplementary histograms of the apparent learning rates. It would be great to have one plot across all participants and a few example plots for single participants. These analyses will reveal the distribution of learning rates and the proportion at the boundaries, which can sometimes be a source of bias.
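
      The averaging concern raised in (1.2), and the apparent learning rate discussed in (3), can be illustrated with a small simulation. It is a sketch under stated assumptions: the logistic form of the response-probability mechanism, its parameters, the fixed learning rate of 0.5 and the Gaussian outcomes are all hypothetical choices, not the authors' model or task.

```python
import math
import random


def update_probability(abs_pe, slope=0.3, intercept=-1.5):
    """Hypothetical response-probability mechanism: the chance of moving the
    paddle rises logistically with the absolute prediction error."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * abs_pe)))


def simulate_rw(outcomes, alpha=0.5, perseverate=True, rng=None):
    """Rescorla-Wagner learner with a fixed learning rate alpha.

    Returns (prediction_error, apparent_learning_rate) pairs, where the
    apparent learning rate is the update divided by the prediction error.
    """
    if rng is None:
        rng = random.Random(0)
    prediction = outcomes[0]
    records = []
    for outcome in outcomes[1:]:
        pe = outcome - prediction
        moved = (not perseverate) or rng.random() < update_probability(abs(pe))
        step = alpha * pe if moved else 0.0   # perseveration: no update at all
        if pe != 0:
            records.append((pe, step / pe))
        prediction += step
    return records


def mean_lr_by_error(records, threshold):
    """Mean apparent learning rate for small and large absolute errors."""
    small = [lr for pe, lr in records if abs(pe) < threshold]
    large = [lr for pe, lr in records if abs(pe) >= threshold]
    return sum(small) / len(small), sum(large) / len(large)


outcome_rng = random.Random(1)
outcomes = [outcome_rng.gauss(0.0, 10.0) for _ in range(5000)]
with_persev = simulate_rw(outcomes, perseverate=True, rng=random.Random(2))
without = simulate_rw(outcomes, perseverate=False)
small_lr, large_lr = mean_lr_by_error(with_persev, threshold=10.0)
```

      In this sketch the learner without perseveration has an apparent learning rate of 0.5 on every trial, whereas averaging over perseveration and update trials yields a lower mean for small errors than for large ones, even though the underlying learning rate never changes.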

      References:

      Bruckner, R., Nassar, M. R., Li, S.-C., & Eppinger, B. (2025). Differences in learning across the lifespan emerge via resource-rational computations. Psychological Review, 132(3), 556-580. https://doi.org/10.1037/rev0000526.

      Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394. https://doi.org/10.1016/j.cognition.2020.104394.

      Nassar, M. R., Waltz, J. A., Albrecht, M. A., Gold, J. M., & Frank, M. J. (2021). All or nothing belief updating in patients with schizophrenia reduces precision and flexibility of beliefs. Brain, 144(3), 1013-1029. https://doi.org/10.1093/brain/awaa453.

      Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129-138. https://doi.org/10.1037/h0042769.

    2. Author response:

      We thank the reviewers for their constructive feedback and careful evaluation of our manuscript. We are encouraged that the study was viewed as well designed and clearly presented, that its computational modeling approach was recognized as a strength, and that the key findings were appreciated. We agree that some claims would benefit from additional support and clarification. Below, we outline the main revisions we will undertake to strengthen the manuscript and address the points raised in the reviews. These revisions are intended to strengthen the evidential support for our conclusions and clarify aspects of the results and modeling.

      (1) Statistical support.

      Some claims were judged to lack sufficient statistical support [Reviewer 1]. In the revised manuscript, we will carefully review all inferential claims and ensure that they are supported by appropriate statistical analyses. Where necessary, we will implement additional statistical tests and expand statistical reporting to ensure that differences between conditions, models, or behavioral measures are formally evaluated and that key aspects of the data are appropriately described.

      (2) Modeling clarification.

      Some aspects of the modeling were considered insufficiently clear, particularly regarding how the models were implemented [Reviewers 1 and 2]. We will expand the Methods section to provide a clearer and more complete description of the Bayesian models and their implementation. In particular, we will clarify that full probability distributions were computed (without reduced approximations such as those used in simplified Bayesian variants), and that the only approximation concerns numerical discretization of continuous state spaces at fine resolution. We will clarify that variance is part of the joint multidimensional state space and is inferred jointly with the mean. We will also explicitly state that apparent learning rates are derived from predicted paddle responses in the same way as for participants, and are not directly computed within the Bayesian inference process.

      (3) Model fitting.

      The absence of direct model fitting to individual participants was identified as a limitation [Reviewers 1 and 3]. In response, we will implement individual-level model fitting (to the extent feasible in practice) and conduct formal model comparison based on the fitted models. We will further validate the fitted models by examining whether they reproduce the main behavioral signatures observed in the data.

      (4) Normative interpretation and control analyses.

      The interpretation of the models as normative was questioned in light of the response-probability mechanism [Reviewer 2]. In the revision, we will clarify the distinction between the normative inference component of the model and the response-level mechanism. We will revise the framing of the results accordingly and ensure that normative claims are restricted to the inference component. We will also expand the discussion to integrate relevant literature on perseveration and satisficing, and clarify how normative and non-normative mechanisms may jointly shape behavior. In addition, following the reviewer’s suggestion, we will include control analyses using standard Rescorla–Wagner models, with and without the response-probability mechanism, to evaluate whether the observed signatures can be accounted for by simpler learning rules.

      (5) Additional points.

      We will also address the additional points raised in the reviews. Specifically, we will include supplementary histograms of apparent learning rates [Reviewer 2]. We will provide additional clarification and analyses regarding the effects of stochasticity on learning [Reviewer 1]. Finally, we will explore hybrid or mixture models and strategies and expand the discussion of this possibility [Reviewer 3].
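
      As an illustration of what inference on a discretized joint state space (point 2) can look like, the sketch below updates a belief over a grid of (mean, standard deviation) pairs. It is not our implementation: the grid resolution, the hazard rate, the uniform reset after a change point and all function names are assumptions chosen for brevity.

```python
import math


def make_grid(means, sds):
    """Joint state space: every (mean, sd) combination is one grid cell."""
    return [(m, s) for m in means for s in sds]


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid_update(belief, grid, outcome, hazard=0.1):
    """One trial: mix in a change-point reset, then apply Bayes' rule.

    Assumption: with probability `hazard` the state is redrawn uniformly
    over the grid before the outcome is observed.
    """
    uniform = 1.0 / len(grid)
    prior = [(1.0 - hazard) * b + hazard * uniform for b in belief]
    unnorm = [p * normal_pdf(outcome, m, s) for p, (m, s) in zip(prior, grid)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def posterior_means(belief, grid):
    """Posterior expectations of the mean and of the standard deviation."""
    mean_est = sum(b * m for b, (m, _) in zip(belief, grid))
    sd_est = sum(b * s for b, (_, s) in zip(belief, grid))
    return mean_est, sd_est


means = [float(i) for i in range(0, 101)]      # candidate means 0..100
sds = [2.0 + 2.0 * j for j in range(10)]       # candidate sds 2..20
grid = make_grid(means, sds)
belief = [1.0 / len(grid)] * len(grid)         # flat initial belief
for outcome in [48.0, 52.0, 50.0, 49.0, 51.0, 50.0]:
    belief = grid_update(belief, grid, outcome)
mean_est, sd_est = posterior_means(belief, grid)
```

      Because the mean and the variance share one grid, both are inferred jointly from the same outcomes, and no learning rate appears anywhere in the update; an apparent learning rate can only be read out afterwards from successive predictions.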

      We believe that these revisions will substantially strengthen the support for our claims and address the concerns raised in the current assessment. We are grateful for the reviewers’ engagement with our work and for their comments, which will allow us to significantly improve the clarity and strength of the manuscript.

    1. Reviewer #2 (Public review):

      Summary:

      This manuscript addresses a clear and widely relevant question: how ongoing fluctuations in alertness during wakefulness relate to large-scale patterns of coordinated brain activity. The authors combine high-field magnetic resonance imaging with simultaneous pupil measurements, and they compute an edgewise measure of arousal-related coupling for every pair of regions. Their main contribution is to show that arousal-related coupling is low-dimensional and organized into seven reproducible "connectivity communities", each with characteristic network pair compositions. A secondary contribution is the observation that these communities exhibit systematic but community-specific hemispheric asymmetries, including a striking left/right dissociation within the ventral attention network, where the left side participates broadly across communities while the right side forms a more cohesive, segregated arousal-responsive module. A final contribution is cross-context generalization: the same organizational structure and lateralization signatures are largely preserved during naturalistic movie watching.

      Strengths:

      (1) The paper moves beyond state contrasts and quantifies arousal-related modulation continuously within wakefulness, directly addressing a gap highlighted in the Introduction.

      (2) The hemispheric asymmetry result is not framed as a crude global dominance effect; the authors explicitly test and argue that the key signal lies in structured spatial heterogeneity rather than mean shifts.

      (3) The cross-paradigm replication in movie watching is a strong design choice and supports the claim that the organizational motifs are not limited to unconstrained rest.

      Weaknesses:

      (1) Arousal effects on BOLD signals and on pupil size can have different delays, so it would be valuable to test lagged relationships (for example, shifting the pupil series forward and backward) to show that the main community structure and lateralization results are not sensitive to an arbitrary temporal alignment.

      (2) Pupil diameter covaries with blinks, eye closure, and other factors that can covary with head motion and physiological noise. The Methods include substantial quality control and denoising, including motion regression and scrubbing, plus exclusions for eye closure.

      (3) The dataset is described in terms of runs retained (for example, 485 resting runs), and runs are treated as observations in clustering after z-scoring across runs. If multiple runs come from the same individuals, the manuscript would benefit from explicitly showing that results replicate at the participant level (for example, community structure stability within participant across runs, and participant-level summary statistics used for inference), rather than relying primarily on pooled run-level patterns.

      (4) Time-resolved connectivity is estimated using a 30-second sliding window and a 5-second step. It is reasonable to wonder whether the same conclusions hold with alternative estimators that do not rely on fixed windows. The Discussion acknowledges this limitation, but adding a small robustness analysis would make the paper more definitive.

    1. Reviewer #3 (Public review):

      This manuscript presents Neuroplex, a technically rigorous and carefully validated pipeline that links miniscope calcium imaging in freely behaving animals with high-dimensional fluorophore-based cell-type identification using in vivo multiplexed spectral confocal imaging through the same implanted GRIN lens. The work overcomes a major practical limitation of head-mounted microscopy by enabling the identification of up to nine projection-defined neuronal populations within the same animal, without post-fixation histology. The approach is well motivated and supported by extensive calibration and simulation. While the biological results are primarily illustrative, the methodological contribution is clear and likely to be broadly useful.

      Major comments

      (1) The approach relies on the assumption that fluorophore identity assigned during anesthetized confocal imaging accurately reflects the identity of neurons recorded during prior behavioural sessions. While the use of the same GRIN lens and in vivo co-registration mitigates many concerns, the manuscript would benefit from a more explicit discussion, or empirical demonstration, if available, of the stability of fluorophore assignments across time. Even limited repeat spectral imaging in a subset of animals would strengthen confidence in longitudinal applicability.

      (2) Fluorophore identity is determined using thresholding of linear unmixing coefficients relative to an empirically defined baseline, followed by a second adaptive pass for over-represented fluorophores. While this heuristic is extensively validated via simulations, it remains ad hoc from a statistical perspective. The authors should more explicitly justify this choice and discuss its limitations relative to probabilistic or likelihood-based classifiers, particularly with respect to uncertainty estimation at the single-ROI level.

      (3) Identifiability of fluorophores is demonstrated empirically, but the manuscript does not explicitly quantify spectral separability (e.g., similarity metrics between basis spectra or conditioning of the unmixing matrix). A brief analysis of spectral independence or sensitivity of beta estimates to noise would provide mathematical reassurance, especially given the reliance on linear regression in a high-dimensional feature space.

      (4) The spectral unmixing treats CNMF-derived ROIs as fixed supports. I wonder whether ROI boundaries, neuropil contamination, and partial overlap can introduce structured uncertainty that could bias spectral estimates. If so, the authors should acknowledge this dependency more explicitly and discuss how ROI quality or overlap might influence false negatives or false positives, particularly in densely labelled regions.

      (5) The manuscript reports meaningful rates of secondary fluorophore detection, but also nontrivial false-positive rates for secondary labels under realistic conditions. The authors appropriately caution against over-interpretation, but the Discussion should more clearly delineate when dual-label assignments are likely to be biologically interpretable versus methodologically ambiguous, and how experimental design (e.g., fluorophore pairing) should be optimized accordingly.

      (6) I suspect that Neuroplex will be most effective in certain regimes (moderate convergence, bright and spectrally distinct fluorophores) and less reliable in others. A more explicit discussion of best practices, anticipated failure modes, and experimental scenarios where the method may be inappropriate would increase the practical value of the paper for adopters.
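      As an illustration of the separability check suggested in comment (3), the sketch below computes pairwise cosine similarity between basis spectra and reports the least separable pair. The fluorophore names and spectra are hypothetical placeholders, not values from the manuscript, and cosine similarity is only one of several possible metrics (the conditioning of the unmixing matrix is another).

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two spectra sampled on the same channels."""
    if len(a) != len(b):
        raise ValueError("spectra must share the same channels")
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("spectra must be non-zero")
    return dot / (na * nb)


def pairwise_spectral_similarity(spectra):
    """Map each pair of fluorophore names to the cosine similarity of their spectra."""
    names = sorted(spectra)
    return {(p, q): cosine_similarity(spectra[p], spectra[q])
            for p, q in combinations(names, 2)}


def least_separable_pair(spectra):
    """Return ((name_1, name_2), similarity) for the most similar pair of spectra."""
    sims = pairwise_spectral_similarity(spectra)
    return max(sims.items(), key=lambda kv: kv[1])


# Hypothetical four-channel basis spectra, for illustration only.
example_spectra = {
    "fluor_a": [1.0, 0.5, 0.0, 0.0],
    "fluor_b": [0.0, 0.5, 1.0, 0.0],
    "fluor_c": [0.0, 0.0, 0.5, 1.0],
}
```

      Pairs with similarity close to 1 are the ones where unmixing coefficients would be most sensitive to noise, so a table of these values could help readers choose fluorophore combinations.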

    1. Reviewer #2 (Public review):

      This manuscript investigates how people's perceptual reports are influenced by events and trials in the past, and how this long-range dependence relates to broader learning across locations in a visual learning task. The authors present clear and internally consistent analyses showing that extended temporal integration is associated with greater generalization of learning. The study is thought-provoking and may contribute meaningfully to understanding how short-term influences and long-term improvement interact, although several interpretational points would benefit from clarification.

      Strengths:

      (1) The manuscript identifies unusually long-range perceptual biases extending up to ten trials back, which is a striking and potentially important finding.

      (2) The association between strong long-range dependence and greater learning generalization is clearly documented and supported by consistent analyses.

      (3) The dataset is large and rich, and the authors apply repeated and well-controlled analyses that give confidence in the stability of the effects.

      (4) The writing is generally clear, and the manuscript raises interesting conceptual links between temporal integration and generalization of learning.

      Weaknesses / Points Requiring Clarification:

      (1) The manuscript repeatedly equates generalization with increased efficiency, but this relationship is not universally true. In some populations or tasks, excessive generalization can reduce task-specific efficiency. The authors should discuss this context-dependence to clarify when generalization is beneficial versus detrimental.

      (2) Serial dependence is also present, though smaller, in the central fixation task. It remains unclear whether this bias could contribute to the serial dependence observed in the main task. The authors should clarify whether the two biases are independent or whether the central-task bias might partially influence orientation judgments in the main task.

      (3) Several figure captions and labels contain minor inconsistencies in formatting and terminology. Careful proofreading would improve clarity.

    1. Reviewer #1 (Public review):

      Summary:

      The authors attempt to use a combination of behavioural and EEG analyses in order to investigate whether expectation of task difficulty influences spatial focus narrowing in the context of a spatially cued task, alongside an expected attention-related amplitude effect. This distinguishes the experiment from previous tasks, which looked at this potential spatial narrowing in the context of more non-cued diffuse attention tasks. The authors present two major findings:

      (1) Behaviourally, they analysed the effects of cue validity and difficulty expectation on response accuracy, and found that participants displayed an effect of difficulty expectation in validly cued trials, showing relatively enhanced behaviour to Hard Expectation trials, but no effect of expectation in invalidly cued trials.

      (2) Inverted encoding modelling on broadband EEG showed greater pre-target attentional processing in the Hard Expectation blocks. They go on to show that this enhancement comes in the form of greater amplitude of the Channel Tuning Functions (CTFs) approximately 300 to 400ms post-cue, in the absence of any spatial tuning specificity enhancement (as would be evident in a difference in CTF fit width).

      Together, these results provide valuable findings for those investigating the separable effects of expectation and attention on target detection in visual search.

      Strengths:

      (1) This is a very solidly performed experiment and analysis, with different streams of evidence convincingly pointing in the same direction, i.e. a gain effect of Expectation in the absence of a spatial tuning effect.

      (2) EEG is competently analysed and interpreted, and the paper is well written and simple in its motivation.

      (3) The authors report appropriately on the results in the Discussion, without overreaching.

      Weaknesses:

      I mainly have a few minor issues for the authors to clarify, which I will leave to Recommendations. However, a few analyses need further work:

      (1) The GLMM reports very large degrees of freedom (34542; pages 6 and 7). I assume this is the number of trials minus the number of parameters? This would imply that random slopes were not modelled in the analyses. However, looking at the Methods, it is reported that they were modelled. The authors should clarify exactly what was done here and why, including the LMM model.

      (2) Figure 4 shows an "example CTF fit". Why only one? You could put transparent lines in the background for each individual fit, followed by the grand average, or show each fit in the supplementary section?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work presents a GUI with SEM images of 8 Utah arrays (8 of which were explanted, and 4 of which were used for creating cortical lesions).

      Strengths:

      Visual comparison of electrode tips with SEM images, showing that electrolytic lesioning did not appear to cause extra damage to electrodes.

      Weaknesses:

      Given that the analysis was conducted on explanted arrays, and no functional or behavioural in vivo data or histological data are provided, any damage to the arrays may have occurred after explantation. This makes the results limited and inconclusive (firstly, that there was no significant relationship between degree of electrode damage and use of electrolytic lesioning, and secondly, that electrodes closer to the edge of the arrays showed more damage than those in the center).

      We agree insofar as we could not fully control the circumstances of each array during explantation. However, array explantation is potentially damaging, but not universally damaging, as demonstrated by some largely intact arrays in this paper. If electrolytic lesions were damaging to the array, they would be observed. All arrays examined in this paper were carefully stored as described in the paper. All analyses of this type require an explant surgery [?????]. Our conclusions remain as strong as any of the results of these analyses.

      Overall, these results do not add new insight to the field, although they do add more data and reference images.

      We respectfully disagree, as there is no extant SEM analysis on electrode arrays used for lesioning.

      Reviewer #2 (Public review):

      In this study, the authors used scanning electron microscopy (SEM) to image and analyze eleven Utah multielectrode arrays (including eight chronically implanted in four macaques). Four of the eight arrays had previously been used to deliver electrolytic lesions. Each intact electrode was scored in five damage categories. They found that damage disproportionately occurred to the outer edges of arrays. Importantly, the authors conclude that their electrolytic lesioning protocol does not significantly increase material degradation compared to normal chronic use without lesion. Additionally, the authors have released a substantial public dataset of single-electrode SEM images of explanted Utah arrays. The paper is well-written and addresses an important stability issue for long-term chronically implanted array recordings and electrolytic lesioning, which is relevant to both basic science and translational research. By comparing lesioning and non-lesioning electrodes on the same array and within the same animal, the study effectively controls for confounds related to the animal and surgical procedures. The shared dataset, accessible via interactive plots, enhances transparency and serves as a valuable reference for future investigations. Below, we outline some major and minor concerns that could help improve the work.

      Major concerns:

      (1) Electrode impedance is a critical measurement to evaluate the performance of recording electrodes. It would be helpful if the authors could provide pre-explant and post-explant impedance values for each electrode alongside the five SEM damage scores. This would allow the readers to assess how well the morphological scores align with functional degradation.

      We agree that electrode impedance is very important in determining electrode performance. However, due to the multi-year, multi-subject nature of this work, we unfortunately do not have these data.

      (2) The lesion parameters differ across experiments and electrodes. It would be helpful if the authors could evaluate whether damage scores (and/or impedance changes) correlate with total charge, current amplitude, duration, or frequency.

      Thank you for this recommendation. We have included additional analyses in Supplementary Materials.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) ‘Both in vitro and in vivo testing of electrode arrays revealed environmental damage to these materials, such as cracking, textural defects, and degradation in response to the brain’s temperature and salinity [32]. The immune response of the brain also damages the electrodes due to effects like glial scarring (gliosis) and inflammation [33, 34]. This damage may be exacerbated by the surgical techniques used during implantation, which include pushing the electrode array into cortex and tethering the implant to the skull [33, 35, 36].’

      In the above text, several relevant references have been left out, e.g.:

      Barrese et al., 2013

      Patel et al., 2023

      Woeppel et al., 2021

      Chen et al., 2023

      Bjanes et al., 2025

      Thank you for this recommendation. This section has been updated.

      (2) ‘Aggressive electrical stimulation is known to dissolve platinum-based electrodes [37, 38]. Other studies have shown iridium oxide to be more resistant to stimulation-related damage, but not completely insusceptible [39, 40].’ Reference number 25 is relevant here.

      Thank you for this recommendation. This section has been updated.

      (3) ‘F’s and C’s PMd arrays were used for electrolytic lesioning experiments Monkey U was implanted with three 96-channel arrays; two in M1 and one in PMd.’ There seems to be a punctuation mark missing.

      Thank you for this recommendation. This section has been updated.

      (4) Methods: How much charge was injected via the electrodes that were used for lesioning? What current amplitudes, voltages, durations, and number of pulses were used? If more than 1 pulse was applied, what were the frequencies? Was the pulse cathode-only/anode-only? What were the electrode impedance values at the time of stimulation? How many electrodes were used for lesioning at any given moment? How long after lesioning did the arrays remain in the tissue?

      Thank you for your questions. An additional supplemental table (Supplemental Table 6) detailing specific NHP lesion parameters has been added. A summary of the lesion procedure (DC, bipolar, two electrodes at a time) has also been included in Methods. All arrays remained in the subject until explant; this interval ranged from hours (same-day lesion and explant) to several years. Further details on the lesioning procedure are available in citation [?]. Explant dates are available in Supplemental Table 1. Unfortunately, we do not have the impedance values at the time of lesioning, as this is not a measure we record frequently after implant, though we agree the data would be useful to have.

      (5) Caption for Figure 1: ‘All array images are displayed with the wire bundle to the right side.’ I recommend adding this text from Figure 2 to the caption of Figure 1: ‘electrode tips facing viewer’.

      Thank you for this recommendation. This section has been updated.

      (6) ‘Electrodes used for electrolytic lesioning are denoted with blue dots.’ Was stimulation carried out across all these electrodes simultaneously?

      No, stimulation was not carried out across all electrodes simultaneously. Pairs of electrodes were stimulated at the same time to create lesions. Lesions were performed on different days. We have updated our Methods section to reflect this. See the Methods section and citation [?] for more details.

      (7) For the control array, in Figure 1: ‘Click each column to view a close-up of the 5th row (from top to bottom) of electrodes:’. It would be clearer to state: ‘Click each column to view a close-up of a single electrode in the 5th row (from top to bottom):’.

      Thank you for this recommendation. This section has been updated.

      (8) Figure 2 caption: ‘Blank electrodes and electrodes with shank fractures are ignored and displayed in black, as they are not scored.’. What is a ‘blank’ electrode?

      A ‘blank’ electrode is an electrode on the array that physically exists but is not wire bonded at time of manufacture to produce recordings. The corner electrodes of the Utah array are all blank electrodes. We have updated this wording to ‘unwired’ for clarity.

      (9) I recommend incorporating Supplementary Figure 1 into Figure 2, so that the reader can immediately see where the rings are, without referring to the Supplementary Materials.

      Thank you for this recommendation. We have chosen to keep these figures separate for stylistic reasons.

      (10) Supplementary Figures: The figures should have the word ‘Supplementary’ in the title, i.e., ‘Supplementary Figure X,’ not just ‘Figure X.’

      Thank you for this recommendation. These captions have been updated.

      (11) Throughout the results, the text is overly focused on the type of statistical test used and the p-values, e.g.: ‘When comparing lesioning and non-lesioning electrodes within the same array, each of the two nonparametric statistical tests (Mann-Whitney U-test, Levene Test) returned insignificant p-values for each category of damage as well as for total damage scores for all four arrays used in lesioning experiments.’.

      To make the findings more digestible for the reader, the text should be rephrased in terms of whether the metrics being compared were significantly different or not. E.g.: ‘For each category of damage, as well as for the total damage score, no significant difference was found between electrodes that were or were not used for lesioning (either the mean or the variance of the scores).’.

      Thank you for this recommendation. We have rephrased the text to reflect this note.

      (12) ‘In Monkey H, the Mann-Whitney U test resulted in an insignificant p-value for coating cracks and parylene C delamination scores, while the Levene test resulted in an insignificant p-value for abnormal debris, coating cracks, and parylene C cracking scores. In Monkey F, the Mann-Whitney U test resulted in an insignificant p-value for parylene C delamination scores, while the Levene test resulted in an insignificant p-value for coating cracks, parylene C delamination, and parylene C cracking scores. In Monkey U, the Mann-Whitney U test resulted in significant p-values for all scores, while the Levene test resulted in an insignificant p-value for abnormal debris, tip breakage, and coating cracks scores. Finally, in Monkey C, the Mann-Whitney U test resulted in an insignificant p-value for parylene C delamination and parylene C cracking scores, while the Levene test resulted in an insignificant p-value for abnormal debris, parylene C delamination, and parylene C cracking scores.’

      To point out another example, this chunk of text is highly repetitive and is unnecessary, as the reader can simply refer to Supplementary Table 4. It should be completely rephrased and summarized, to deliver the key message, i.e. briefly describe what kinds of damage occurred for which arrays. Also, what is the point of the two statistical tests? What are the authors trying to conclude?

      Thank you for this recommendation. We have rephrased and pared down the text to reflect this note.

      (13) Discussion: ‘Similarly, other work did not show significant differences in SEM-visible degradation between both platinum and iridium oxide coated electrodes used for stimulation [24, 25].’ What differences are being referred to here? Differences in degradation between stimulated Pt versus stimulated IrOx electrodes? Or between stimulated Pt and unstimulated Pt electrodes? Stimulated IrOx and unstimulated IrOx? Or something else?

      Thank you for your questions. We are comparing platinum against iridium oxide in this sentence. The wording of our original text has been updated to clarify our intention.

      (14) Supplementary Tables: P-values lower than .05, .01, and .001 should simply be replaced with <.05, <.01, and <.001. The alpha value after a Bonferroni correction should be stated somewhere in each table or table caption.

      Thank you for this recommendation. We have edited the tables to reflect this note.
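      The two conventions requested in recommendation (14) can be sketched as follows. This is an illustrative helper only: the function names, the default thresholds, and the number of tests used in any call are assumptions, not values taken from the manuscript's tables.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def format_p(p, thresholds=(0.001, 0.01, 0.05)):
    """Report p as '<.001', '<.01', or '<.05' when it falls below a threshold.

    Values at or above the largest threshold are shown to three decimals
    without a leading zero.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    for t in sorted(thresholds):
        if p < t:
            # Strip the leading zero: 0.001 -> '.001'
            return "<" + f"{t:g}".lstrip("0")
    return f"{p:.3f}".lstrip("0")
```

      For example, with a hypothetical family of five comparisons, `bonferroni_alpha(0.05, 5)` gives the corrected threshold that would be stated once in the table caption, while `format_p` handles the individual cells.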

      (15) Title: ‘Material Damage to Multielectrode Arrays after Electrolytic Lesioning is in the Noise’ I don’t understand what the title means. What is in the noise? And what is ‘the noise’?

      “In the noise” is a colloquialism referring to how background information (“noise”) may obscure or distract from other features. This title conveys how material damage to multielectrode arrays due to electrolytic lesioning is largely obscured by the general damage observed on multielectrode arrays after implant and explant.

      (16) This reference has been left out altogether: Chen et al., 2014. The effect of chronic intracortical microstimulation on the electrode-tissue interface.

      Thank you, this reference is now included.

      Reviewer #2 (Recommendations for the authors):

      (1) The number of lesion electrodes is low, especially since there are only 2-10 lesion electrodes on three of the four arrays, yielding limited statistical power.

      We agree that the low number of lesioned electrodes limits statistical power. However, due to ethical considerations, it is unlikely for arrays to contain much more than this number of lesion electrodes.

      (2) The dataset includes both platinum and iridium oxide-coated electrodes. A direct comparison of their damage profiles would be informative.

      Thank you for this recommendation. We have included this additional analysis in Supplementary Materials.

      (3) It is unclear what “is in the Noise” in the title means without reading the manuscript. It is helpful to improve the clarity of the title.

      Thank you for this recommendation.

      (4) Please spell out “PMd” and “M1” at first mention to facilitate reading.

      Thank you for this note. The text has been updated to reflect this recommendation.

    1. Reviewer #2 (Public review):

      Summary:

      The authors investigate single-neuron activity in rhesus macaques during model-based (MB) and model-free (MF) reinforcement learning (RL). Using a well-established two-step choice task, they analyze neural correlates of MB and MF learning across four brain regions: the anterior cingulate cortex (ACC), dorsolateral PFC (DLPFC), caudate, and putamen. The study provides strong evidence that these regions encode distinct RL-related signals, with ACC playing a dominant role in MB learning and caudate updating value representations after rare transitions. The authors apply rigorous statistical analyses to characterize neural encoding at both population and single-neuron levels.

      Strengths:

      (1) The research fills a gap in the literature, which has been limited in directly dissociating MB vs. MF learning at the single unit level and across brain areas known to be involved in reinforcement learning. This study advances our understanding of how different brain regions are involved in RL computations.

      (2) The study used a two-step choice task (Miranda et al., 2020), which was previously established for distinguishing MB and MF reinforcement learning strategies.

      (3) The use of multiple brain regions (ACC, DLPFC, caudate, and putamen) in the study enabled comparisons across cortical and subcortical structures.

      (4) The study used multiple GLMs, population-level encoding analyses, and decoding approaches. With each analysis, they conducted the appropriate controls for multiple comparisons and described their methods clearly.

      (5) They implemented control regressors to account for neural drift and temporal autocorrelation.

      (6) The authors showed evidence for three main findings:

      (a) ACC as the strongest encoder of MB variables from the four areas, which emphasizes its role in tracking transition structures and reward-based learning. The ACC also showed sustained representation of feedback that went into the next trial.

      (b) ACC was the only area to represent both MB and MF value representations.

      (c) The caudate selectively updates value representations when rare transitions occur, supporting its role in MB updating.

      (7) The findings support the idea that MB and MF reinforcement learning operate in parallel rather than strictly competing.

      (8) The paper also discusses how MB computations could be an extension of sophisticated MF strategies.

      Weaknesses:

      (1) There is limited evidence for a causal relationship between neural activity and behavior. The authors cite previous lesion studies, but causality between neural encoding in ACC, caudate, and putamen and behavioral reliance on MB or MF learning is not established.

      (2) There is a heavy emphasis on ACC versus other areas, but it is unclear how much of this signal drives behavior relative to the caudate.

      (3) The authors mention the monkeys were overtrained before recording, which might have led to a bias in MB versus MF strategy.

      (4) The authors have responded to the weaknesses appropriately in the manuscript.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using single-unit recording in 4 regions of non-human primate brains, the authors tested whether these regions encode computational variables related to model-based and model-free reinforcement learning strategies. While some of the variables seem to be encoded by all regions, there is clear evidence for stronger encoding of model-based information in the anterior cingulate cortex and caudate.

      Strengths:

      The analyses are thorough, the writing is clear, and the work is well-motivated by prior theory and empirical studies.

      Weaknesses:

      My comments here are quite minor.

      The correlation between transition and reward coefficients is interesting, but I'm a little worried that this might be an artifact. I suspect that reward probability is higher after common transitions, due to the fact that animals are choosing actions they think will lead to higher reward. This suggests that the coefficients might be inevitably correlated by virtue of the task design and the fact that all regions are sensitive to reward. Can the authors rule out this possibility (e.g., by simulation)?

      We fully agree with the reviewer that the task design has in-built correlations between transition and reward, and thus the correlation between neural selectivity for feedback and transition (Figure 3E) may be due to the different reward expectation after common or rare transitions. We did try to make this point in the manuscript:

      This suggests that the brain treats being diverted away from your current objective equivalent to losing reward, which is sensible as the subject would normally expect lower rewards on rare trials if their reward-seeking behaviour was efficient.

      We’ve now updated the wording of this statement to try and better make this point and avoid confusion that any non-reward-related encoding is involved:

      “As the reward expectation will be higher on common compared to rare trials, this demonstrates that the brain encodes being diverted to an area with a lower reward expectation equivalent to actually receiving a low reward (and vice versa).”

      We have also adjusted the significance test of this correlation to use a circular permutation test that accounts for correlations between the regressors. This test still found there to be significant correlation in all areas.

      We have described this new permutation test in Methods:

      “For comparing correlations between weights for different features (i.e., between transition and reward coding, Figure 3E), the null distribution of correlations observed in circularly shifted data was compared to the correlation seen in the actual data. This accounts for any correlations between features that existed in the task by preserving the structure of the design matrices.”

      And updated the text in Results accordingly:

      “All regions, but particularly ACC, encoded a common transition (at the time of transition) similar to a high reward (at the time of feedback), as there was a positive correlation between the coefficients for reward and transition (the transition parameter was signed such that common and rare transitions were equivalent to high and low rewards, respectively) (ACC r=0.4963, DLPFC r=0.3273, caudate r=0.4712, putamen r=0.5052; all p<0.002 except DLPFC where p=0.006, circular permutation test; Figure 3E, S5).”
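      The logic of a circular-shift permutation test, as named in the Methods and Results excerpts above, can be sketched in a simplified form. This is not the authors' procedure, which compares correlations between regression weights obtained from circularly shifted data; the sketch only shows the core idea on two series, where shifting one series preserves its internal temporal structure while breaking its alignment with the other. All function and parameter names are assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def circular_shift_test(x, y, n_shifts=999, seed=0):
    """Two-sided circular-shift permutation test for the correlation of x and y.

    Returns (observed_r, p_value). The null distribution is built by rotating y
    by a random non-zero offset, which keeps its autocorrelation intact.
    """
    x, y = list(x), list(y)
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    rng = random.Random(seed)
    observed = pearson(x, y)
    exceed = 0
    for _ in range(n_shifts):
        k = rng.randrange(1, n)        # non-zero circular offset
        shifted = y[k:] + y[:k]        # rotate y by k samples
        if abs(pearson(x, shifted)) >= abs(observed):
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (exceed + 1) / (n_shifts + 1)
    return observed, p_value
```

      With 999 shifts the smallest attainable p-value under this correction is 0.001, so the resolution of the reported p-values depends directly on the number of shifts used.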

      The explore/exploit section seems somewhat randomly tacked on. Is this really relevant? If yes, then I think it needs to be integrated more coherently.

      We thank the reviewer for this comment. We agree that the motivation for the explore/exploit analysis was not sufficiently clear in the original version.

      Our aim was not to introduce this as a separate or tangential effect, but rather to highlight how the task’s reward structure (with outcome levels stable for 5–9 trials) naturally created alternating periods favoring exploitation of a known high-value option versus exploration when outcomes changed. This feature of the task is tightly linked to MB-RL computations, as it requires integration of state-transition knowledge and updating across trials.

      Importantly, we showed earlier in the manuscript that ACC encoded state-transition structure (i.e., common versus rare transition) and MB-value estimates (at the choice epoch). Here, we aimed to highlight that the same region also modulated choice encoding as a function of whether the subject was in an exploratory or exploitative regime, i.e., by tracking another feature of the task that relies on state transitions and outcomes. We have revised this section to better integrate it into the main logic of the paper:

      “In our task, the outcome level (high, medium, low) of each second-stage stimulus remained the same for 5-9 trials before potentially changing. This design naturally created periods where subjects could ‘exploit’ the same Choice 1 to maximize reward for several trials; and other periods where they had to ‘explore’ different second-stage stimuli to optimize reward (as contingencies shifted). In classical MB-RL, the transition between reward states can be learned by keeping counts of observed transitions from a current state-action pair to a subsequent state, yielding a maximum-likelihood estimate of the environment’s dynamics [42]. In fact, knowledge about the reward contingency schedule could support decision-making in both exploitation – by enabling efficient choice when rewards are stable; and exploration – by guiding alternative behaviour most likely to yield improved outcomes (this is different from MF learning, where exploration is more random since the agent lacks explicit state-transition knowledge).

      We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those where they had not received a high reward for the previous two trials (‘explore’ trials), and those where the previous two rewards had been the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch for both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in the former condition compared to the latter (p<0.002 for at least 140 ms in all cases, permutation test on differences observed; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentive) resources to features – i.e., choice 1 stimulus identity – that are most certain predictors of important outcomes. As a control, we also decoded the direction of the Choice 1 (where choice was indicated via joystick movement), which was randomised each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F). Therefore, exploit behaviour specifically upregulated relevant task parameters that were worth remembering across trials.”
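
      The count-based learning of transition structure mentioned in the revised passage can be illustrated with a minimal sketch. The class and state/action names below are hypothetical and are not taken from the authors' model; the sketch only shows how counting observed transitions yields a maximum-likelihood estimate of the transition probabilities.

```python
from collections import defaultdict

class TransitionModel:
    """Maximum-likelihood transition estimates from observed counts."""

    def __init__(self):
        # counts[(state, action)][next_state] -> number of observations
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, next_state):
        """Record one observed transition."""
        self.counts[(state, action)][next_state] += 1

    def prob(self, state, action, next_state):
        """Relative frequency of next_state given (state, action)."""
        outcomes = self.counts[(state, action)]
        total = sum(outcomes.values())
        return outcomes[next_state] / total if total else 0.0


# Usage with made-up observations: 7 "common" and 3 "rare" transitions.
model = TransitionModel()
for _ in range(7):
    model.update("stage1", "choice_A", "state_X")
for _ in range(3):
    model.update("stage1", "choice_A", "state_Y")
common_estimate = model.prob("stage1", "choice_A", "state_X")
```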

      Reviewer #2 (Public review):

      Summary:

      The authors investigate single-neuron activity in rhesus macaques during model-based (MB) and model-free (MF) reinforcement learning (RL). Using a well-established two-step choice task, they analyze neural correlates of MB and MF learning across four brain regions: the anterior cingulate cortex (ACC), dorsolateral PFC (DLPFC), caudate, and putamen. The study provides strong evidence that these regions encode distinct RL-related signals, with ACC playing a dominant role in MB learning and caudate updating value representations after rare transitions. The authors apply rigorous statistical analyses to characterize neural encoding at both population and single-neuron levels.

      Strengths:

      (1) The research fills a gap in the literature, which has been limited in directly dissociating MB vs. MF learning at the single unit level and across brain areas known to be involved in reinforcement learning. This study advances our understanding of how different brain regions are involved in RL computations.

      (2) The study used a two-step choice task (Miranda et al., 2020), which was previously established for distinguishing MB and MF reinforcement learning strategies.

      (3) The use of multiple brain regions (ACC, DLPFC, caudate, and putamen) in the study enabled comparisons across cortical and subcortical structures.

      (4) The study used multiple GLMs, population-level encoding analyses, and decoding approaches. With each analysis, they conducted the appropriate controls for multiple comparisons and described their methods clearly.

      (5) They implemented control regressors to account for neural drift and temporal autocorrelation.

      (6) The authors showed evidence for three main findings:

      (a) ACC as the strongest encoder of MB variables from the four areas, which emphasizes its role in tracking transition structures and reward-based learning. The ACC also showed sustained representation of feedback that went into the next trial.

      (b) ACC was the only area to represent both MB and MF value representations.

      (c) The caudate selectively updates value representations when rare transitions occur, supporting its role in MB updating.

      (7) The findings support the idea that MB and MF reinforcement learning operate in parallel rather than strictly competing.

      (8) The paper also discusses how MB computations could be an extension of sophisticated MF strategies.

      Weaknesses:

      (1) There is limited evidence for a causal relationship between neural activity and behavior. The authors cite previous lesion studies, but causality between neural encoding in ACC, caudate, and putamen and behavioral reliance on MB or MF learning is not established.

      We agree with the reviewer that the present study does not establish causal relationships, and we do not claim otherwise in the manuscript. Our work was designed as a comprehensive characterization of neural activity across ACC, DLPFC, caudate, and putamen during reward-seeking decision-making. By systematically comparing MB- and MF- RL signals across these regions, we provide new insights into the division of labor and cooperative interactions within cortico-striatal networks.

      While causal manipulations (e.g., lesions, inactivations, stimulation) are indeed required to directly establish necessity or sufficiency, correlational studies such as ours play a crucial role in identifying where and how computationally relevant signals are represented. Importantly, our findings align with and extend prior causal work, for example showing that ACC and striatal lesions disrupt MB control. Thus, our study contributes a detailed functional mapping of MB and MF RL encoding across multiple nodes of this circuit, which serves as an important foundation for future causal investigations (e.g., using transcranial ultrasound stimulation).

      (2) There is a heavy emphasis on ACC versus other areas, but it is unclear how much of this signal drives behavior relative to the caudate.

      We appreciate the reviewer's observation regarding this matter. Our intention was not to place a heavy emphasis on ACC; rather, this emerged naturally from the data. The ACC demonstrated considerably more robust and enduring neural activity compared to other brain regions – for instance, reward-related signals in the ACC continued well beyond individual trials (Fig. 2A-B), and encoding of state transitions remained active from the initial transition through to the feedback phase (Fig. 3A-B). By comparison, distinctions among other regions were less pronounced, which naturally resulted in the ACC receiving greater attention in our analytical findings.

      We acknowledge that the caudate plays an essential and complementary role in driving behavior, and we believe that this is emphasized in the two key subsections of our “Results”. First, caudate neurons encoded model-based choice values (Fig. 4A, 4C) and uniquely remapped these values following rare transitions (Fig. 5), reflecting flexible adjustment of action values. Second, decoding analyses showed that both ACC and caudate populations predicted first-stage choices (Fig. 6C), linking their activity directly to behavioral decisions. In the Discussion section, we also highlight that “the distinctive caudate signal of updating (flipping) the value estimates of the currently experienced option on rare trials” goes beyond a “general temporal-difference RPE” and rather supports “the role of caudate in MB valuation”.

      (3) The role of the putamen is somewhat underexplored here.

      Our analyses were conducted in an identical manner across all four recorded regions (ACC, DLPFC, caudate, and putamen), and we consistently reported the results for putamen alongside the others. For example, in the Results section we describe how “both caudate and putamen encoded the reward from the previous trial negatively during the feedback period of the current trial” (Fig. 2F-G), and that “all regions had a significant population of neurons that encoded MB-, but not MF-, derived value” including putamen (Fig. 4F). Similarly, we show that putamen, like caudate, encoded a dopamine-like RPE signal at feedback (“both caudate and putamen neurons clearly responded at feedback with the parametric features of a dopamine-like RPE”; Discussion). These findings align with previous work linking the putamen to MF learning and are discussed explicitly in the context of MF-MB dissociations. We therefore believe that the putamen was not underexplored, but rather that its contribution was more circumscribed relative to ACC and caudate because the signals observed were quantitatively weaker and less distinctive for MB computations.

      (4) The authors mention the monkeys were overtrained before recording, which might have led to a bias in the MB versus MF strategy.

      We agree that extensive training can influence the balance between MB and MF in choice behaviour and neuronal responses.

      In a previous comprehensive behavioral analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology - ref. 36, Figure S6B) we showed that both MB and MF strategies contributed to behavior, with MB dominance stable across weeks of testing – supporting that overtraining did not eliminate MF influences (but rather stabilized a mixed strategy with robust MB contributions).

      In the same manuscript, we have also: i) cautioned the readers when comparing our results to data from the original human studies; ii) acknowledged that our extensive training cannot address earlier phases of learning in which sensitivity to the task structure is first acquired; and iii) also provided task-related reasons for such MB dominance – as training made the transition structure well learned (making MB computationally less costly and faster to implement) and the non-stationary outcomes favored the flexibility of MB strategies.

      In the present manuscript, we also have acknowledged that overtraining may have shifted neural signals toward stronger MB representations, or alternatively enabled more sophisticated task representations:

      “On the other hand, MF-based estimates were neither as striking nor as specific to striatal regions as expected and observed in previous studies [18]. The monkeys were extensively trained on the task before recordings commenced, which may have caused a shift towards both MB behaviour and MB value representation within the striatum. Alternatively, this training may have allowed more sophisticated representations to occur, such as using latent states to expand the task space [54].”

      Importantly, we strongly believe that this possibility does not detract from our main finding that both MB and MF signals were present across regions, with ACC showing the strongest multiplexing of the two.

      (5) The GLM3 model combines MB and MF value estimates but does not clearly mention how hyperparameters were optimized to prevent overfitting. While the hybrid model explains behavior well, it does not clarify whether MB/MF weighting changes dynamically over time.

      We appreciate this comment and would like to note that, for completeness, we have on several occasions directed the reader to our prior behavioural analysis of the same dataset (Miranda et al., 2020, PLoS Computational Biology, ref 36). In that work, we provide a full and detailed description of both the task and the computational modeling approach (see particularly the “Model fitting procedures” section). Furthermore, our model-fitting was grounded in the MF/MB RL framework used in the original human two-step study (Daw et al., 2011); and the fitting procedures also followed previous studies (Huys et al., 2011).

      Hyperparameters – including the MB/MF weighting parameter (ω) – were estimated using maximum likelihood under two complementary approaches, with priors providing regularization across sessions. First, we performed a fixed-effects analysis, in which parameters were estimated independently for each session by maximizing the likelihood separately; second, we conducted a mixed-effects analysis, treating parameters as random effects across sessions within each subject. The prior reduces the risk of overfitting by constraining parameters based on their empirical distributions, rather than allowing unconstrained session-by-session estimates. Finally, all model-fitting procedures were verified on surrogate generated data.

      With regard to dynamic weighting, our approach – consistent with most two-step studies – assumed ω to be constant across trials within each session. This was a deliberate choice, both for comparability with prior work and because our subjects were extensively trained, making session-level stability of strategy weights a reasonable assumption. Indeed, our analyses showed no systematic drift in ω across sessions, suggesting that MB/MF balance was stable over sessions. While approaches that allow dynamic ω estimation are possible, we believe such extensions would likely have minimal impact in the current dataset.
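
      To make the role of ω and of the regularizing prior concrete, the following is a minimal sketch of a hybrid choice model in the spirit of Daw et al. (2011), in which net values are a weighted mixture of MB and MF values passed through a softmax. It is not the fitting procedure of Miranda et al. (2020): the Gaussian prior on ω, the fixed inverse temperature `beta`, the grid search, and all function names are simplifying assumptions made for illustration.

```python
import numpy as np

def neg_log_posterior(omega, beta, q_mb, q_mf, choices,
                      prior_mean=0.5, prior_sd=0.25):
    """Negative log posterior of first-stage choices under a hybrid model.

    q_mb, q_mf: (n_trials, 2) value estimates; choices: (n_trials,) in {0, 1}.
    The Gaussian prior on omega is a hypothetical stand-in for a group prior.
    """
    q_net = omega * q_mb + (1.0 - omega) * q_mf            # MB/MF mixture
    logits = beta * q_net
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_lik = log_p[np.arange(len(choices)), choices].sum()
    log_prior = -0.5 * ((omega - prior_mean) / prior_sd) ** 2
    return -(log_lik + log_prior)

def fit_omega(q_mb, q_mf, choices, beta=3.0, n_grid=101):
    """Grid search over a session-constant omega in [0, 1]."""
    grid = np.linspace(0.0, 1.0, n_grid)
    losses = [neg_log_posterior(w, beta, q_mb, q_mf, choices) for w in grid]
    return grid[int(np.argmin(losses))]
```

      In a full analysis, ω would be estimated jointly with the other parameters (learning rates, inverse temperature, and so on) and the prior would be estimated from the data; the sketch only shows how a prior term pulls session-wise estimates toward a common distribution.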

      (6) It was unclear from the task description whether the images used changed periodically or how the transition effect (e.g., in Figure 3) could be disambiguated from a visual response to the pair of cues.

      All images were kept constant across sessions. Common/Rare transitions themselves were not explicitly cued, but rather each second-stage state was associated with a specific background colour, followed ~1s later by the presentation of two specific second-stage choice cues (Figure 1B). Hence the subject could infer whether they were transitioned down a Rare or Common path by the background colour, which can be disambiguated in time from the visual responses to the second-stage cues. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which could be inferred by a change in background colour immediately after choice indicating which second stage state they had just entered, Figure 1A).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 7 appears to be missing.

      We thank the reviewer for pointing this out. Figure 7 was inadvertently omitted in the previous version and has now been included in the revised manuscript.

      (2) No stats reported in the section on explore/exploit.

      We apologise for this oversight. This section now also reports the relevant statistics:

      “We thus repeated our decoding analysis of choice 1 stimulus identity, but this time limited trials to those where they had not received a high reward for the previous two trials (‘explore’ trials), and those where the previous two rewards had been the highest level (‘exploit’ trials). All regions encoded choice 1 for some duration of the choice epoch for both explore (p<0.002 in all cases, permutation test; Figure 7A) and exploit (p<0.002 in all cases; Figure 7B) conditions, but decoding accuracy was strongest in ACC. Choice 1 was less strongly decoded – particularly in ACC – in the former condition compared to the latter (p<0.002 for at least 140 ms in all cases, permutation test on differences observed; Figure 7C); and, also during exploitation, the ACC encoded choice 1 before the choice was even presented to the subject (Figure S8). This pre-choice ACC encoding in exploit trials may reflect the need to allocate cognitive (or attentive) resources to features – i.e., choice 1 stimulus identity – that are most certain predictors of important outcomes. As a control, we also decoded the direction of the Choice 1 (where choice was indicated via joystick movement), which was randomised each trial and therefore orthogonal to the stimulus that was chosen. Again, all four regions encoded its direction in both explore (p<0.002 in all cases; Figure 7D) and exploit (p<0.002 in all cases; Figure 7E). However, there were minimal differences in the strength of the representation between explore and exploit conditions (ACC, p=0.088, cluster-based permutation test; DLPFC p=0.016; caudate p=0.32; putamen p=1; Figure 7F).”
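
      The trial selection described in this passage can be sketched as below. The reward coding (0 = low, 1 = medium, 2 = high), the function name, and the reading of "had not received a high reward for the previous two trials" as "neither of the two preceding rewards was high" are assumptions made for illustration.

```python
import numpy as np

HIGH = 2  # hypothetical coding: 0 = low, 1 = medium, 2 = high

def label_trials(rewards):
    """Label each trial 'explore', 'exploit', or 'other' from the two prior rewards."""
    rewards = np.asarray(rewards)
    labels = np.full(len(rewards), "other", dtype=object)
    for t in range(2, len(rewards)):
        previous_two = rewards[t - 2:t]
        if np.all(previous_two == HIGH):
            labels[t] = "exploit"    # both preceding rewards were the highest level
        elif np.all(previous_two != HIGH):
            labels[t] = "explore"    # neither preceding reward was high
    return labels

# Usage with a made-up reward sequence.
example_labels = list(label_trials([2, 2, 2, 0, 1, 1, 2]))
```

      Trials with a mixed history (one high and one non-high reward) fall into neither condition under this reading and would be excluded from the comparison.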

      (3) Make sure that error bars are explained in all figure captions where appropriate.

      We apologise that this information was absent. Error bars always represent the standard error of the mean. This has now been added to all relevant figure legends.

      Reviewer #2 (Recommendations for the authors):

      Overall, I think this is a great manuscript and was presented clearly and succinctly. I have some minor suggestions:

      (1) Typo: Abstract "ACC, DLPFC, caudate and striatum" I think should be "caudate and putamen".

      We have amended this incorrect reference in the introduction:

      “One such task that does enable the dissociation of MB and MF computations is Daw et al. (2011)’s ‘two-step’ task [18]. It contains a probabilistic transition between task states to uncouple MF learners (who would assign credit to which state was rewarded regardless of the transition) from MB learners (who would appropriately assign credit based on the reward and transition that occurred). Rodents [19], monkeys [36], and humans [18] all use MB-like behaviour to solve the task. Evidence in rodents suggests dorsal anterior cingulate cortex (ACC) tracks rewards, states, and the probabilistic transition structure, and that ACC is essential in implementing a MB-strategy [37]. Here, we compare primate single neuron activity of 4 different subregions implicated in reward-based learning and choice (ACC, dorsolateral PFC (DLPFC), caudate, and putamen) during performance of the classic two-step task, and demonstrate signatures of MB-RL primarily in ACC, and MF-RL signatures most notably in putamen.”

      (2) Could the authors provide a rationale for why they did the single-level encoding the way they did, instead of running an ANOVA?

      We thank the reviewer for this point. We are not entirely certain which specific ANOVA approach is being suggested, but our rationale for using a GLM-based encoding analysis is that such an approach allows us to model continuous, trial-by-trial variables (e.g., value signals, prediction errors, transitions) while simultaneously controlling for multiple correlated predictors. This approach is widely used in systems neuroscience, particularly in decision-making research, and offers analytical flexibility and comparability with prior approaches.

      (3) How were the 20 iterations for decoding decided? That seems low.

      We do not agree that 20 repetitions of 5-fold cross-validation is low. The error bars in panels 6C-E show how little variance occurred across these 20 repetitions. We performed statistics on the average of these low-variance repetitions, using a permutation test in which the 20 repetitions were repeated a further 500 times.
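
      The procedure described here (20 repetitions of 5-fold cross-validation, averaged, then compared with 500 label-shuffled repeats of the same procedure) can be sketched as follows. The nearest-centroid decoder is a stand-in chosen to keep the example dependency-free and is not the authors' decoder; the function names are hypothetical.

```python
import numpy as np

def cv_accuracy(X, y, n_folds, rng):
    """One repetition of k-fold cross-validation with a nearest-centroid decoder."""
    idx = rng.permutation(len(y))
    correct = 0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        classes = np.unique(y[train])
        centroids = np.stack([X[train][y[train] == c].mean(axis=0) for c in classes])
        # Squared distance from each held-out sample to each class centroid.
        dist = ((X[test][:, None, :] - centroids[None]) ** 2).sum(axis=2)
        correct += np.sum(classes[dist.argmin(axis=1)] == y[test])
    return correct / len(y)

def repeated_cv(X, y, n_reps=20, n_folds=5, seed=0):
    """Accuracy for each of n_reps random fold assignments."""
    rng = np.random.default_rng(seed)
    return np.array([cv_accuracy(X, y, n_folds, rng) for _ in range(n_reps)])

def permutation_p(X, y, n_perm=500, n_reps=20, n_folds=5, seed=0):
    """Compare mean repeated-CV accuracy with a label-shuffled null distribution."""
    rng = np.random.default_rng(seed)
    observed = repeated_cv(X, y, n_reps, n_folds, seed).mean()
    null = np.array([
        repeated_cv(X, rng.permutation(y), n_reps, n_folds, seed + 1 + k).mean()
        for k in range(n_perm)
    ])
    p = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p
```

      The spread of the values returned by `repeated_cv` corresponds to the across-repetition variance that the error bars are said to show.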

      (4) It was unclear to me how the authors reached the conclusion "Thus, caudate activity appeared to represent the value of the state the subject was currently in." when the state value wasn't computed directly. I don't see how encoding the chosen and unchosen option is the same as the state the animal is in, which should also incorporate where the animal is in a block of trials or session, and the knowledge regarding the chosen and unchosen option.

      We agree with this point and have tempered this statement:

      “Thus, caudate’s encoding of an option’s value also reflected the availability of the option.”

      (5) Figures 1C, D, and E were not legible to me even at 200% zoom.

      We apologise for this oversight. We’ve now updated panels 1C-E to a more readable size.

      (6) There is a Figure 2H in the figure legend, but the panel appears to be missing from Figure 2.

      This text has been removed.

      (7) Figure 2: It would've been nice to see F and G for all areas.

      We have now added this data as additional panels in Figure 2.

      (8) Figure 3: How is the transition disambiguated from a visual response to the set of images?

      This was indicated by the background changing colour to that of the learned second stage state before the actual choices were presented. We’ve updated the Results text to make this clearer:

      “Tracking the state-transition structure of the task is imperative for solving the task as a MB-learner. All four regions encoded whether the current trial’s first-stage choice transitioned to the common or rare second-stage state (which was indicated by a change in background colour before the second stage choices were presented, Figure 1A).”

      (9) Figure 4F: Is this collapsed across time points? So neurons that were significant at any time? I'm confused how Figure 4A relates to 4F, as 4A shows much lower percentages of significant neurons.

      Figure 4F counts the total number of neurons that had a significant period of encoding at any timepoint over the epoch (as assessed with a length-based permutation test), whereas Figure 4A shows the number of significantly encoding neurons at any one time point. Investigating this further, we found that the encoding was dynamic, with different neurons encoding different parts of the epoch. We have now added a new supplementary figure to highlight this and refer to it in Results:

      “Examination of the strongest signal observed, ACC’s encoding of MB Q-values, showed a dynamic pattern with different neurons encoding the signal at different parts of the epoch (Figure S6). When aggregating the number of significant coders throughout the epoch, and examining the specificity of MB versus MF coding, we found that all regions had a significant population of neurons that encoded MB-, but not MF-, derived value (30, 18.72, 23 and 24% of neurons in ACC, DLPFC, caudate and putamen respectively; all p<0.0014 binomial test against 10% (as the strongest response to either of the two options was used); Figure 4F).”
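
      A binomial test against a 10% chance rate of this kind can be sketched as below. The chance rate presumably reflects that taking the stronger of two tests at α = 0.05 roughly doubles the false-positive rate, although that reading is an assumption. The neuron counts in the usage lines are hypothetical, since per-region sample sizes are not stated here.

```python
from math import comb

def binomial_tail(k, n, p0):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical example: 30 of 100 neurons significant, chance rate 10%.
p_value = binomial_tail(30, 100, 0.10)
is_significant = p_value < 0.0014
```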

      (10) Data/ code could be made publicly available instead of upon request.

      All data and code to reproduce figures are now available at https://github.com/jamesbutler01/TwoStepExperiment. The manuscript has been updated to reflect this:

      Data and materials availability:

      All data and code to reproduce figures are available at https://github.com/jamesbutler01/TwoStepExperiment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors' goal was to advance the understanding of metabolic flux in the bradyzoite cyst form of the parasite T. gondii, since this is a major form of transmission of this ubiquitous parasite, but very little is understood about cyst metabolism and growth. Nonetheless, this is an important advance in understanding and targeting bradyzoite growth.

      Strengths:

      The study used a newly developed technique for growing T. gondii cystic parasites in a human muscle-cell myotube format, which enables culturing and analysis of cysts. This enabled the screening of a set of anti-parasitic compounds to identify those that inhibit growth in both vegetative (tachyzoite) forms and bradyzoites (cysts). Three of these compounds were used for comparative metabolomic profiling to demonstrate differences in metabolism between the two cellular forms.

      One of the compounds yielded a pattern consistent with targeting the mitochondrial bc1 complex and suggests a role for this complex in metabolism in the bradyzoite form, an important advance in understanding this life stage.

      Weaknesses:

      Studies such as these provide important insights into the overall metabolic differences between different life stages, and they also underscore the challenge of interpreting individual patterns caused by metabolic inhibitors due to the systemic level of some of the targets, so that some observed effects are indirect consequences of the inhibitor action. While the authors make a compelling argument for focusing on the role of the bc1 complex, there are some inconsistencies in the patterns that underscore the complexity of metabolic systems.

      We agree with reviewer #1 that metabolic fingerprints are complex to interpret and we did try to approach this problem by including mock treatment and non-metabolic inhibitors as controls. We address specific concerns below.

      Reviewer #2 (Public review):

      Summary:

      A particular challenge in treating infections caused by the parasite Toxoplasma gondii is to target (and ultimately clear) the tissue cysts that persist for the lifetime of an infected individual. The study by Maus and colleagues leverages the development of a powerful in vitro culture system for the cyst-forming bradyzoite stage of Toxoplasma parasites to screen a compound library for candidate inhibitors of parasite proliferation and survival. They identify numerous inhibitors capable of inhibiting both the disease-causing tachyzoite and the cyst-forming bradyzoite stages of the parasite. To characterize the potential targets of some of these inhibitors, they undertake metabolomic analyses. The metabolic signatures from these analyses lead them to identify one compound (MMV1028806) that interferes with aspects of parasite mitochondrial metabolism. The authors claim that MMV1028806 targets the bc1 complex of the mitochondrial electron transport chain of the parasite, although the evidence for this is indirect and speculative. Nevertheless, the study presents an exciting approach for identifying and characterizing much-needed inhibitors for targeting tissue cysts in these parasites.

      Strengths:

      The study presents convincing proof-of-principle evidence that the myotube-based in vitro culture system for T. gondii bradyzoites can be used to screen compound libraries, enabling the identification of compounds that target the proliferation and/or survival of this stage of the parasite. The study also utilizes metabolomic approaches to characterize metabolic 'signatures' that provide clues to the potential targets of candidate inhibitors, although falls short of identifying the actual targets.

      Weaknesses:

      (1) The authors claim to have identified a compound in their screen (MMV1028806) that targets the bc1 complex of the mitochondrial electron transport chain (ETC). The evidence they present for this claim is indirect (metabolomic signatures and changes in mitochondrial membrane potential) and could be explained by the compound targeting other components of the ETC or affecting mitochondrial biology or metabolism in other ways. In order to make the conclusion that MMV1028806 targets the bc1 complex, the authors should test specifically whether MMV1028806 inhibits bc1-complex activity (i.e. in a direct enzymatic assay for bc1 complex activity). Testing the activity of MMV1028806 against other mitochondrial dehydrogenases (e.g. dihydroorotate dehydrogenase) that feed electrons into the ETC might also provide valuable insights. The experiments the authors perform also do not directly measure whether MMV1028806 impairs ETC activity, and the authors could also test whether this compound inhibits mitochondrial O2 consumption (as would be expected for a bc1 inhibitor).

      We thank the reviewer for highlighting this important aspect. To further investigate the effect of MMV1028806 on the mETC, we adapted a commercial oxygen consumption assay and demonstrated that MMV1028806, like Atovaquone and Buparvaquone, inhibits the ETC, leading to reduced oxygen consumption similar to Antimycin A, which inhibits the bc1-complex. These results are now included in the revised manuscript (Methods, lines 210–233; Results, lines 460–468).

      (2) The authors claim that compounds targeting bradyzoites have greater lipophilicity than other compounds in the library (and imply that these compounds also have greater gastrointestinal absorbability and permeability across the blood-brain barrier). While it is an attractive idea that lipophilicity influences drug targeting against bradyzoites, the effect seems pretty small and is complicated by the fact that the comparison is being made to compounds that are not active against parasites. If the authors are correct in their assertion that lipophilicity is a major determinant of bradyzoicidal compounds compared to compounds that target tachyzoites alone, you would expect that compounds that target tachyzoites alone would have lower lipophilicity than those that target bradyzoites. It would therefore make more sense to (statistically) compare the bradyzoicidal and dual-acting compounds to those that are only active in tachyzoites (visually the differences seem small in Figure S2B). This hypothesis would be better tested through a structure-activity relationship study of select compounds (which is beyond the scope of the study). Overall, the evidence the authors present that high lipophilicity is a determinant of bradyzoite targeting is not very convincing, and the authors should present their conclusions in a more cautious manner.

      Thank you for raising this excellent point. We statistically compared compounds active only against tachyzoites with bradyzoicidal and dually active compounds and indeed found no significant difference (P = 0.06). We altered the Results text (lines 367-368) and the Figure S2B caption to mention this explicitly.

      (3) Page 11 and Figure 7. The authors claim that their data indicate that ATP is produced by the mitochondria of bradyzoites "independently of exogenous glucose and HDQ-target enzymes." The authors cite their previous study (Christiansen et al, 2022) as evidence that HDQ can enter bradyzoites, since HDQ causes a decrease in mitochondrial membrane potential. Membrane potential is linked to the synthesis of ATP via oxidative phosphorylation. If HDQ is really causing a depletion of membrane potential, is it surprising that the authors observe no decrease in ATP levels in these parasites? Testing the importance of HDQ-target enzymes using genetic approaches (e.g. gene knockout approaches) would provide better insights than the ATP measurements presented in the manuscript, although would require considerable extra work that may be beyond the scope of the study. Given that the authors' assay can't distinguish between ATP synthesized in the mitochondrion vs glycolysis, they may wish to interpret their data with greater caution.

      We thank the reviewer for addressing this important point. The enzymatic assay used in our study cannot distinguish whether ATP is produced via glycolysis or mitochondrial respiration. However, we minimized glycolytic ATP production in bradyzoites by starving them for one week without glucose. After this period, amylopectin stores are depleted, forcing the parasites to utilize glutamine via the GABA shunt to fuel the TCA cycle and generate ATP predominantly through respiration. While minor ATP production via gluconeogenic fluxes cannot be excluded, the main ATP supply under these conditions is expected to originate from the mitochondrial electron transport chain. Indeed, ATP levels are lower in HDQ-treated bradyzoites, which we attribute to the compound’s impact on electron-supplying enzymes upstream of the bc1 complex, although this inhibition is not sufficient to fully abolish ATP production as observed with Atovaquone treatment.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an exciting 400-drug screen using the MMV Pathogen Box to select compounds that are effective against the bradyzoite stage of the medically important parasite Toxoplasma. This work utilises a bradyzoite culture technique that was published recently by the same group. They focused on compounds that directly affected the mitochondrial electron transport chain (mETC) bc1-complex and compared them with other bc1 inhibitors described in the literature, such as atovaquone and HDQs. They further provide metabolomics analysis of inhibited parasites, which serves to support the target and to characterise the outcome of the different inhibitors.

      Strengths:

      This work is important as, until now, there are no effective drugs that clear cysts during T. gondii infection. So, the discovery of new inhibitors that are effective against this parasite stage in culture and thus have the potential to battle chronic infection is needed. The further metabolic characterization provides indirect target validation and highlights different metabolic outcomes for different inhibitors. The latter forms the basis for new studies in the field to understand the mode of inhibition and mechanism of bc1-complex function in detail.

      The authors focused on the function of one compound, MMV1028806, which is demonstrated to have a similar metabolic outcome to buparvaquone. Furthermore, the authors evaluated the importance of ATP production in the tachyzoite and bradyzoite stages and under atovaquone/HDQ treatment.

      Weaknesses:

      Although the authors did experiments to identify the metabolomic profile of the compounds and suggested the bc1-complex as the main target of MMV1028806, they did not provide experimental validation for that.

      In our updated manuscript we performed additional experiments, such as an oxygen consumption assay, to further qualify the bc1-complex as the target. We also toned down some of our statements to make sure that no false claims are made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Introduction: It would be helpful to briefly describe what the pathogen Box is, what compounds are in it, and the rationale for using a drug screen to better understand mitochondrial function in cysts.

      Thank you for this suggestion. We added an introduction to the MMV Pathogen Box and outlined the rationale for our experimental approach in lines 90 to 99.

      Please explain why dual-active drugs were useful for understanding differences, rather than just seeking drugs that might target bradyzoites alone.

      We focused on dually active compounds for two reasons. First, these are the most promising and potent targets to develop drugs against. Both stages might occur simultaneously and these dually active drugs may eliminate the need for treatment with a drug combination. Second, we speculated that monitoring the responses to inhibition of the same process in both parasite stages would reveal its functional consequences. Dually active compounds enable this direct comparison. Bradyzoite-specific compounds may be interesting from a developmental perspective but may require a reverse genetic follow-up to compare differences between stages. The lack of a well-established inducible expression system in bradyzoites that allows short term and synchronized knock-down makes metabolomic approaches difficult. We added these two points in brief to the results section (line 378 – 381).

      Figure 4: this is a very important figure in understanding the significance of the work, but it is not well described in the legend. Even if these graphics have been used in other manuscripts, it would be helpful to provide better annotation in the figure legend.

      Thank you for pointing this out. We expanded the figure legend to explain the isotopologue data in more detail (lines 793 to 802).

      B,D: Explain what the three columns for each drug category represent.

      Addressed

      C,E: Explain what isotopologues are, what the M+ notation means, and what the pie charts represent. Other main figures have suitable legends.

      Addressed

      Discussion: there are several places where the reasoning is a bit hard to follow, and rearrangement to provide a clear logical flow would be helpful. In particular, the reasoning for why HDQ impairs active but non-essential processes could be laid out more clearly.

      We added additional clarifications to the discussion section and re-wrote the HDQ paragraph. We hope that our reasoning is now easier to follow.

      Abbreviations: A list of abbreviations for the entire manuscript would be helpful.

      This is a good idea and we now provide an abbreviations list.

      Minor typos:

      P12, 2nd paragraph: sentence beginning with: Consistent with this hypothesis... "cysts" is used twice

      Corrected

      P15, top of the second paragraph: "nano" and "molar" should be one word

      Corrected

      Reviewer #2 (Recommendations for the authors):

      Major comments (not already covered in the weaknesses section of the public review)

      (1) Figure 2 and the related description of these experiments in the methods section (page 3). The approach for calculating IC50 values for the compounds against tachyzoites is unclear. How did the authors determine the time point for calculating IC50 values? Was this when the DMSO control wells reached maximum fluorescence? This could be described in a clearer manner. A concern with calculating IC50 values on different days is that parasites will have undergone more lytic cycles after 7 days compared to 4 days, which means that the IC50 values for fast- vs slow-acting compounds might be quite different between these days. As a more minor comment on these experiments, the methods section does not describe whether the test compound was removed after 7 days, as the experimental scheme in Figure S1A seems to imply. Please clarify in the methods section.

      This is a very good point and we clarified this in the methods section, lines 157–160. In brief, we chose the latest time point at which exponential growth could be observed in the fastest-growing cultures; generally, this was in mock-treated cultures at day 4 post infection. We also clarified that we changed media and removed treatment after 7 days.

      Minor Comments

      (2) Page 2. "we employed a recently developed human myotube-based culture system to generate mature T. gondii drug-tolerant bradyzoites". What makes these bradyzoites 'drug-tolerant' or to which drugs are they tolerant? This isn't clear from the description.

      We added these details in the introduction (lines 94 to 96) and state that these cysts develop resistance against anti-folates, bumped kinase inhibitors and HDQ, a coenzyme Q analog.

      (3) Figure 1E. The number of compounds in this pie chart adds up to 384, whereas the methods describe that 371 compounds were tested. What explains this discrepancy in numbers?

      We understand the confusion. We have now updated the pie chart to reflect only the compounds that were included in the primary screen (371), as listed in Supplementary Table S1. We separately analysed 29 compounds that were previously tested against tachyzoites by Spalenka et al. and found an additional 13 compounds, which were originally included in the pie chart. In a secondary test, the activity of 10 of these 13 compounds could be confirmed. All in all, we found the 16 compounds shown in Fig. 2E-G.
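      The counts quoted in this response can be reconciled with a short calculation. The sketch below is illustrative only: the variable names are ours, and it assumes that the 29 separately analysed compounds do not overlap with the 371 compounds of the primary screen, which the response implies but does not state outright.

```python
# Numbers quoted in the response above.
primary_screen = 371       # compounds in the primary screen (Supplementary Table S1)
separately_analysed = 29   # compounds previously tested against tachyzoites by Spalenka et al.
extra_in_old_chart = 13    # additional compounds originally included in the pie chart

# Assumption: the separately analysed compounds are disjoint from the primary screen.
library_total = primary_screen + separately_analysed
old_pie_chart_total = primary_screen + extra_in_old_chart

print(library_total)        # 400, the size of the screen quoted by Reviewer #3
print(old_pie_chart_total)  # 384, the total the reviewer counted in the original Figure 1E
```

      Under that assumption, the 13 additional compounds account exactly for the difference between 384 and 371.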

      (4) Page 3. The resazurin assays for measuring host cell viability could be explained in a clearer manner. What host cells were used? Were the host cells confluent when the drug was added (and the assay conducted) or was the drug added when the host cells were first seeded? How long were the host cells cultured in the candidate inhibitors before the assays were performed? What concentration (or concentration range) were the compounds tested? The host inhibition data are not easily accessible to the reader - the authors might consider including these data as part of Table S2D.

      The necessary information was added to the methods section (lines 145 to 153). We tested for host toxicity in both HFF and KD3 myotubes during the primary screen at 10 µM in triplicate. The colorimetric assay was performed after tachyzoite growth assays in HFFs 7 days post infection and after completion of the 4-week re-growth phase of bradyzoites in myotubes. The resulting data are already part of Supplementary File 1. In addition, we performed concentration-dependent resazurin assays after the secondary concentration-dependent growth inhibition assays and included these data in Supplementary File 1. For the bradyzoite growth assay, we visually inspected the cultures after one week of drug exposure and before tachyzoite re-growth to detect a missing or damaged monolayer. These data are also included in Supplementary File 1. As suggested, we also included the cytotoxicity data in Table S2D.

      (5) Page 7. "Except for four compounds (MMV021013, MMV022478, MMV658988, MMV659004), minimal lethal concentrations were higher in bradyzoites". The variation in these data seems quite large to be making this claim. Consider a statistical analysis of these data to compare potencies in tachyzoites vs bradyzoites.

      With this sentence we aimed to describe the results and not to make a statement. We toned down the sentence to “… minimal lethal concentrations appear generally higher in bradyzoites…” (lines 344 to 347). We also added a line at 1 µM to the charts to facilitate comparison of compound efficacies.

      (6) It would be helpful to readers to include the structures of hit compounds in the figures (perhaps as part of Figure 3).

      This is a good idea and would improve the manuscript. To not overburden figure 3 we added structures to Fig S3.

      (7) Page 8. "Infected monolayers were treated for three hours with a 3-fold of respective IC50 concentrations". 3-fold higher than IC50 concentrations? This isn't clear.

      Thank you for noticing this: We clarified the sentence and also corrected the concentration, corresponding to five times their IC50s as stated in the methods section: “Infected monolayers were treated for three hours with compound concentrations five times their respective IC<sub>50</sub> values or the solvent DMSO.” Line 374 - 376

      (8) Page 9. "buparvaquone, which we found to be dually active against T. gondii tachyzoites and bradyzoites, targets the bc1-complex in Theileria annulata (McHardy et al. 1985) and Neospora caninum (Müller et al. 2015) and was recently found active against T. gondii tachyzoites (Hayward et al. 2023)." The latter paper showed that buparvaquone targets the bc1 complex in T. gondii tachyzoites as well.

      Yes, it was found to inhibit O2 consumption rate in tachyzoites. We changed the sentence accordingly. Line 407 to 411.

      (9) Page 9. "Anaplerotic substrates were also affected by all three treatments, most notably a strong accumulation of aspartic acid." It is interesting that the M+3 isotopologue of aspartate (presumably synthesised from pyruvate) is the predominant form (rather than the M+2 and M+4 isotopologues that would derive from the TCA cycle, and as the diagram in Figure 4A seems to suggest). Given that aspartate is a precursor of pyrimidine biosynthesis that is upstream of the DHODH reaction, it is conceivable that its accumulation is related to the depletion of pyrimidine biosynthesis (so would tie into the point about the accumulation of DHO and CarbAsp noted earlier in the paragraph).

      Yes, we assume the same. We altered the text and summarized the changes in Asp as a result of DHODH inhibition, as we already do in the next paragraph using <sup>15</sup>N-glutamine labelling (lines 416 - 418).

      (10) Figure 6 and Page 10. Regarding the metabolomic experiments that show increased levels of acyl-carnitines. The authors note that "Since [beta-oxidation] is thought to be absent in T. gondii, we attribute these changes to inhibition of host mitochondria". This is conceivable, although the T. gondii genome does encode homologs of the proteins necessary for beta-oxidation (e.g. see PMID 35298557). If the carnitine is coming from host mitochondria, is host contamination a concern for interpreting the metabolomic data? Or do the authors think that parasites are scavenging carnitine from host cells? It is curious that the carnitine accumulation is observed in parasites treated with buparvaquone (and MMV1028806) but not atovaquone, even though buparvaquone and atovaquone (and possibly MMV1028806) target the same enzyme. Do the authors have any thoughts on why that might be the case?

      Yes, thank you for raising this point. We changed the discussion to elaborate on this and included the debated presence of beta-oxidation (line 640): “We also detect elevated levels of acyl-carnitines in BPQ and MMV1028806 treated bradyzoites. These molecules act as shuttles for the mitochondrial import of fatty acids for β-oxidation. However, this pathway has not been shown to be active and is deemed absent in T. gondii (35298557, 18775675). The presence of acyl-carnitines in bradyzoites might reflect import from the host. It is conceivable that their elevation in response to buparvaquone and MMV1028806 indicates compromised functionality of the host bc1-complex and subsequently accumulating β-oxidation substrates. Indeed, BPQ has a very broad activity across Apicomplexa (Hudson et al. 1985) and kinetoplastids (Croft et al. 1992).” Regarding the existence of beta-oxidation: some potential enzymes might be conserved, but those could in part take part in branched-chain amino acid degradation pathways. On a separate note: we looked extensively at beta-oxidation using stable isotope labelling and became convinced that any activity occurred in the host cell only and not in the parasite (unpublished).

      (11) Page 11. "the mitochondrial [electron] transport chain in bradyzoites".

      Corrected.

      (12) Figure S6B. Were these optimization experiments performed in tachyzoites or bradyzoites? If the former, and given that bradyzoites have apparently smaller amounts of ATP per parasite (Figure 7C), are these values in the linear range for 10^5 bradyzoites?

      Yes, we think that the assay remains linear at these lower concentrations. Tachyzoites give a linear response starting from 10^3 parasites per sample. In the actual experiment we used 10^5 parasites, both tachyzoites and bradyzoites. Under the tested conditions, bradyzoites maintain 10% of the ATP pools of tachyzoites, which should be well within the linear range of the assay. Even in atovaquone-treated bradyzoites, the ATP concentration could drop to 10% of the untreated level and still remain in the linear range of the assay. For practical reasons, we simply acknowledge this limitation and consider it acceptable within the scope of this study.
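      The linear-range argument above amounts to a short calculation in "tachyzoite equivalents". The sketch below is only an illustration of that reasoning: the function and variable names are hypothetical, and it assumes that the assay signal scales with parasite number multiplied by the ATP content per parasite.

```python
# Values quoted in the response above.
LINEAR_RANGE_MIN = 1_000      # tachyzoites per sample from which the response is linear
PARASITES_PER_SAMPLE = 100_000
BRADYZOITE_ATP_PERCENT = 10   # bradyzoite ATP pool relative to tachyzoites, in percent


def tachyzoite_equivalents(n_parasites: int, atp_percent: int) -> int:
    """ATP signal expressed as the number of tachyzoites giving the same signal.

    Assumption (ours): signal is proportional to parasite count times ATP per parasite.
    """
    return n_parasites * atp_percent // 100


# Untreated bradyzoites: 10% of the tachyzoite ATP pool.
untreated = tachyzoite_equivalents(PARASITES_PER_SAMPLE, BRADYZOITE_ATP_PERCENT)

# Treated bradyzoites whose ATP drops to 10% of the untreated bradyzoite level.
treated_floor = tachyzoite_equivalents(untreated, 10)

print(untreated, untreated >= LINEAR_RANGE_MIN)          # 10000 True
print(treated_floor, treated_floor >= LINEAR_RANGE_MIN)  # 1000 True
```

      Under these assumptions, a tenfold drop in bradyzoite ATP lands exactly on the lower bound of the stated linear range, which is why the response treats it as the limit.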

      Reviewer #3 (Recommendations for the authors):

      Major comments

      (1) The authors should provide a negative control for the experiment on Figure 5. I would suggest doing the same experiment with an inhibitor that has no effect on mitochondrial potential.

      We addressed this criticism by repeating the assay on tachyzoites and additionally including inhibitors that do not have the mitochondrial electron transport chain as their primary target (Pyrimethamine, Clindamycin, 6-Diazo-5-oxo-L-norleucine). The results are summarized in supplementary Fig. S5 (lines 445 – 449) and show that these inhibitors have no effect on the mitochondrial membrane potential. This supports the specificity of the assay and suggests that MMV1028806 and BPQ indeed target a mitochondrial process in this stage. In this repetition, ATQ, BPQ and MMV1028806 again significantly depleted the Mitotracker signal.

      (2) Figure 5 - Did the authors perform this experiment in 3 biological replicates? This requires clarification of the figure legend.

      No, we did not perform the experiment in 3 biological replicates. After establishing the assay thoroughly, we performed it once on tachyzoites and bradyzoites. The sampling was done on every vacuole we encountered during microscopy going through the slide from left to right. That is the reason the sample size varies from treatment to treatment. The sample size is mentioned in the caption of figure 5. However, we repeated the experiment with additional controls (see Fig. S5), which showed that the Mitotracker signals were significantly depleted in a very similar manner in ATQ, BPQ and MMV1028806 treated parasites.

      (3) The authors identify that MMV1028806 has the bc1-complex as the main target. I suggest that they should perform a complex III activity assay to affirm this. Also, it would be good to test if other mETC complexes are affected by this compound to prove its specificity. There is only one paper showing complex III activity in tachyzoites (PMID:37471441) and no papers in bradyzoites. So if the authors cannot do this assay, I suggest that they should change the text to indicate that the bc1-complex could be the main target of the compound but that more experimental validation is needed.

      We hope to have satisfied the reviewer’s request by performing an oxygen consumption assay on tachyzoites. Together with metabolic profiling and labelling data, this shows that both upstream and downstream processes are impacted by MMV1028806 and strongly suggest the bc1-complex as a target (Fig 5E).

      (4) Figure S5 - Are the differences shown in the EM experiment statistically supported?

      We analyzed 28 images and measured the areas in 12 to 26 images. We substituted the table of means in Fig S6B by a graph showing individual values. These areas are indeed statistically different between DMSO and ATQ / MMV treated parasites. We changed the wording in the results section accordingly “Analysis by thin section electron microscopy revealed a largely unaffected sub-mitochondrial ultrastructure but the areas of mitochondrial profiles were changed in comparison to control after exposure with ATQ and MMV1028806 but not with BPQ (Fig. S6)“. The description of Fig S6B was changed to “(B) Measured areas of mitochondrial profiles from 21, 12, 15 and 26 images showing DMSO, ATQ, BPQ and MMV1028806 treated parasites (* denotes p < 0.05 in Mann-Whitney tests)”.
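      For readers unfamiliar with the Mann-Whitney test mentioned in this response, the following sketch shows how the U statistic and an exact two-sided p-value can be computed for two small groups. It is not the authors' analysis: the area values are invented for illustration, the function names are hypothetical, and the response does not state whether an exact or an asymptotic variant of the test was used. For samples of the size reported here (12 to 26 images), a library routine such as SciPy's `scipy.stats.mannwhitneyu` would be the practical choice, because full enumeration becomes infeasible.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: pairs (xi, yj) with xi > yj count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value by enumerating every relabelling (small samples only)."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    center = n * m / 2  # expected U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - center) >= observed:
            extreme += 1
    return extreme / total


# Hypothetical mitochondrial profile areas (arbitrary units), NOT data from the study.
control = [0.30, 0.32, 0.35, 0.38, 0.40]
treated = [0.45, 0.50, 0.52, 0.55, 0.60]

u = mann_whitney_u(control, treated)
p = exact_two_sided_p(control, treated)
print(u, round(p, 4))
```

      With completely separated groups of five values each, only 2 of the 252 possible relabellings are as extreme as the observed one, so the exact p-value is 2/252, below the 0.05 threshold used in the figure legend.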

      Minor comments:

      (1) What were the criteria to choose the example compounds in Figure 1B and 1D? The authors should clarify this in the text.

      These graphs are shown for illustrative purposes and were chosen based on their display of different drug efficacies. We considered this helpful for interpreting the screening data.

      (2) Figure 2G - add statistical analysis.

      We added Mann-Whitney tests and updated the figure legend and results text accordingly in line 344 – 347.

      (3) The authors should provide more insights in the discussion about why this new compound is the next step in drug discovery compared to atovaquone or buparvaquone - for example, do you expect better availability in the brain, etc.

      We used MMV1028806 and the other hits ATQ and BPQ to make the point that the bc1-complex is a good target in bradyzoites that allows curative treatment. We do not suggest that the compound itself is a good starting point. We point to other actively developed candidates, such as the ELQ series, in the discussion (line 719).

      (4) Scale bars in Figure 5 should be aligned and have equal thickness.

      We re-formatted the scale bars and aligned them when not obscuring parasites.

      (5) The authors should be consistent with font sizes and styles in all the figures.

      We adjusted the font styles to match each other.

    1. Single Mothers: Realities, Precarity, and Challenges of an Invisible France

      Executive Summary

      This document synthesizes the testimonies and analyses from a group of single mothers, highlighting the critical situation of single-parent families in France.

      Accounting for one family in four and affecting one child in three, single parenthood is borne mostly by women (85% of cases).

      The finding is unequivocal: these mothers face a "double penalty" characterized by emotional trauma linked to abandonment, structural economic precarity, and profound physical and psychological exhaustion.

      The current social and legal system, still modeled on the traditional family, fails to provide adequate support, leaving these women to carry alone the entire burden of child-rearing, financing, and domestic survival, often at the expense of their own health.

      --------------------------------------------------------------------------------

      1. The Trauma of Separation and Abandonment

      The accounts reveal a brutal recurring pattern: the separation often occurs just when parenthood becomes concrete (pregnancy or the child's first months).

      Abandonment during vulnerability: Several testimonies describe fathers who disengage as soon as the pregnancy is announced or right after the birth.

      One mother recounts the father's departure only two weeks after he admitted his doubts to her, while she was managing her recovery from a caesarean section and two young children.

      Cowardice and shock: The departure is often sudden, leaving the mother in a state of psychological shock.

      One participant recounts being left by text message after announcing her pregnancy, the father disappearing overnight.

      The "autopilot" state: To survive and care for the newborn, these mothers switch into a mechanical mode of functioning.

      This mental "compartmentalization" makes it possible to handle daily tasks but often leads to a loss of memories of the child's first months, the trauma being processed only much later.

      2. Physical Exhaustion and Total Mental Load

      Solitude in child-rearing and in managing daily life leads to an exhaustion that those around them, even when well-meaning, struggle to grasp.

      Impact on health

      Maternal burnout: The combination of full-time work, daycare hours, and household management with no one to take over frequently leads to collapse.

      One mother reports having lost a great deal of weight before her doctor put her on sick leave.

      Deteriorated physical health: Chronic fatigue and post-pregnancy stress trigger serious physical conditions.

      Some mothers mention "old people's illnesses" caused by systemic exhaustion.

      Psychological distress: The isolation is such that suicidal thoughts can arise, not from a wish to die but from being overwhelmed by the scale of the task ("throwing oneself from the fifth floor").

      Managing daily life at work

      Invisible fatigue: Single mothers strive to "camouflage" their condition (makeup, a smile) in order to appear productive and avoid being seen as a burden on their team.

      Pressure to perform: The workplace demands the same commitment from a single mother as from a colleague in a couple, ignoring years of broken nights.

      3. Economic Precarity and Social Paradoxes

      The shift to single parenthood is almost always accompanied by a financial downgrade and a struggle to meet basic needs.

      | Scheme / Situation | Reality observed in the sources |
      | --- | --- |
      | RSA majoré | Capped at about €1,040, but reduced by other CAF benefits. The remaining disposable income can fall below €600, less than for a single person without children. |
      | Forced career change | Need to leave demanding jobs (e.g., sales representative) for more "flexible" but often lower-paid positions (e.g., payroll administrator). |
      | Access to housing | A major difficulty despite priority-family status, which is often only theoretical. |
      | Child poverty | Children feel the precarity (fear of running short of food or toys), which creates early anxiety. |

      4. Stigmatization and an Ill-Suited Social Model

      French society remains structured around the "dad, mom, and the kids" model, which marginalizes single mothers.

      The feeling of shame: Many mothers experience their situation as a "complete failure" and fear being labeled a "welfare case".

      This shame sometimes drives them to lie about their situation at the park or at school to maintain an appearance of normality.

      Gendered social perception: There is a perception bias between genders.

      A single father often elicits empathy ("poor guy, his wife left"), whereas a single mother is more readily stigmatized.

      Sunday isolation: Leisure time (parks, walks) is a harsh reminder of their solitude next to "nuclear" families, causing a feeling of social exclusion.

      5. Legal Failings and Absent Fathers

      The current legal framework is considered inequitable and ineffective by the mothers concerned.

      Rights vs. duties: The system protects fathers' rights (parental authority, visitation rights) but imposes almost no duties, apart from child support that is often difficult to collect.

      A father can decide not to pick up his child "because he is tired" with no legal consequence, thereby blocking any planning on the mother's part.

      Shared parental authority as an obstacle: Even when absent, the father retains decision-making power over important choices (school, health), which hampers day-to-day management for the mother, who nonetheless bears 100% of the load.

      Slow courts: The delays in obtaining a ruling (often more than a year) leave families in a "gray area" where the less involved parent can act as they please.

      Conclusion: The Importance of Collective Support

      In the absence of satisfactory political and administrative responses, support groups and associations appear to be the only refuges.

      These spaces make it possible to:

      Break the isolation: Talk with people living the same reality without needing to explain the inexplicable.

      Validate lived experience: Understand that fatigue and the feeling of failure are not personal failings but systemic consequences.

      Regain self-esteem: Through activities such as sophrology or dance, to reclaim a body and an identity beyond the sole status of nurturing mother.