10,000 Matching Annotations
  1. Last 7 days
    1. Coding agents do not fully utilize robot resources when they are reading logs, writing code, debugging, or waiting for the language-model backbone

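      This finding exposes an unresolved efficiency problem: physical resources (the robots) and compute resources (the GPUs) are wasted during the agent's thinking time. While the agent reads logs, writes code, or waits for the LM to respond, the robot sits idle. This is a variant of the same problem as synchronization overhead in multiprocessor computing. Getting the agent's cognitive cycle and the physical execution cycle to interleave and overlap better is the key engineering challenge for these systems to truly scale.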

    2. We evaluate the physical autoresearch capability of three coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6

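      This three-way comparison is the part of the paper with the most competitive-intelligence value. The evaluation framework is worth noting: it does not ask which model writes better code, but which coding agent can raise the success rate of a robot policy the most within a limited time. This is an end-to-end evaluation judged by outcomes in the physical world. Benchmarks of this kind are closer to real-world value than pure code-generation benchmarks, and they are also harder to dominate through sheer parameter count.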

  2. clavis-nxt-user-guide-clavisnxt-erste-dev.apps.okd.dorsum.intra
    1. back as Kirkpatrick in 1894.

      I guess that they were right back then, but I feel like there is more to be learned about different obstacles, such as genetic code, mental disorders, prior experience, etc.

    1. Running local models is good now
      • Evolving Quality: Local Large Language Models (LLMs) have achieved major milestones in accuracy, utility, and speed over the past six months, transitioning from simple "personalized Google" documentation lookups to handling localized agentic software development workflows.
      • Hardware Requirements: Running larger models effectively requires high-spec hardware (e.g., Apple M-Series with 64 GB+ unified RAM) to maintain an expansive Key-Value (K-V) cache and avoid critical performance degradation.
      • Top Performing Architecture: Recent open-weights families, such as Gemma 4 (specifically the gemma-4-26b-a4b and the faster gemma-4-12b-qat), have successfully reached roughly 75% of the accuracy and speed found in cloud-hosted frontier API models.
      • Agentic Workflows: Local models can now successfully loop and interact with local environments to orchestrate non-trivial tasks like refactoring code, writing unit tests, and bootstrapping full application repositories.
      • Secure Execution: Running developer-facing local agents poses local file system security risks, making a decoupled architecture—such as isolating the agent harness inside a containerized Docker Sandbox with restricted shell permissions—an essential security best practice.
      • Persistent Ecosystem Bottlenecks: Despite massive progress, challenges remain around slow prompt pre-fill (time to first token), limited context windows bounded by local hardware, prompt-template mismatches at model release, and heavy compute loads that max out GPU and RAM.

      Hacker News Discussion

      • Operational Friction: Many users argue that local models remain painful to run effectively. They note a stark divide between smart but slow dense models (e.g., Qwen 27B, Gemma 31B) and fast but error-prone Mixture of Experts (MoE) models.
      • The Quantization Trap: Commenters point out that many users run low-bit quantizations (like 4-bit) to save RAM, which effectively lobotomizes the model's capacity for complex tool calling. Industry recommendations favor a minimum of 5-bit for dense models and 6-bit for MoEs.
      • Hardware & Comfort Trade-offs: Running these workloads locally often turns high-end laptops or desktops into loud, hot, power-hungry machines, making the physical development environment uncomfortable.
      • Privacy and Data Sovereignty: A heated debate emerged regarding hosted vs. local options. While some demand local setups due to data-collection practices and copyright concerns of major tech providers, others prefer private API gateways or hosted "open model clouds" (like OpenRouter or specialized European hosting providers like OVH) that guarantee Zero Data Retention (ZDR).
    1. the share of sessions spent fixing broken code fell by nearly half, from 33% to 19%

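      This data point shows an important shift in programming work patterns: the share of sessions spent fixing code dropped from 33% to 19%, a reduction of nearly half. This suggests that as AI agents become more capable, users may be spending less time debugging and more on higher-level tasks. The trend is consistent with the growth in task value mentioned in the article (27% on average), hinting that AI agents are moving users from low-value maintenance work toward high-value innovative work. However, the article does not explain what caused this shift: it could be improved AI capability, or it could be improved user skill.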
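      A quick check of the "nearly half" wording: a fall from 33% to 19% is 14 percentage points, which is about 42% of the starting share. A minimal Python sketch using only the two quoted figures:

```python
# Shares of sessions spent fixing broken code, as quoted in the annotation.
before_share = 0.33
after_share = 0.19

# Absolute drop in percentage points vs. relative drop as a fraction of the starting share.
absolute_drop_pp = (before_share - after_share) * 100
relative_drop = (before_share - after_share) / before_share

print(f"absolute drop: {absolute_drop_pp:.0f} percentage points")  # 14 percentage points
print(f"relative drop: {relative_drop:.1%}")                       # 42.4%
```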

    2. Claude Code users now spend an average of 20 hours per week using the tool.

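      This data point indicates that Claude Code users spend an average of 20 hours per week with the tool, which is a fairly high level of usage. It suggests that users depend heavily on the tool and may have integrated it into their daily work. However, footnote 2 of the article states explicitly that this measures the time Claude Code is actively running, not the time users spend actively providing input, so it may overstate user engagement. Taken at face value against a typical 40-hour work week, 20 hours per week would mean users spend about half of their technical working time with this tool.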

    3. we introduce a framework for studying interactive agentic coding based on a privacy-preserving analysis of ~400,000 Claude Code sessions from between October 2025 and April 2026.

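      This data point indicates that the study is based on roughly 400,000 Claude Code sessions spanning seven months (October 2025 to April 2026). That is a fairly large sample, which strengthens the statistical reliability of the findings. However, the article does not make clear how these sessions were filtered or classified, or whether they represent the full Claude Code user population. The 400,000 sessions correspond to about 235,000 users, or roughly 1.7 sessions per user on average, which may indicate relatively limited user engagement.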
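      The per-user figure is a simple ratio of the two approximate counts (the 235,000-user figure is as cited in this annotation). A minimal Python sketch:

```python
sessions = 400_000  # approximate session count quoted in the study
users = 235_000     # approximate user count cited in the annotation

sessions_per_user = sessions / users
print(f"{sessions_per_user:.2f} sessions per user")  # 1.70 sessions per user
```

      Because this is a mean, it says nothing about how sessions are distributed across users: a few heavy users could coexist with many single-session users.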

    4. people make about 70% of the planning decisions but only 20% of the execution decisions

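      This 70/20 split reveals a clear division of labor in human-AI collaboration: humans are mainly responsible for planning decisions, while the AI handles concrete execution. It indicates that the AI is already fairly autonomous in executing tasks but still relies on humans for strategic planning. Compared with similar studies, this data point shows a relatively high level of human-AI collaboration, which may reflect Claude Code's design philosophy and users' habits.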

    5. we introduce a framework for studying interactive agentic coding based on a privacy-preserving analysis of ~400,000 Claude Code sessions from between October 2025 and April 2026.

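      This data point shows that the study's sample is about 400,000 Claude Code sessions over seven months. That is a fairly large dataset, which strengthens the reliability of the results. However, we do not know whether these 400,000 sessions are representative of all users or whether there is sampling bias. In addition, the study is limited to Claude Code usage and may not generalize to other AI coding tools.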

    1. Unfortunately, I did not manage this during the session: I was not able to read both sensors at the same time. I assume that the power supply may not be sufficient to run both, not so much in the sense that the sensors draw too much load, but rather that the voltage spikes that occur when they are addressed come too close together and are then momentarily too high. Alternatively, there may be a logic error in my code that causes the same pins to be addressed multiple times.

      In the previous session, Elisa was able to solve this problem with the two sensors. It is important that the data are correctly hex-encoded on the Pico and then unpacked again in the codec of the device profile.
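      A minimal sketch of the encode/decode round trip described above, assuming a LoRaWAN-style setup. The payload layout (two signed 16-bit big-endian readings scaled by 100) and the function names `encode_readings` and `decode_readings` are hypothetical, not taken from these notes. Both sides are written in Python for illustration; device-profile codecs are often written in JavaScript, and MicroPython on a Pico may need small adjustments (for example, using `ubinascii.hexlify`).

```python
import struct

# Hypothetical payload layout (an assumption, not taken from the notes):
# two sensor readings, each a signed 16-bit big-endian integer in units of 0.01.
PAYLOAD_FORMAT = ">hh"
SCALE = 100

def encode_readings(sensor_a: float, sensor_b: float) -> str:
    """Device side: pack both readings into bytes and return them as a hex string."""
    raw = struct.pack(PAYLOAD_FORMAT, round(sensor_a * SCALE), round(sensor_b * SCALE))
    return raw.hex()

def decode_readings(payload_hex: str) -> dict:
    """Codec side: turn the hex string back into bytes and unpack both readings."""
    a_raw, b_raw = struct.unpack(PAYLOAD_FORMAT, bytes.fromhex(payload_hex))
    return {"sensor_a": a_raw / SCALE, "sensor_b": b_raw / SCALE}

payload = encode_readings(21.37, -4.5)
print(payload)                   # 0859fe3e
print(decode_readings(payload))  # {'sensor_a': 21.37, 'sensor_b': -4.5}
```

      Encoder and decoder must agree on byte order, field width, sign, and scale; a mismatch in any of these produces plausible-looking but wrong values on the codec side.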

    1. Author response:

      The following is the authors’ response to the original reviews.

      In response to the reviewers’ comments, we have made revisions to the manuscript. Specifically, we have:

      (1) Increased the sample size in the whole-brain imaging and demixed principal component analysis (dPCA) analyses presented in Figures 1 and 3, strengthening the statistical support for our conclusions;

      (2) Revised the presentation of Figure 3B to clarify that the displayed dPC1 traces were scaled for visualization purposes only (dPC1 / max(dPC1)), rather than normalized for quantitative comparison across animals;

      (3) Expanded the main text and supplementary figures to provide more intuitive explanations and geometric illustrations of dPCA and hyperbolic space analysis, and clarified the interpretation of correlation matrices and principal-angle analyses to improve readability;

      (4) Substantially expanded the sections on Bayesian multidimensional scaling and hyperbolic embedding, including additional methodological details and validation analyses to strengthen the computational framework and its interpretation;

      (5) Expanded the Discussion to incorporate recent studies and discuss potential mechanisms underlying DRN 5-HT-mediated motor suppression.

      We believe that these revisions have substantially strengthened the manuscript and addressed the major concerns raised during peer review.

      Reviewer #1 (Public review):

      The wide-ranging serotonergic projections emerging from the Dorsal Raphe nucleus (DRN) are suggestive of a central role in regulating brain-wide activity and behavioural states. DRN activity has been associated with diverse functions, ranging from mood, motivation and pain regulation to sleep and cognitive flexibility. Its far-reaching connectivity made it challenging to assess the brain-wide effect of its activation, especially during behaviour.

      The present study by Qi et al. addresses these challenges by combining state-of-the-art tracking microscopy with the whole-brain accessibility of the larval zebrafish model. To investigate the effect of DRN activation, the authors leveraged the Tg(tph2:ChrimsonR) line to optogenetically activate tph2-positive neurons in the DRN, while monitoring changes in brain-wide activity, locomotion and auditory-stimuli evoked responses.

      Optogenetic activation had a suppressing effect on locomotion, which the authors distinguished from sleep induction by the maintenance of posture and by the sleep-disturbing effect of nighttime stimulation. Further, the authors report a distinct effect of DRN activation on motor-related, but not auditory-related, neuronal subspaces, identified by demixed principal component analysis.

      In addition, rather than affecting all motor-correlated neurons similarly, tph2+ DRN-mediated suppression focused on neurons encoding high-amplitude or turning motion.

      In summary, the work of Qi et al. provides solid evidence for a predominant role of the DRN in wake-state motor suppression by aptly combining the vast data-acquisition possibilities of the larval zebrafish model with computational methods to extract relevant information.

      The brain-wide scope of the analysis is a key strength, reducing bias, confirming the involvement of known motor and auditory regions, and providing a valuable dataset for future analyses.

      While the results support the authors' conclusions well, certain biological and technical aspects require discussion.

      We thank you for the positive and thoughtful evaluation of our work. We also appreciate your constructive comments on the biological and technical aspects of the study. We have carefully considered these concerns and addressed them point-by-point below, with corresponding revisions to the manuscript.

      Reviewer #1 (Recommendations for the authors):

      (1) Further samples required:

      Figure 1D relies on n=3 with considerable variability; the authors should add more animals to illustrate their point (typically 10-15 fish are used per study to show reliability across fish).

      Figure 3 also relies only on 5 fish in each condition; the authors should increase to 10-15 to show variability.

      Thank you for this valuable suggestion. To address this concern, we have increased the sample size in the revised manuscript. Specifically, the number of animals in Figure 1D has been increased from n = 3 to n = 5, and additional statistical analyses have been included to strengthen the quantitative support for our conclusions. Note that the error bars are plotted as standard deviation (SD), which may make the variability appear larger. In Figure 3, the number of animals was also increased from n = 5 to n = 8.
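      Why SD bars look larger: the standard error of the mean is SEM = SD / √n, so with n = 5 fish, SEM bars are about 0.45 times the height of SD bars for the same data. A minimal Python sketch with made-up per-fish values (not data from the study):

```python
import numpy as np

# Hypothetical per-fish values (illustrative only, not data from the study).
values = np.array([0.42, 0.55, 0.31, 0.60, 0.47])

n = values.size
sd = values.std(ddof=1)   # sample standard deviation: spread across animals
sem = sd / np.sqrt(n)     # standard error of the mean: uncertainty of the mean

print(f"n={n}  mean={values.mean():.3f}  SD={sd:.3f}  SEM={sem:.3f}")
print(f"SEM / SD = {sem / sd:.3f}")  # 0.447, i.e. 1/sqrt(5)
```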

      In addition, our findings are consistent with previous work showing a strong association between elevated dorsal raphe nucleus (DRN) activity and reduced locomotion in zebrafish [1, 2, 3]. Importantly, across animals, the variance explained by the dPCA components and the rapid modulation of whole-brain state remain highly consistent, supporting the robustness and reproducibility of our observations.

      Given this increased sample size together with consistency across animals and convergence with prior studies, we believe the current dataset provides sufficient statistical and biological support for our conclusions.

      (2) Further steps to be added to the analysis to fully support the claim:

      It appears that the individual brains are registered and individually clustered into areas by combining highly-correlated nearby neurons.

      dPCA is then computed for individual brains. Evidence for our interpretation of individual dPCA spaces:

      (1) Figure 3A depicts separate dPCs for different fish.

      (2) Line 488–489 describes normalization of the value range of dPCs to compare across fish, which implies separate dPCs.

      While the authors normalize the projections onto the principal components, the dPCA spaces remain individual, as does the meaning of their components. It is thus questionable how to conclude from data across fish in a rigorous manner.

      Instead, we recommend that the authors build voxels for each individual’s brain and calculate dPCA across all brains, not individual ones, so that components could become truly comparable across the brains of given individuals.

      We thank the reviewer for this important comment. We would like to clarify that our analysis does not aim to construct a shared dPCA space across animals or to quantitatively compare dPC scores between individuals. In this analysis, dPCA was performed separately for each fish to capture the dominant low-dimensional population dynamics within each individual brain.

      The purpose of Figure 2 is to demonstrate that DRN activation induces a rapid and robust transition in whole-brain activity, rather than to define a common population subspace across animals.

      We also attempted to register and pool data across animals for a joint analysis, as suggested by the reviewer. However, our dataset includes zebrafish at slightly different developmental stages (6–12 dpf). Although the behavioral effects of DRN activation (including motor suppression and global brain-state modulation) were robust across this age range, developmental differences introduced substantial anatomical variability in brain size and morphology, which reduced registration accuracy and made voxel-wise correspondence across animals unreliable.

      We realize that our previous description of “normalization” may have caused confusion. To clarify, the dPC1 traces shown in Figure 2 were only scaled for visualization by dividing each fish’s projection by its maximum value (dPC1 / max(dPC1)), so that trajectories from different fish could be displayed on the same axis. This scaling does not alter the underlying dPCA space, does not constitute normalization for cross-animal comparison, and was not used for any quantitative analysis.
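      A minimal Python sketch of the display-only scaling described above (dPC1 / max(dPC1)), applied to hypothetical traces; it does not reproduce the per-fish dPCA itself, and the function name `scale_for_display` is illustrative. It also shows why this scaling cannot support quantitative comparison across animals: each trace is divided by its own constant, so only its shape is preserved.

```python
import numpy as np

def scale_for_display(traces):
    """Return each fish's dPC1 trace divided by its own maximum (display only).

    Assumes every trace reaches a positive maximum; dPC signs are arbitrary,
    so a trace that is entirely negative would need its sign flipped first.
    """
    scaled = []
    for trace in traces:
        peak = np.max(trace)
        if peak <= 0:
            raise ValueError("trace has no positive maximum; flip its sign first")
        scaled.append(trace / peak)
    return scaled

# Hypothetical projections for two fish with very different raw magnitudes.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
fish_a = 50.0 * (t > 4) + rng.normal(0, 2, t.size)
fish_b = 3.0 * (t > 4) + rng.normal(0, 0.1, t.size)

scaled_a, scaled_b = scale_for_display([fish_a, fish_b])
print(scaled_a.max(), scaled_b.max())  # 1.0 1.0
# Each scaled trace is a constant multiple of its original, so its shape
# (and its correlation with the original) is unchanged.
print(np.corrcoef(fish_a, scaled_a)[0, 1])
```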

      Importantly, despite being computed independently for each fish, we observed a consistent temporal pattern across animals: DRN activation was reliably accompanied by a rapid transition captured by dPC1 in each individual fish. We have revised the Methods and corresponding text in the manuscript to make this distinction explicit and avoid ambiguity.

      Reviewer #2 (Public review):

      Summary:

      The authors examine the effects of activating the dorsal raphe nucleus serotonergic system using a combination of calcium imaging and optogenetics in freely moving larval zebrafish. Their findings show that optogenetic stimulation induces a state of behavioral quiescence.

      They further investigate whether this state corresponds to sleep or reduced motor activity. Analyses of posture and sleep-related paradigms indicate that serotonergic activation primarily suppresses motor output rather than promoting sleep. Notably, this suppression appears to be bout type-dependent, with stronger effects on neurons associated with larger tail amplitudes and turning angles.

      In addition, auditory stimulation experiments reveal no significant impact of serotonin on sound encoding.

      We thank the reviewer for the careful and thoughtful summary of our work.

      Strengths:

      The study combines advanced experimental techniques with state-of-the-art analytical methods, enabling precise and compelling insights into the role of serotonergic modulation. The experiments and analyses are well aligned with the questions being addressed, and the results appear robust and reliable.

      Moreover, the implementation of experiments that combine calcium imaging and optogenetics in freely moving animals is technically challenging and appears well justified in the context of the research questions.

      We thank you for the positive assessment of our work and for recognizing the technical and analytical strengths of our experimental approach.

      We address the reviewer’s specific comments in detail below.

      Weaknesses:

      While the analytical techniques employed are sophisticated and appear to be appropriately applied, their presentation makes the manuscript difficult to follow. Although the explanations are provided in the Methods section, including more guidance in the main text, such as how to interpret each analytical approach and what outcomes would be expected under different scenarios, would help readers who are less familiar with these techniques.

      Providing this context would better guide the reader in navigating the figures, broaden the accessibility of the work, and ultimately increase its impact.

      We thank you for this important suggestion. To improve clarity and accessibility, we have revised the main text to provide more intuitive explanations of both demixed principal component analysis (dPCA) and hyperbolic space analysis, with additional emphasis on how to interpret their outputs and what different outcomes imply biologically.

      Additionally, we have included new supplementary figures (Figure S2 and Figure S6) with geometric illustrations and simplified examples to provide a more visual and conceptual understanding of these methods. We hope these revisions make the analytical framework easier to follow and improve the accessibility and impact of the manuscript.

      While the authors discuss different quiescent states mediated by serotonin reported in previous studies, their interpretation is limited to stating that “a common feature shared by these distinct behavioral states is a pronounced reduction in movement,” and consequently proposing that activation of dorsal raphe nucleus is not sufficient to specify a particular behavioral state, but rather plays a primary role in driving motor suppression.

      In my view, a more thorough attempt to determine whether the observed state corresponds to any of the previously described forms of quiescence, or represents a subset or variant of them, would strengthen the manuscript. This would help better integrate the findings with the existing literature.

      For example, given that the authors have access to whole-brain activity data, it would be valuable to examine and discuss whether there are shared patterns of activation with previously reported quiescent states.

      Thank you for the insightful suggestion. To address this, we compared our whole-brain activity patterns with key neural signatures reported in previously characterized zebrafish quiescent states.

      A recent study reported that exposure to conspecific alarm substance (CAS) induces a quiescent but vigilant state associated with elevated DRN 5-HT activity and low-frequency synchronized forebrain activity [3]. In our dataset, although DRN 5-HT activation similarly induced robust locomotor suppression, we did not detect comparable low-frequency synchronized forebrain dynamics during the stimulation period. These results suggest that while DRN 5-HT activation is sufficient to induce motor suppression, it does not recapitulate the full neural signature of CAS-induced vigilant quiescence. We have incorporated this comparison and its interpretation into the Discussion section of the revised manuscript.

      Following the termination of optogenetic stimulation, we observed a gradual recovery of locomotor speed, consistent with the behavior reported in an earlier study [3], although recovery in our experiments was much faster. Interestingly, whole-brain imaging also revealed a transient increase in forebrain activity. This elevated forebrain activity gradually returned to baseline as locomotor activity recovered. In accordance with the reviewer’s suggestion, we propose that these forebrain dynamics represent a common motif that facilitates the transition out of the DRN-induced quiescent state (Author response image 1).

      The manuscript largely avoids discussing the mechanisms underlying the observed motor suppression. For instance, is this effect driven directly by serotonin release onto target neurons? Is it mediated by glial activity, as suggested in other studies? Are additional neuromodulatory systems being recruited?

      While addressing these questions may require substantial further work, potentially beyond the scope of the present study, the availability of whole-brain data provides an opportunity to at least explore or discuss these possibilities. In particular, it would be interesting to examine the recruitment of regions not directly stimulated but known to be associated with other neuromodulatory systems or promoting glial activation (e.g., the locus coeruleus).

      Author response image 1.

      Forebrain activity increases following termination of DRN optogenetic stimulation. (A) Following the termination of optogenetic stimulation of DRN 5-HT neurons, locomotor speed in Tg(tph2:ChrimsonR) zebrafish gradually recovered and returned to control levels. (B) Neural activity in forebrain regions showed a transient increase immediately after stimulation offset and gradually returned to baseline as locomotor activity recovered.

      We thank you for this important suggestion. In the revised Discussion, we now frame our findings in relation to several candidate mechanisms.

      Our results are most consistent with a direct neuromodulatory action of serotonin on downstream motor-related circuits. This is supported by the known projection patterns of DRN 5-HT neurons [4], which target midbrain and hindbrain regions involved in motor control, as well as by prior serotonin imaging studies showing elevated 5-HT levels in hindbrain regions during low-motor states, where inhibitory HTR1-family receptors are enriched [5]. In addition, recent voltage imaging studies have shown that DRN serotonergic neurons are embedded within a broader motor-state-dependent circuit, in which they are dynamically regulated by local GABAergic inputs [6]. We have incorporated a discussion of these potential mechanisms into the revised Discussion.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 91-97 page 2.

      “dPCA separates neural population activity into components tied to specific experimental variables, allowing us to isolate DRN-dependent changes (Methods). Components associated with DRN activation explained significantly more variance in Tg(tph2:ChrimsonR) zebrafish than in controls (Fig. 3A), indicating a strong serotonergic impact on brain-wide neural activity. The small stimulation-related variance in controls likely reflected visual responses to laser.”

      Directly stimulated neurons are not included, as stated in the Methods, but I think it would be better to mention this explicitly in the main text.

      We thank you for this helpful suggestion. We agree that explicitly stating this point in the main text improves clarity. In our analysis, neurons directly stimulated by the laser were excluded (as described in the Methods) to ensure that the identified components reflect whole brain responses rather than direct optogenetic activation. We have now added a clarifying sentence in the Results section to make this explicit.

      (2) Lines 113 - 115 page 3.

      “To examine how DRN 5-HT neuron activation affects sensorimotor processing (Fig. 4C), we next recorded whole-brain neural activity in head-fixed, tail-free larvae embedded in agarose to capture transient calcium signals with minimal motion artifacts.”

      Lines 117-119 page 3.

      “Because head-fixed larvae rarely enter natural sleep, we applied 1 mM mepyramine, a sleep-promoting antihistamine, to induce a sleep-like state (41), which markedly changed auditory responses (Fig. 4E, Fig. S2C)”

      Why not perform these experiments in freely moving fish instead? To what extent do movements in freely moving animals affect segmentation? Is it actually problematic to apply dPCA in that case? You used it in the previous section.

      We thank the reviewer for raising this important point. In principle, freely moving preparations would provide a more natural behavioral context. However, reliable application of dPCA requires stable neuron identification and accurate trial alignment across time, both of which are substantially compromised in freely moving larvae due to motion-induced imaging noise and segmentation errors.

      In our hands, whole-brain calcium imaging in freely moving fish introduces significant variability in segmentation and signal extraction, which in turn leads to unstable and noisy low-dimensional decompositions, preventing robust estimation of task-related components. By contrast, the head-fixed preparation enables consistent neuron tracking and precise alignment to sensory stimuli, which are critical for dPCA.

      We have now clarified in the manuscript that all dPCA analyses were performed on head-fixed animals.

      (3) Line 117 page 3.

      Why do you use cosine similarity? Are the results different when using other metrics?

      I can see the matrix, but what exactly are you looking for in it to support the claim “DRN activation preserved the structure of the auditory population code”? I think explaining some of these concepts more clearly, or at least providing expectations or interpretations for the different metrics and analyses, would make the manuscript easier to follow.

      We thank you for this question. Cosine similarity is widely used to quantify similarity between population activity patterns because it captures relative activity across neurons while ignoring overall gain.

      In our analysis, each trial is a population activity vector, and the cosine similarity matrix encodes pairwise relationships between these vectors. We assess preservation of the auditory population code by testing whether this similarity structure (i.e., the geometry of population responses) remains consistent across conditions. We have expanded the text to clarify how these matrices are constructed and interpreted.
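      A minimal Python sketch of a trial-by-trial cosine similarity matrix as described above, assuming trials are stored as rows of a (trials × neurons) array; the data are random placeholders and the function name is illustrative. It also demonstrates the gain invariance: multiplying one trial's activity by a constant leaves every similarity unchanged.

```python
import numpy as np

def cosine_similarity_matrix(trials):
    """Pairwise cosine similarity between trial vectors.

    trials: array of shape (n_trials, n_neurons); each row is one trial's
    population activity vector.
    """
    norms = np.linalg.norm(trials, axis=1, keepdims=True)
    unit = trials / norms   # project each trial onto the unit hypersphere
    return unit @ unit.T    # dot products of unit vectors are cosines

rng = np.random.default_rng(1)
trials = rng.random((4, 50))  # hypothetical: 4 trials x 50 neurons
S = cosine_similarity_matrix(trials)

# Scaling one trial (a pure gain change) leaves its similarities unchanged.
scaled = trials.copy()
scaled[2] *= 3.0
print(np.allclose(S, cosine_similarity_matrix(scaled)))  # True
```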

      In addition, we computed alternative similarity measures based on Pearson correlation, which is equivalent to the cosine similarity of two vectors after they have been centered (subtracting the mean of each vector) (Author response image 2A). We further quantified pairwise trial distances using the Euclidean chord distance on the unit hypersphere, defined as

      $D_{ij} = \sqrt{2(1 - C_{ij})}$, where $C_{ij}$ is the Pearson correlation between trials $i$ and $j$; smaller distances indicate higher similarity (Author response image 2B). Both alternative measures yielded qualitatively consistent results, showing that DRN 5-HT neuron activation preserves the similarity structure across trials.

      (4) Figure 4D.

      If “significant alignment between DRN activation and motor-related neural subspaces, with the sound related subspace being nearly orthogonal” is correct, shouldn’t there be some visible overlap between blue and red, and little to no overlap with yellow? This is not easy to see. Perhaps plotting all three in a single panel would help.

      We thank you for this helpful suggestion. We would like to clarify that the “alignment” we refer to is defined in terms of the angle between neural subspaces, rather than the spatial overlap of neurons. In other words, significant alignment indicates that the corresponding population activity patterns occupy similar directions in a high-dimensional activity space.

      As a result, even statistically significant aligned subspaces (see further exposition below) do not necessarily involve overlapping sets of neurons with large PC weights. This distinction is important because subspace geometry is defined at the population level and cannot be directly inferred from spatial overlap in low-dimensional visualizations. In addition, the visualization shown in Fig. 4D highlights only brain regions containing neurons with relatively high weights for illustrative purposes.

      We also note that the current visualization is based on a maximum intensity projection of a 3D volume, which can create the appearance of overlap in two dimensions even when the underlying neurons are spatially segregated in three dimensions. To provide a clearer spatial reference, we have re-plotted the three subspaces in a three-dimensional representation.
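      One standard way to measure the angle between two neural subspaces is via principal angles: take an orthonormal basis for each subspace, and the singular values of the product of the two bases are the cosines of the angles. The manuscript mentions principal-angle analyses, but its exact procedure is not given here, so this is a generic Python sketch with synthetic subspaces and a hypothetical function name `principal_angles_deg`. SciPy provides a comparable routine, `scipy.linalg.subspace_angles`.

```python
import numpy as np

def principal_angles_deg(A, B):
    """Principal angles (degrees, ascending) between the column spaces of A and B.

    A: (n_neurons, k_a) and B: (n_neurons, k_b); each column is one axis
    (e.g. a component's neuron weight vector) spanning a subspace.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)  # sorted descending
    return np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

# Synthetic 2-D subspaces in a 2000-neuron space that share one noisy axis.
rng = np.random.default_rng(3)
n = 2000
shared = rng.normal(size=(n, 1))
A = np.hstack([shared, rng.normal(size=(n, 1))])
B = np.hstack([shared + 0.1 * rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])

angles = principal_angles_deg(A, B)
print(angles.round(1))  # first angle small (shared axis), second close to 90
```

      Note that a small principal angle says the two activity patterns point in similar directions in neuron space; it does not require the same neurons to carry the largest weights in each subspace.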

      (5) Figure 4F.

      Do the arrows represent the values for each combination? This is not clear to me. Perhaps it could be clarified in the paragraph. Most of the values, including those being compared, are around 87 ± 2 degrees, i.e., mostly orthogonal. Does this imply no overlap between patterns (again, this is hard to see in Figure 4D)? The values are different from the null model but still close to orthogonal. The phrase “significant alignment between DRN activation and motor-related neural subspaces” could be interpreted as strong alignment, but the values do not seem to support that, do they?

      Author response image 2.

      Alternative similarity measures reveal preserved trial-to-trial similarity structure. (A) Trial-by-trial similarity matrix quantified using Pearson correlation. Higher correlation indicates greater similarity between trials. (B) Pairwise trial distances quantified using the Euclidean chord distance on the unit hypersphere ($D_{ij} = \sqrt{2(1 - C_{ij})}$), where smaller distances indicate greater similarity between trials.

      Author response image 3.

      Three-dimensional visualization of DRN activation-, motor-, and sound-related subspaces. Three-dimensional rendering of the high-weight neurons in the DRN 5-HT activation, motor-related, and sound-related subspaces. Colors are consistent with Figure 4D.

      We thank the reviewer for this important clarification.

      We agree that the phrase “alignment” could be interpreted as implying strong spatial overlap in the anatomical space, which is not what we intend to convey. In our analysis, “alignment” refers to a statistically significant deviation from a null distribution.

      In high-dimensional spaces, random vectors are expected to be nearly orthogonal, with angles tightly concentrated around 90°. To demonstrate this phenomenon, we conducted simulations using random vectors over a range of dimensionalities (100–10,000 dimensions) and observed that the expected angle distribution over 1000 trials becomes progressively more concentrated around 90° as the dimensionality increases (Author response image 4). Therefore, even modest deviations from 90° reflect a systematic bias and indicate structured overlap beyond chance. So, “significantly aligned” means the motor–DRN angle is significantly less than the random baseline, and “significantly orthogonal” for sound–DRN means the angle is significantly closer to 90° than the random baseline. We will revise the text to clarify this point and avoid potential misinterpretation.

      Regarding Figure 4F, we agree that the meaning of the arrows was not sufficiently clear. The arrows represent the mean angle, computed across all fish, between the DRN 5-HT activation subspace and the motor-related subspace (left), and between the DRN 5-HT activation subspace and the sound-related subspace (right). We will update the figure legend to explicitly define these elements.

      Author response image 4.

      Random vectors become increasingly orthogonal in high-dimensional spaces. Simulated distributions of pairwise angles between random vectors across different dimensionalities (100–10,000 dimensions; 1000 repetitions per dimensionality). As dimensionality increases, the angle distribution becomes increasingly concentrated around 90°.
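      A minimal Python sketch of the kind of simulation described above: sample pairs of independent random vectors, compute the angle between each pair, and watch the spread shrink as dimensionality grows. The sampling scheme (standard normal entries), the seed, and the function name `random_angles_deg` are assumptions; the authors' simulation details may differ.

```python
import numpy as np

def random_angles_deg(dim, n_trials=1000, seed=0):
    """Angles (degrees) between n_trials pairs of independent Gaussian random vectors in `dim` dimensions."""
    rng = np.random.default_rng(seed)
    angles = np.empty(n_trials)
    for i in range(n_trials):
        u = rng.standard_normal(dim)
        v = rng.standard_normal(dim)
        cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        angles[i] = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return angles

for dim in (100, 1_000, 10_000):
    angles = random_angles_deg(dim)
    print(f"dim={dim:>6}: mean={angles.mean():6.2f} deg, sd={angles.std():5.2f} deg")
```

      For large d, the cosine between two random directions has variance 1/d, so the angle's standard deviation is roughly (180/π)/√d degrees: about 5.7° at d = 100 and about 0.57° at d = 10,000. This is why an angle a few degrees below 90° can still differ significantly from a random baseline when many neurons are included.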

      (6) Lines 125 - 126 page 5.

      “After detecting bouts, we computed each bout’s direction and amplitude and classified them into 12 types.”

      It would be interesting to see how the distribution of bouts looks in the direction-amplitude space, in order to better visualize the 12 bout types (perhaps using different colors). It might also be useful to include examples of the 12 bout types in the supplementary material.

      We thank you for this helpful suggestion. To better visualize the distribution of bouts and the definition of the 12 bout types, we have added a new supplementary figure showing the distribution of all bouts in the direction–amplitude space, with each bout color-coded according to its assigned category, consistent with the scheme used in the main text.

      We further quantified the frequency of each bout type across the dataset, which comprises 1,493 bouts from 7 animals. Among these, 4 animals exhibited all 12 bout types and were therefore included in subsequent regression analyses that require complete coverage of all categories.

      In addition, we have included examples of representative bout types in the supplementary material. These additions improve the clarity and interpretability of the bout classification scheme.

      (7) Lines 131 - 133 page 5.

      “Some neurons exhibited activity related to all bout types with similar amplitudes, yielding low coefficient variability, whereas others responded selectively to specific bout types - typically those with larger tail amplitudes and turning angles - exhibiting higher variability in regression coefficients (Fig. 5B).”

      I would appreciate some quantification of “typically.”

      We thank you for this suggestion. Fig. 5B (bottom) shows a neuron with large variability in regression coefficients across bout types, quantified by the coefficient of variation (CV). Bout types with large amplitudes and turning angles (e.g., type 12) have larger regression coefficients than others. We will remove “typically” from the text.
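The coefficient-of-variation measure referred to above can be illustrated as follows. This sketch assumes CV is the sample standard deviation of one neuron's 12 regression coefficients divided by the absolute value of their mean; the coefficient values are invented for illustration and are not data from the study.

```python
import numpy as np

def coefficient_of_variation(coefs):
    """CV = sample SD / |mean| of one neuron's regression coefficients across bout types."""
    coefs = np.asarray(coefs, dtype=float)
    mean = coefs.mean()
    if np.isclose(mean, 0.0):
        return float("nan")  # CV is undefined when the mean coefficient is ~0
    return float(coefs.std(ddof=1) / abs(mean))

# Hypothetical coefficients for bout types 1..12 (not data from the study).
broad = np.full(12, 0.5) + np.linspace(-0.02, 0.02, 12)  # similar response to every type
selective = np.r_[np.full(11, 0.05), 1.0]                 # strong response to type 12 only
print(coefficient_of_variation(broad), coefficient_of_variation(selective))
```

A broadly tuned neuron yields a CV near zero, whereas a neuron dominated by one large-amplitude bout type yields a CV well above 1.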

      (8) Lines 546 - 547 page 15.

      “Fish whose baseline tail movements were insufficient to cover all 12 bout types were excluded from further analysis.”

      It would be useful to report the number or proportion of animals that did not exhibit all 12 bout types. Which types of bouts are less frequently observed?

      Thank you for this helpful suggestion. In the full dataset (n = 7 fish), 4 animals exhibited all 12 bout types. We have now added a supplementary figure showing the occurrence probability of each bout type across all animals.
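The inclusion criterion (retain only fish that exhibited all 12 bout types) and the per-animal occurrence probabilities can be sketched as below. The function names and the toy input are hypothetical; bout types are assumed to be labeled 1 to 12.

```python
from collections import Counter

ALL_TYPES = range(1, 13)  # bout types assumed to be labeled 1..12

def occurrence_probability(bout_types):
    """Fraction of one animal's bouts assigned to each bout type."""
    counts = Counter(bout_types)
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in ALL_TYPES}
    return {t: counts[t] / total for t in ALL_TYPES}

def animals_with_all_types(bouts_by_animal):
    """Animals whose recorded bouts cover every bout type."""
    required = set(ALL_TYPES)
    return [animal for animal, types in bouts_by_animal.items()
            if required.issubset(types)]

# Toy input (hypothetical): fish_a shows all 12 types, fish_b only three.
toy = {"fish_a": list(ALL_TYPES) + [12, 12], "fish_b": [1, 2, 2, 3]}
print(animals_with_all_types(toy))
print(occurrence_probability(toy["fish_b"]))
```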

      (9) Line 147 page 5.

      Honestly, the Bayesian multi-dimensional scaling is difficult to follow, and it is not clear what new insight it provides. I assume that “hyperbolic geometry indicates complex hierarchical organization” is the main point, but its meaning in this context is not sufficiently explained. This paragraph would benefit from being rewritten for clarity or potentially removed if it does not contribute essential information.

      We appreciate your insightful comments. In response, we have substantially expanded the section on Bayesian multidimensional scaling. First, we now provide an intuitive exposition (see Figure S6) of hyperbolic geometry and multidimensional scaling, clarifying why this framework constitutes a powerful approach for uncovering the geometric and functional organization of neuronal populations. Second, we show that multidimensional scaling in a curved hyperbolic space more accurately captures the correlation structure among neurons than embeddings in a flat Euclidean space. Third, and most notably, we find that the inferred curvature of the hyperbolic embedding space tightly scales with the degree of quiescence: fish in which dorsal raphe nucleus (DRN) stimulation nearly abolished locomotor activity exhibit the largest curvatures (new Figure 5F). Collectively, these computational analyses indicate that the curvature of the embedding space serves as a quantitative signature of the quiescent state.
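To give an intuition for how curvature enters such an embedding, the sketch below computes distances with the hyperbolic law of cosines. It assumes each neuron is placed at a radius r and a direction, with delta_theta the angle between two directions, in a space of constant curvature K = -zeta². This is a generic textbook formula, not the authors' Bayesian implementation, which infers coordinates and curvature from the correlation data.

```python
import math

def hyperbolic_distance(r_i, r_j, delta_theta, zeta):
    """Geodesic distance between two points at radii r_i, r_j separated by angle
    delta_theta, in a hyperbolic space of constant curvature K = -zeta**2."""
    if zeta <= 0:
        raise ValueError("zeta must be positive; use euclidean_distance for flat space")
    # Hyperbolic law of cosines with lengths rescaled by zeta.
    cosh_d = (math.cosh(zeta * r_i) * math.cosh(zeta * r_j)
              - math.sinh(zeta * r_i) * math.sinh(zeta * r_j) * math.cos(delta_theta))
    # Round-off can push the argument just below 1, where acosh is undefined.
    return math.acosh(max(cosh_d, 1.0)) / zeta

def euclidean_distance(r_i, r_j, delta_theta):
    """Flat-space law of cosines, the zeta -> 0 limit of hyperbolic_distance."""
    return math.sqrt(max(r_i**2 + r_j**2 - 2 * r_i * r_j * math.cos(delta_theta), 0.0))

for zeta in (0.01, 0.5, 1.0):
    print(zeta, round(hyperbolic_distance(1.0, 2.0, 1.0, zeta), 3),
          round(euclidean_distance(1.0, 2.0, 1.0), 3))
```

As zeta approaches 0 the hyperbolic distance converges to the Euclidean one, so a fitted curvature can be read as a measure of how far the best embedding departs from flat space.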

      References

      (1) J. C. Marques, M. Li, D. Schaak, D. N. Robson, J. M. Li, Internal state dynamics shape brainwide activity and foraging behaviour. Nature 577, 239–243 (2020).

      (2) V. Choudhary, C. R. Heller, S. Aimon, L. de Sardenberg Schmid, D. N. Robson, J. M. Li, Neural and behavioral organization of rapid eye movement sleep in zebrafish. bioRxiv (2023).

      (3) Y. Zhao, C.-X. Huang, Y. Gu, Y. Zhao, W. Ren, Y. Wang, J. Chen, N. N. Guan, J. Song, Serotonergic modulation of vigilance states in zebrafish and mice. Nature Communications 15, 2596 (2024).

      (4) Z. Song, C.-X. Huang, H. Zhang, C. Ye, N. Guan, J. Song, Integrated single-cell atlases unveil the operation principles of whole-brain 5-HT neuronal subsystems. Science Advances 11, eadv8128 (2025).

      (5) R. Haruvi, R. Barbara, I. Shainer, A. Rosenberg, L. Moshe, D. Malamud, J. Toledano, D. Braun, H. Baier, T. Kawashima, Global and compartmentalized serotonergic control of sensorimotor integration underlying motor adaptation. bioRxiv (2024).

      (6) T. Kawashima, Z. Wei, R. Haruvi, I. Shainer, S. Narayan, H. Baier, M. B. Ahrens, Voltage imaging reveals circuit computations in the raphe underlying serotonin-mediated motor vigor learning. Neuron (2025).

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors present MiPS, a platform combining DMD-based patterned illumination, automated microscopy, retrained DeLTA segmentation, and mother-machine microfluidics to selectively inhibit or eliminate cells based on dynamic phenotypes. The system enables targeted UV or red-light illumination in real time using segmentation-informed projection masks, allowing selective enrichment directly within mother-machine devices. The manuscript demonstrates proof-of-concept enrichment of mCherry cells from mixed GFP/mCherry populations, characterizes off-target effects, and performs computational simulations of iterative enrichment rounds. Overall, the engineering and systems integration are impressive, and the platform has strong potential for applications in directed evolution, biosensor optimization, and dynamic phenotype-based selection workflows.

      Overall, I believe the work is suitable for publication after minor revisions and clarification of several aspects of the manuscript. In particular, the paper would benefit from additional context in the Introduction and Methods sections, clearer positioning relative to existing platforms, improved figure readability/captions, and a more careful revision of the English throughout the manuscript.

      Major comments:

      1. The manuscript should better position MiPS relative to recent microscopy-based and DMD-enabled selection/control systems, particularly Lugagne et al., Nature Communications (2024), DOI: 10.1038/s41467-024-46361-1. That work also combines mother-machine microfluidics, DeLTA-based real-time image analysis, and DMD projection. The key distinction here appears to be physical selection/enrichment through targeted killing rather than optogenetic control, and this difference should be stated more explicitly.
      2. The manuscript currently compares MiPS mostly to FACS/MACS. However, the more relevant comparison may be recent image-based and microfluidic photoselection systems. A dedicated comparison table discussing throughput, temporal phenotyping, iterative selection, dynamic phenotype tracking, and enrichment capabilities would strengthen the paper.
      3. The enrichment experiment in Figure 4 represents a relatively simple classification problem (GFP vs mCherry). Since the proposed applications involve subtle continuous phenotypes, it would considerably strengthen the manuscript to include at least one experiment selecting for high vs. low expressors within a single fluorescent reporter population.
      4. The strongest enrichment result (~170-fold enrichment in Figure 5) is entirely simulation-based. Since the manuscript already states that ~45 min between rounds is sufficient for growth evaluation, a real two- to three-round enrichment experiment seems feasible, should require a relatively modest time investment, and would substantially strengthen the platform's practical relevance.
      5. The bimodal distributions in Figure 2 suggest that a fraction of cells may be stress-resistant rather than simply surviving randomly. It would be useful to discuss whether repeated rounds could progressively enrich UV-resistant subpopulations.
      6. The manuscript repeatedly uses the term "killed," although the data shown in Figures 2 and 4 mostly demonstrate strong growth arrest/inhibition. Please clarify how the cutoff of division rate <0.4 h⁻¹ was selected and whether an independent viability assay was performed.
      7. The off-target analysis in Figure 3 is one of the strongest parts of the paper and should probably be emphasized more. The conclusion that the dominant effects are global rather than local is interesting, but additional discussion about optical scattering, ROS diffusion, or device-wide coupling effects would strengthen the interpretation.
      8. UV exposure is inherently mutagenic in E. coli, and untargeted cells still receive a substantial fraction of the UV dose at high targeting fractions. Please discuss whether the MB/red-light modality may be preferable in applications where preserving genotype integrity is important.
      9. The manuscript discusses that methylene blue (MB) improves the on:off target ratio, but MB also appears to reduce baseline growth by ~40% even without red-light exposure. This is potentially important for iterative selection workflows. Please discuss whether this effect is reversible after washout and how rapidly cells recover.
      10. The manuscript states that the retrained DeLTA model used ~3,000 annotated fluorescence images, but no train/validation/test split or segmentation performance metrics are reported. Since segmentation directly impacts phenotype classification and projection targeting, these details are important for reproducibility.
      11. The manuscript would benefit from a stronger Methods description regarding DMD calibration, alignment procedures, projection accuracy validation, and computational timing requirements for the real-time analysis pipeline.

      Minor comments:

      1. In Figure 1, it would help to better distinguish the imaging optical path from the photoselection/UV projection path.
      2. The manuscript claims submicron projection precision (<0.5 µm), but it would help to relate this more directly to trench dimensions and actual biological targeting accuracy.
      3. In Figure 3, please include trench spacing and trench geometry information, since these parameters are important for interpreting local leakage and off-target illumination effects.
      4. The fitted off-target scaling factor (m = 0.26) becomes central to the simulation framework later in the paper, but no uncertainty or confidence interval is reported for this fit.
      5. In Figure 4, please clarify more explicitly how mixed or unidentified trenches were handled computationally before projection.
      6. The enrichment shift from 1:1 to 3.8:1 in Figure 4D is promising, but the number of biological replicates should be stated. If this were a single experiment, additional replicates with error bars would increase confidence in the enrichment result.
      7. Several figure captions would benefit from additional context and clearer definitions of technical terms and abbreviations. In multiple cases, interpreting the figure panels was difficult without returning to the main text.
      8. Please define all abbreviations directly in the figure captions, even if they are introduced earlier in the manuscript.
      9. In several figures, the color coding is not fully explained in the captions. Please make sure all colors, dashed lines, highlighted regions, and overlays are explicitly defined.
      10. The captions should more clearly describe what readers are expected to conclude from each figure, not only what is shown.
      11. Figure 2 caption issue: the manuscript references "Figure 2F," but Figure 2 only contains panels A-E.
      12. The manuscript does not currently clarify whether the software, DMD calibration routines, or retrained DeLTA weights will be publicly released. Clarifying code and software availability would improve reproducibility.
      13. There are several grammatical and readability issues throughout the manuscript. The technical ideas are strong, but some sentences are difficult to follow and would benefit from careful proofreading and language editing.
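The enrichment figures discussed above are ratios of ratios, and the request for multi-round experiments concerns how they compound. A minimal worked example follows, under the idealized assumption (not made in the manuscript) that every round multiplies the mCherry:GFP ratio by the same factor as the single round reported in Figure 4D. The function names are hypothetical.

```python
def fold_enrichment(before_ratio, after_ratio):
    """Fold change in the target:non-target ratio (e.g. mCherry:GFP)."""
    return after_ratio / before_ratio

def ratio_after_rounds(start_ratio, per_round_fold, n_rounds):
    """Idealized compounding: every round multiplies the ratio by the same factor."""
    return start_ratio * per_round_fold ** n_rounds

single_round = fold_enrichment(1.0, 3.8)  # 1:1 -> 3.8:1 is a 3.8-fold enrichment
for n in range(1, 5):
    print(n, round(ratio_after_rounds(1.0, single_round, n), 1))
```

Under this assumption a 3.8-fold round would give about 14-, 55- and 209-fold enrichment after two, three and four rounds. Real rounds need not compound this way: off-target killing and any heritable stress resistance (major comments 5 and 7) would both change the per-round factor, which is part of why an experimental multi-round test is informative.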

      Significance

      General assessment:

      This is a creative and technically impressive study that combines mother-machine microfluidics, automated microscopy, real-time image analysis, and DMD-based photoselection into a unified platform for dynamic, phenotype-based enrichment. The strongest aspects of the work are the systems integration, the quantitative characterization of off-target effects, and the conceptual demonstration that dynamic microscopy-derived phenotypes can be linked to physical enrichment workflows.

      The main limitations are that the biological validation remains largely proof-of-concept and the most compelling enrichment results are currently simulation-based rather than experimentally demonstrated across multiple rounds. In addition, the manuscript would benefit from stronger positioning relative to recent image-based and DMD-enabled microfluidic control systems.

      Advance:

      The study extends the field of single-cell microfluidics and image-based selection by introducing a platform that links longitudinal microscopy measurements directly to physical enrichment decisions within mother-machine devices. To my knowledge, the combination of iterative feedback-driven selection, DMD-based targeted elimination, and dynamic phenotype tracking in this context is novel.

      The closest related systems appear to be recent DMD-enabled mother-machine platforms for real-time optogenetic control, particularly those reported by Lugagne et al. (Nature Communications 2024, DOI: 10.1038/s41467-024-46361-1). However, MiPS introduces a distinct conceptual advance by using patterned illumination for selective enrichment/elimination rather than gene-expression modulation alone.

      The advance is primarily technical and conceptual, with potential downstream applications in directed evolution, synthetic biology, biosensor engineering, and dynamic phenotype screening workflows that are difficult or impossible to implement using FACS alone.

      Audience:

      The work will likely be of strongest interest to researchers working in synthetic biology, microfluidics, single-cell analysis, systems biology, bioengineering, and automated microscopy. It may also be of broader interest to communities developing dynamic phenotype screening technologies, closed-loop biological control systems, and next-generation directed evolution platforms.

      The audience is likely specialized but multidisciplinary, spanning both engineering-oriented and biology-oriented researchers. The methods and conceptual framework may also influence future development of automated selection systems beyond the specific mother-machine context.

      Expertise - My expertise includes:

      • Microfluidics
      • Synthetic biology
      • Single-cell systems
      • Automated microscopy
      • Real-time image analysis
      • Bioengineering platforms
      • Dynamic phenotype characterization
    1. Reviewer #2 (Public review):

      Summary:

      This work presents three tools: SqueakPose Studio, which is used for pose estimation; SqueakView, which is used for real-time video and sensor data capture and analysis; and MouseHouse, which is a behavioral and sensor suite for mouse experiments. Together, these tools provide a comprehensive behavioral platform for acquiring and analyzing video, sensor, and behavioral data. The work is open source and provided as a resource for the field.

      Strengths:

      (1) SqueakPose Studio was relatively easy to install and use. We were impressed that we were able to install it and test our own videos with minimal struggles. The authors provide installation tutorial videos that were very helpful.

      (2) The GUI environment for SqueakPose Studio was very usable, and the authors should be commended on the time and effort that went into improving the usability of their system. The keypoint and skeleton configuration was flexible, allowing us to define custom body part sets without modifying code directly. The pose estimation accuracy on our own videos was good right out of the box, without requiring fine-tuning or retraining. For a tool being evaluated for the first time, this was all very impressive!

      Weaknesses:

      (1) While we were able to install and test SqueakPose Studio, it was not entirely seamless. The primary installation resource is a tutorial video, and we would recommend supplementing this with a written installation checklist that explicitly lists all required software dependencies (e.g., Python, uv, Visual Studio). The tutorial video was also at times unclear in distinguishing required from optional components. For example, Visual Studio is described as not necessary, yet the tutorial demonstrates the workflow entirely within that environment, so it may be challenging for a user to follow along without it. We recommend that the authors adopt a stricter, step-by-step installation guide that is prescriptive about required software and leaves little room for confusion.

      (2) The paper also describes SqueakView and MouseHouse. Unfortunately, we were unable to evaluate these components as both require the MouseHouse hardware platform. Even without directly using MouseHouse, we noticed some incompleteness here, as we could not locate a bill of materials, component pricing, or assembly guide in the paper or associated GitHub repositories. Given that affordability and accessibility are central claims, a consolidated parts list, approximate costs, and a build guide or video would be necessary for most labs to realistically decide whether they plan to replicate the hardware and evaluate this functionality that the paper describes. In this regard, we felt that MouseHouse and potentially SqueakView were not sufficiently documented for publication.

      (3) The benchmarking comparison to DeepLabCut (DLC) introduced multiple challenges that left us uncertain whether the head-to-head comparison was appropriate as described. First, the dataset used for benchmarking was small and homogeneous; according to the Methods, it consisted of "10 min open-field tasks of single mice with bilateral photometry cables." As such, the claims about comparisons between SqueakPose Studio and DLC may be too broad, given this single test case. Specifically, this dataset does not test robustness across lighting conditions, coat colors, species, occlusions, different-shaped arenas, etc. Second, the comparison to DLC in Figure 1 does not include any quantitative statistical comparisons, which are needed to evaluate the claims that were made. For instance, the error in Figure 1e looks worse for their system than for DLC, although statistical comparisons were not made. Third, there are many settings and optimizations that can be made for both systems. Without more detail, it is hard to know whether the head-to-head comparison is really fair. Fourth, the metrics are given as very specific numbers from single runs, e.g., an inference time of 71.59 minutes in Figure 1d. This metric would be more meaningful if it reported the mean of multiple runs, with error estimation. Finally, while the code is available, the trained datasets are made available only on "reasonable request". Given the importance of these datasets to evaluating the method and allowing others to benchmark it against other systems, these should be made available on GitHub. Overall, I would recommend toning down the comparison to DLC and focusing on the strengths of SqueakPose Studio on its own merits.

      (4) The paper at times makes general statements that are beyond what is shown. For instance, discussions of use in human applications are aspirational and should be treated much more conservatively in the discussion, or possibly even removed. As it stands, the discussion implies that this system can already do "zero-shot tracking of human posture and movement", enabling "a bridge between preclinical and clinical behavioral analysis". In principle, this may be true, but even for a Discussion section, this goes far beyond the capabilities that the paper actually shows.

      (5) While the comprehensive nature of the system and its 3 parts is impressive, I felt that it also detracted from the main focus of the paper, which was SqueakPose Studio. I might recommend dropping the other two parts, as they also require a much higher bar for a user to evaluate, and presenting only SqueakPose Studio in this paper, as a general resource for pose estimation. This would also allow them more space to more comprehensively benchmark SqueakPose Studio.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This is a well-written and fully documented methods paper.

      The authors have established a clear rationale for their new packages, especially for real-time use, and demonstrate significant speed improvements that will likely appeal to many users of tools like DLC, SLEAP, and LightningPose. The inclusion of a graphical user interface will help make the package more accessible to neuroscientists with limited computational expertise. While it may be challenging to get users to switch from their established workflows for video analysis, the speed gains offered by this package make it worth considering. The hardware aspects of the project are well-documented, and the GitHub repository for this part of the setup is also thorough. Overall, this paper provides a clear summary of the tools, their uses, setup, and benefits.

      We thank this reviewer for the positive comments and have provided responses to the specific and constructive questions listed below.

      I have a few minor questions about the collective set of tools.

      First, the GitHub repository for SqueakPose Studio appears to be missing a testing routine and associated badge, and the package has not been formally released. This means users would need to download the repository to install it, correct? I suggest the authors consider publishing a formal release of the package, making it installable via pip, and including a basic testing routine to clearly display the package's status on the repository page. Adding a DOI from Zenodo would also be helpful. A testing routine is especially useful when updates are made, as many users avoid repositories with failing tests.

      We thank the reviewer for this helpful suggestion. We agree that visible testing improves user confidence and reproducibility.

      SqueakPose Studio is currently distributed through a repository-based uv workflow rather than through PyPI alone. This is intentional. The application depends on platform-specific deep-learning libraries, and cloning the repository followed by uv sync provides a reproducible environment across Linux, macOS, and Windows while allowing the application to select CUDA, Apple MPS, or CPU execution at runtime. The written installation instructions now clearly describe this workflow.

      In response to the reviewer’s suggestion, we have added a unit-test suite covering the core helper modules used for label handling, dataset export, prediction, inference, and training logic. We have also added an automated GitHub Actions workflow that runs the tests on pushes and pull requests, together with a repository badge that displays the current test status.

      Second, the installation instructions simply state "Create a virtualenv and install:". This may not be sufficient for many researchers, as most neuroscientists are not experienced Python programmers and require clear guidance on the environment specific to this package. The installation instructions should be expanded to provide more detailed guidance and encourage more users. It would also be helpful to verify that the setups work across Windows, Mac, and Linux.

      We agree that installation guidance should be accessible to researchers who may not routinely manage Python environments. In addition to the existing video walkthrough, we have expanded the written GitHub documentation to provide a clearer, step-by-step installation checklist.

      The revised README now distinguishes required components from optional tools, explains the repository-based uv workflow, and provides the minimal commands needed to create the managed environment and launch the application.

      We have also clarified that an integrated development environment is optional. Although Visual Studio Code is used in the tutorial as a convenient interface for demonstrating the workflow, users may launch SqueakPose Studio directly from a terminal and are not required to use Visual Studio Code, Visual Studio, or any other editor.

      We have tested the application on Apple Silicon macOS systems, Windows systems, Linux systems, and NVIDIA GPU-enabled machines. SqueakPose Studio selects CUDA, Apple MPS, or CPU execution at runtime according to availability. Because accelerator support is partly determined by upstream packages such as PyTorch and Ultralytics, we have added links to the relevant compatibility documentation so that users can confirm whether their current hardware and driver configuration are supported.
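The runtime fallback described above (CUDA, then Apple MPS, then CPU) corresponds to a common PyTorch pattern. The sketch below is a hypothetical helper, not code from the SqueakPose Studio repository; it relies only on `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and falls back to the CPU if PyTorch is not installed.

```python
def select_device():
    """Return 'cuda', 'mps', or 'cpu', in that order of preference."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is no accelerator to query
    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPUs (and AMD GPUs on ROCm builds of PyTorch)
    mps = getattr(torch.backends, "mps", None)  # absent in very old PyTorch releases
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon
    return "cpu"

print("running on:", select_device())
```

On ROCm builds of PyTorch, supported AMD GPUs are exposed through the same `torch.cuda` interface, so the first branch also covers that case.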

      Third, the package defaults to UMAP for non-linear dimensionality reduction, which has some known issues. Can the package be modified to allow for alternative mapping methods, such as PaCMAP, PyDiffMap, or the more comprehensive topometry package?

      We agree with the reviewer that UMAP has limitations and that no single nonlinear dimensionality-reduction method is optimal for all pose datasets or behavioral questions.

      In SqueakPose Studio, the UMAP/HDBSCAN workflow is included as an accessible exploratory example for dimensionality reduction and clustering of pose-derived features. Our goal was not to designate UMAP as a preferred or definitive analysis method, but to provide an interpretable starting point that allows users to identify candidate clusters and inspect representative videos to evaluate what the embedding is capturing.

      We agree that supporting additional approaches, such as PaCMAP, PyDiffMap, or related tools, could be useful, and we will consider adding these as modular options in future versions. At the same time, SqueakPose Studio is not intended to replace specialized downstream behavioral-analysis packages or to adjudicate which embedding method is best for a particular dataset. Pose outputs can be exported for downstream analysis in other environments, including CEBRA, Keypoint-MoSeq, and packages implementing alternative clustering or dimensionality-reduction approaches.

      We have clarified in the documentation that the included UMAP/HDBSCAN workflow is intended as an exploratory demonstration rather than as a required or privileged analysis pipeline.
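One way the modular options mentioned above could be organized is a registry that maps a method name to an embedding function. This is a hypothetical design sketch, not the package's code: the `umap` (umap-learn) and `pacmap` packages are imported only when selected, and a NumPy PCA baseline is included so the example runs without optional dependencies.

```python
import numpy as np

def _pca(X, n_components):
    """Dependency-free baseline: project centered data onto its top principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def _umap(X, n_components):
    import umap  # optional dependency (umap-learn)
    return umap.UMAP(n_components=n_components).fit_transform(X)

def _pacmap(X, n_components):
    import pacmap  # optional dependency
    return pacmap.PaCMAP(n_components=n_components).fit_transform(X)

REDUCERS = {"pca": _pca, "umap": _umap, "pacmap": _pacmap}

def embed(features, method="pca", n_components=2):
    """Embed pose-derived features (n_frames x n_features) with the named method."""
    try:
        reducer = REDUCERS[method]
    except KeyError:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(REDUCERS)}") from None
    return reducer(np.asarray(features, dtype=float), n_components)
```

Keeping each method behind a lazily imported function means users only install the libraries they actually select, and new methods can be added without touching the clustering or export steps.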

      Finally, what specific GPUs have been tested with the package, and are there any limitations based on the age of the video card or the available libraries for the deep learning component of the package?

      As noted above, GPU compatibility is determined by the deep-learning and hardware-acceleration libraries on which SqueakPose Studio depends, including PyTorch, Ultralytics, CUDA, Apple MPS, and ROCm. Our development ethos is to track current stable versions of these packages rather than maintain separate legacy dependency stacks. This improves performance, simplifies support, and allows users to benefit from ongoing improvements in upstream libraries, but it also means that older GPU architectures may lose support as they are deprecated by those upstream tools.

      For NVIDIA systems, the current package is indexed against CUDA 13.2. CUDA 13.x has deprecated support for some older GPU architectures, so users with older NVIDIA cards may need to use CPU inference or upgrade hardware. However, CUDA 13 is supported on GeForce RTX 20-series, 30-series, 40-series, 50-series, and professional equivalents. We made this clearer in the documentation and provided links to upstream CUDA, PyTorch, and Ultralytics compatibility resources so users can determine whether their hardware is supported.

      For Apple Silicon, the package can use PyTorch MPS acceleration, which supports M-series chips. For AMD GPUs, we do not currently maintain AMD-specific test hardware, but PyTorch supports ROCm on Linux for supported AMD GPUs. ROCm support is more limited on Windows, so AMD users should consult the current PyTorch ROCm compatibility documentation.

      Overall, our support commitment is to maintain compatibility with current upstream deep-learning frameworks rather than to guarantee support for all older or vendor-specific GPU configurations.

      Reviewer #2 (Public review):

      Summary:

      This work presents three tools: SqueakPose Studio, which is used for pose estimation; SqueakView, which is used for real-time video and sensor data capture and analysis; and MouseHouse, which is a behavioral and sensor suite for mouse experiments. Together, these tools provide a comprehensive behavioral platform for acquiring and analyzing video, sensor, and behavioral data. The work is open source and provided as a resource for the field.

      Strengths:

      (1) SqueakPose Studio was relatively easy to install and use. We were impressed that we were able to install it and test our own videos with minimal struggles. The authors provide installation tutorial videos that were very helpful.

      (2) The GUI environment for SqueakPose Studio was very usable, and the authors should be commended on the time and effort that went into improving the usability of their system. The keypoint and skeleton configuration was flexible, allowing us to define custom body part sets without modifying code directly. The pose estimation accuracy on our own videos was good right out of the box, without requiring fine-tuning or retraining. For a tool being evaluated for the first time, this was all very impressive!

      We thank this reviewer for the positive comments and have provided responses to the specific potential weaknesses noted below.

      Weaknesses:

      (1) While we were able to install and test SqueakPose Studio, it was not entirely seamless. The primary installation resource is a tutorial video, and we would recommend supplementing this with a written installation checklist that explicitly lists all required software dependencies (e.g., Python, uv, Visual Studio). The tutorial video was also at times unclear in distinguishing required from optional components. For example, Visual Studio is described as not necessary, yet the tutorial demonstrates the workflow entirely within that environment, so it may be challenging for a user to follow along without it. We recommend that the authors adopt a stricter, step-by-step installation guide that is prescriptive about required software and leaves little room for confusion.

      We thank the reviewer for this helpful feedback and agree that the installation workflow should distinguish more clearly between required and optional components. Our goal with SqueakPose Studio is to place as much functionality as possible in the GUI so that users are not required to rely on command-line tools for additional features or advanced use. For that reason, the command-line surface is intentionally minimal: after the repository is cloned and the uv-managed environment is created, almost all functionality is accessed through the graphical interface.

      We also appreciate the opportunity to clarify the point about Visual Studio. The tutorial video demonstrates the workflow using Visual Studio Code, not Visual Studio. Visual Studio Code is optional and is used in the video only as a convenient editor and interface for demonstrating the workflow. The GUI can also be launched directly from a terminal, and users may use any preferred editor or IDE, including VS Code, Zed, Cursor, Jupyter-based workflows, or no IDE at all.

      We have updated the written README and YouTube walkthrough to make this distinction clearer. Specifically, we now provide a stricter installation checklist that separates required components, such as Python and UV, from optional tools, such as VS Code or other editors. We also demonstrate launching SqueakPose Studio directly from a terminal so users can follow the workflow without relying on a specific IDE.

      (2) The paper also describes SqueakView and MouseHouse. Unfortunately, we were unable to evaluate these components as both require the MouseHouse hardware platform. Even without directly using MouseHouse, we noticed some incompleteness here, as we could not locate a bill of materials, component pricing, or assembly guide in the paper or associated GitHub repositories. Given that affordability and accessibility are central claims, a consolidated parts list, approximate costs, and a build guide or video would be necessary for most labs to decide realistically whether to replicate the hardware and evaluate the functionality that the paper describes. In this regard, we felt that MouseHouse, and potentially SqueakView, were not sufficiently documented for publication.

      We agree with the reviewer that MouseHouse and SqueakView are more difficult to evaluate than SqueakPose Studio because they involve dedicated hardware, including an edge-compute platform. This is an unavoidable tradeoff for a system designed not only for offline pose estimation, but also for real-time acquisition and deployment. We recognize, however, that if the manuscript emphasizes affordability and accessibility, then users need a clear way to estimate cost, order components, assemble the system, and reproduce the hardware configuration.

      We have therefore added a consolidated bill of materials to the GitHub repository, including component names, approximate pricing, and suggested sources where appropriate. We now provide a complete guide for connecting the hardware and flashing the required firmware/software to the devices. This documentation makes clearer what is required for MouseHouse-specific functionality versus what can be used independently through SqueakPose Studio.

      We also note that edge-compute devices such as the Jetson Orin Nano are increasingly common in robotics and real-time computer-vision applications, but we appreciate that many behavioral neuroscience laboratories may not yet have this hardware in place. For some users, this paper may be their first exposure to this compute platform. For that reason, we agree that the repository should provide more complete onboarding materials for labs that wish to adopt the hardware ecosystem, and we now provide that.

      (3) The benchmarking comparison to DeepLabCut (DLC) introduced multiple challenges that left us unsure whether the head-to-head comparison was appropriate as described. First, the dataset used for benchmarking was small and homogeneous; according to the methods, it consisted of "10 min open-field tasks of single mice with bilateral photometry cables." As such, the claims about comparisons between SqueakPose Studio and DLC may be too broad, given this single test case. Specifically, this dataset does not test robustness across lighting conditions, coat colors, species, occlusions, different-shaped arenas, etc. Second, the comparison to DLC in Figure 1 does not include any quantitative statistical comparisons, which are needed to evaluate the claims that were made. For instance, the error in Figure 1e looks worse for their system than for DLC, although no statistical comparisons were made. Third, there are many settings and optimizations that can be made for both systems; without more detail, it is hard to know whether the head-to-head comparison is really fair. Fourth, the metrics are given as very specific numbers from single runs, e.g., an inference time of 71.59 minutes in Figure 1d. This metric would be more meaningful if it reported the mean of multiple runs, with error estimation. Finally, while the code is available, the trained datasets are made available only on "reasonable request". Given the importance of these datasets for evaluating the method and allowing others to benchmark it against other systems, these should be made available on GitHub. Overall, I would recommend toning down the comparison to DLC and focusing on the strengths of SqueakPose Studio on its own merits.

      We appreciate the reviewer’s thoughtful comments about the benchmarking comparison. We agree that no single dataset can establish universal performance across all lighting conditions, coat colors, species, occlusion regimes, arena geometries, or camera configurations. Our intention was not to claim that SqueakPose Studio is superior to DeepLabCut under every possible condition, nor to present a comprehensive benchmark across the full space of pose-estimation use cases. Rather, the benchmark was included as an applied demonstration of performance in a representative behavioral neuroscience workflow involving mouse open-field videos with photometry cables.

      We also agree that users can substantially affect performance in any pose-estimation framework through model selection, training settings, hardware configuration, inference parameters, and optimization choices. For this reason, we view the comparison as a practical workflow benchmark rather than a definitive ranking of all possible DLC and SqueakPose Studio configurations. The primary contribution of SqueakPose Studio is not simply that it is faster in one head-to-head comparison, but that it provides an integrated GUI-based workflow for pose estimation, review, export, and real-time/edge-AI deployment.

      That said, the speed improvements are not incidental. They reflect deliberate architectural and deployment choices, including the use of modern object-detection/pose-estimation architectures and optimized inference workflows. In practice, these choices can substantially reduce inference time relative to workflows that were not designed around the same deployment constraints. We will be careful in our public response and documentation not to overstate this as a universal claim across every dataset or every possible DLC configuration.

      Regarding statistical comparisons and repeated runs, we agree that reporting means and variance across repeated benchmark runs can be useful. However, because this manuscript is primarily an applications and methods resource rather than a large-scale benchmarking study, we do not intend to benchmark every relevant dataset class or hardware configuration. We instead encourage users to evaluate SqueakPose Studio on their own videos and hardware, which is ultimately the most informative test for adoption in a given laboratory.
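      For users running their own evaluation, the reviewer's suggestion of reporting a mean with an error estimate can be sketched as below. This is a generic timing harness, not part of SqueakPose Studio or DLC; `run_inference` and `fake_inference` are hypothetical stand-ins for whatever inference call is being benchmarked.

```python
import statistics
import time
from typing import Callable


def benchmark(run_inference: Callable[[], None], n_runs: int = 5) -> tuple[float, float, list[float]]:
    """Time repeated calls; return (mean, sample SD, raw timings) in seconds."""
    if n_runs < 2:
        raise ValueError("need at least two runs to estimate variability")
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run_inference()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings), timings


if __name__ == "__main__":
    # Hypothetical workload; replace with a real inference call on a fixed video.
    def fake_inference() -> None:
        sum(i * i for i in range(100_000))

    mean_s, sd_s, _ = benchmark(fake_inference, n_runs=5)
    print(f"inference time: {mean_s:.3f} +/- {sd_s:.3f} s (n=5)")
```

      Keeping the raw timings alongside the summary makes it possible to report the spread or run a statistical comparison between two tools later.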

      Regarding the trained datasets and models, we agree with the reviewer that broad access improves reproducibility and benchmarking. The limitation is practical rather than philosophical: the full benchmark datasets are large and are not well suited for direct hosting in a GitHub repository. We currently make these data available upon reasonable request and have included a Zenodo repository, and we are exploring further public hosting options for large files, such as an institutional repository, OSF, or another archival data platform. We will also clarify the availability of trained models and example data so users can more easily reproduce or extend the benchmarking workflow.

      Overall, we agree that SqueakPose Studio is strongest when evaluated on its own merits: accessibility, speed, GUI-based usability, flexible keypoint configuration, real-time deployment, and integration with acquisition and edge-compute workflows. We now frame the DLC comparison as a representative applied benchmark rather than as an exhaustive claim of general superiority.

      (4) The paper at times makes general statements that are beyond what is shown. For instance, discussions of use in human applications are aspirational and should be treated much more conservatively in the discussion, or possibly even removed. As it stands, the discussion implies that this system can already do "zero-shot tracking of human posture and movement", enabling "a bridge between preclinical and clinical behavioral analysis". In principle, this may be true, but even for a Discussion section, this goes far beyond the capabilities that the paper actually shows.

      We appreciate this comment and agree that the manuscript should distinguish more clearly between capabilities demonstrated in the present study and broader potential applications of the software architecture.

      SqueakPose Studio and SqueakView are not intrinsically mouse-specific. Users can define custom classes, keypoints, and skeletons, train compatible pose-estimation models for other organisms or experimental preparations, and deploy those models using the same acquisition and inference workflow.

      To make this technical capability concrete, the SqueakView repository now includes deployment-ready FP16 model packages for both the validated MouseHouse-specific pose model and a stock human-pose model. The included human-pose model demonstrates that the deployment architecture can support zero-shot human posture tracking without requiring changes to the underlying SqueakView pipeline.

      We agree, however, that this technical compatibility should not be interpreted as validation for clinical behavioral analysis. The experimental demonstrations in the present manuscript focus primarily on mouse behavioral datasets. Any clinical application would require separate benchmarking, validation, and domain-specific evaluation beyond the scope of the present manuscript.

      (5) While the comprehensive nature of the system and its three parts is impressive, I felt that it also detracted from the main focus of the paper, which was SqueakPose Studio. I might recommend dropping the other two parts, as they also require a much higher bar for a user to evaluate, and presenting only SqueakPose Studio in this paper, as a general resource for pose estimation. This would also allow the authors more space to benchmark SqueakPose Studio more comprehensively.

      We appreciate this perspective and agree that SqueakPose Studio is the most immediately accessible component of the platform for many users. However, we respectfully disagree that MouseHouse and SqueakView should be removed from the paper. The motivation for developing SqueakPose Studio was not simply to create another offline pose-estimation and analysis tool, but to enable real-time behavioral detection and deployment on edge hardware. SqueakView and MouseHouse provide the acquisition and deployment context that motivated the software architecture and demonstrate how the platform can be used in closed-loop or real-time behavioral workflows.

      In developing the system, we recognized that SqueakPose Studio also functions as a user-friendly general pose-estimation interface, with features that may be useful even for laboratories that do not adopt the full MouseHouse/SqueakView ecosystem. For that reason, we presented it as both a standalone tool and as part of a broader acquisition and deployment platform.

      We agree that this makes the manuscript broader than a paper focused exclusively on pose-estimation benchmarking. However, we view that breadth as important: the paper is intended to serve as a central, peer-reviewed entry point for laboratories interested in deploying real-time pose estimation in behavioral experiments. The manuscript points users to the relevant repositories, documents the design rationale, and provides a source of peer-reviewed validation for the integrated workflow. We have clarified in our response and documentation that users can adopt SqueakPose Studio independently, while MouseHouse and SqueakView support the broader real-time hardware ecosystem.

    1. This study is an important contribution to our understanding of waterfowl conservation and population ecology in Europe. Recovery of marked birds, typically through harvest by waterfowl hunters, is an important means of obtaining data to assess survival and harvest probabilities in waterfowl, but the ability to differentiate between natural and harvest mortality requires a better understanding of reporting probabilities (the proportion of banded/ringed birds that are harvested by hunters that are also reported to banding authorities). In North America we have had numerous studies using reward bands to estimate this “band reporting rate”, but comparable studies have not been conducted elsewhere, until this study. I thoroughly reviewed this preprint and my overall assessment is strongly supportive. I have only a few suggestions for potential improvement.

      It might be nice to bound the reporting rate estimates between 0 and 1 by formally including reporting rate in the model likelihoods rather than estimating it as a derived parameter. I couldn’t access your code through the link provided, so I’m unable to see exactly how you modeled this, but it seems you could model reporting probability directly by including it in the likelihood anywhere that Brownie’s f or Seber’s r appears for birds marked with reward rings.
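      To illustrate why fitting the reporting rate inside the likelihood keeps it bounded, here is a minimal sketch under simplifying assumptions: a single cohort, reward rings assumed always reported (recovery probability f), and standard rings recovered with probability lam * f. The counts are hypothetical, and a coarse grid search stands in for a proper optimizer with a logit link; this is not the authors' model.

```python
import math


def binom_loglik(k: int, n: int, p: float) -> float:
    """Binomial log-likelihood, dropping the constant combinatorial term."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def fit_reporting_rate(k_reward: int, n_reward: int, k_std: int, n_std: int,
                       step: float = 0.005) -> tuple[float, float]:
    """Grid-search MLE of (f, lam), both restricted to the open interval (0, 1).

    Assumption: reward rings are reported with probability 1, so their recovery
    probability is f; standard rings are recovered with probability lam * f.
    """
    grid = [i * step for i in range(1, round(1 / step))]
    best_ll, best_f, best_lam = -math.inf, 0.0, 0.0
    for f in grid:
        ll_reward = binom_loglik(k_reward, n_reward, f)
        for lam in grid:
            ll = ll_reward + binom_loglik(k_std, n_std, lam * f)
            if ll > best_ll:
                best_ll, best_f, best_lam = ll, f, lam
    return best_f, best_lam


if __name__ == "__main__":
    k_r, n_r, k_s, n_s = 20, 200, 24, 200  # hypothetical recovery counts
    derived = (k_s / n_s) / (k_r / n_r)   # ratio estimator, unbounded
    f_hat, lam_hat = fit_reporting_rate(k_r, n_r, k_s, n_s)
    print(f"derived lam = {derived:.2f}; likelihood-based lam = {lam_hat:.3f}")
```

      With these counts the derived ratio exceeds 1 through sampling noise alone, whereas the likelihood-based estimate is pushed to the upper boundary instead.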

      Lines 328-330: You conclude this paragraph with a statement about your results supporting additive mortality from hunting, but the rationale for this isn’t explained (I’m not disputing your claim, but you haven’t clearly articulated why you believe your results support partially additive mortality). The stark difference in estimated harvest probabilities between newly ringed and previously ringed birds (i.e., direct vs. indirect in North American terminology) suggests that heterogeneity in vulnerability to harvest might (also) be very important in these populations and thereby contribute to compensation of harvest. Coauthor Emilienne Grzegorczyk presented intriguing results on survival heterogeneity at the latest EURING conference, and these might be worth a brief discussion here.

      Minor edits: Line 77 or thereabouts: Because there is an extensive literature on reporting probabilities from North America, but with quite different terminology, it might be nice to include a Methods paragraph clarifying that ring, band, and tag recoveries are equivalent, that young vs. adult corresponds to hatch-year vs. after-hatch-year, and defining direct vs. indirect recovery in terms of time since marking.

      Line 154: In addition to the inscription included on reward rings, it would be helpful to indicate the exact inscription provided on standard rings. In North America we observed a pronounced increase in band reporting probabilities when band inscriptions were modified to include toll-free phone numbers and later, web addresses.

      When do (most) of your recoveries occur? It would be helpful to include information on timing of harvest in France. Given that you include season of banding as a covariate on survival, subsequent estimates of survival beyond the first year will be hunting season to hunting season. It might be nice to more formally address timing of banding by including a “partial year survival” term in the first diagonal of your m-arrays. This could be a shared annual survival term, but partitioned into portions based on how much of the year an average bird would have to survive (e.g. S^(5/12) if 5 months or S^(9/12) if 9 months).
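      The partial-year term the reviewer describes, S^(m/12) for m months between ringing and the first hunting season, assumes a constant hazard through the year. A minimal sketch of that term, which would replace the full annual survival in the first diagonal of the m-array (the function name and example values are illustrative only):

```python
def partial_year_survival(annual_survival: float, months: float) -> float:
    """Survival over part of a year under a constant hazard: S ** (months / 12)."""
    if not 0.0 <= annual_survival <= 1.0:
        raise ValueError("annual survival must be a probability")
    if not 0.0 <= months <= 12.0:
        raise ValueError("months must lie between 0 and 12")
    return annual_survival ** (months / 12.0)


if __name__ == "__main__":
    S = 0.6  # hypothetical annual survival
    for m in (5, 9):
        print(f"{m} months to the hunting season: S^({m}/12) = {partial_year_survival(S, m):.3f}")
```

      Because the exponent shares the same annual S, no extra parameter is estimated; only the fraction of the year each release cohort must survive changes.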

      In North American ducks, we would expect to see pronounced differences in seasonal survival between sexes due to breeding risks incurred by females. For example, spring releases of female mallards would be expected to have lower survival to the first hunting season than spring releases of males. It might be nice to indicate in the methods that you ignored sex in your analysis given small sample sizes (given interactions with species, age, and timing, it might require 6-12 df to properly address), but future analyses based on additional data might wish to investigate sex differences in both survival and recovery probabilities.

      You have a nice literature review, but there are a few additional papers that would be worth including. Lines 64-65: Either of the two Riecke et al. 2022 Journal of Animal Ecology papers would be good to cite as an example of how reporting probabilities can help partition annual survival into harvest and natural mortality. Koons et al. (2014, Wildfowl) would be a nice paper to cite here for life-history differences in relation to body size. The results from nasal-marked teal are intriguing, and I suspect that nasal markers might influence survival, vulnerability to harvest, and reporting probability (Arnold et al. 2016, J. Wildl. Manage.; Szymanski et al. 2020, Wildl. Soc. Bull.; Reinecke et al. 1992, J. Wildl. Manage.; Caswell et al. 2012, J. Wildl. Manage.).

      Minor changes to wording: Abstract, line 49: I think you mean “subjected” rather than “submitted”. Intro, line 57: “elaborate” rather than “elaborated”. Intro, line 83: use “to” instead of “on”. Intro, line 89: use “of” instead of “or”. Methods, line 126: use “drop-door” instead of “door-falling”. Line 161: “departmental”. Line 196: “parameter” (not plural). Line 298: “a heavy predator-control program was in place”. Lines 344-345: a curiosity effect has been hinted at in other research.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Freas and Wystrach present a computational model of steering in insects. In this model, the central complex provides an error signal indicating the animal should turn left or right; this error signal biases the function of an oscillator composed of two mutually inhibiting self-exciting units. The output of these units generates a "steering signal" that is used both to set the direction and speed of the ant. Additionally, a separate module induces pauses, and an inverse relation between forward speed and turning speed is externally imposed. Statistics of the trajectories generated by the model are compared to the measured behaviors of ants.
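      The architecture summarized above (a left/right oscillator biased by a central-complex error signal, read out as a signed steering signal) can be sketched with a standard Matsuoka half-centre oscillator. Note that this is a swapped-in textbook model: Matsuoka units alternate through mutual inhibition plus slow self-adaptation, not the self-excitation described here, and all parameter values below are illustrative assumptions rather than the authors' Matlab settings. The readout L - R mirrors `ang_speed_raw = L-R` quoted later in this review.

```python
def simulate_half_centre(error: float, steps: int = 4000, dt: float = 0.01,
                         tau_r: float = 0.25, tau_a: float = 0.5,
                         beta: float = 2.5, w: float = 2.5,
                         drive: float = 1.0) -> list[float]:
    """Euler-integrate a two-unit Matsuoka oscillator; return L - R outputs.

    error: signed goal error added to the left unit's input and subtracted
    from the right unit's, biasing the rhythm toward one side.
    """
    x = [0.1, 0.0]  # fast states; small asymmetry kicks off the rhythm
    v = [0.0, 0.0]  # slow adaptation (self-inhibition) states
    inputs = [drive + error, drive - error]
    steering = []
    for _ in range(steps):
        y = [max(0.0, xi) for xi in x]  # rectified firing rates
        for i in range(2):
            j = 1 - i
            dx = (-x[i] - beta * v[i] - w * y[j] + inputs[i]) / tau_r
            dv = (-v[i] + y[i]) / tau_a
            x[i] += dt * dx
            v[i] += dt * dv
        steering.append(max(0.0, x[0]) - max(0.0, x[1]))  # signed L - R
    return steering


if __name__ == "__main__":
    unbiased = simulate_half_centre(0.0)
    biased = simulate_half_centre(0.2)
    half = biased[len(biased) // 2:]
    print(f"mean steering with positive error: {sum(half) / len(half):.3f}")
```

      The chosen values satisfy the usual oscillation condition 1 + tau_r/tau_a < w < 1 + beta, so the unbiased signal alternates sign, while a positive error shifts the average steering toward the left unit.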

      Strengths:

      While the model is very simple compared to state-of-the-art models, that simplicity makes it a potentially useful guide to researchers studying insect navigation. Some predictions that emerge from the model appear to be experimentally testable, although a more complete description of the model and its parameters, as well as an analysis of how this model's predictions differ from previous models' predictions, would be required to design these experiments.

      Weaknesses:

      I found it difficult to identify evidence in the paper supporting central elements of the abstract. Hopefully, these difficulties can be resolved with a clearer presentation and the addition of supporting detail, especially in the methods.

      (1) The model is not clearly described

      In the Materials and Methods, there is no description of the model, just "The computational model is presented in Figure 1." (This is probably a typo and may refer to Figure 2A-C), and a link to Matlab source code. It is inappropriate to ask readers or reviewers to examine source code in lieu of providing a method, but I attempted to do so anyway. 

      We have now added a full description of the model in the methods.

      To my eye, the source code does not match the model presented in 2A-C. For instance, in 2C, "Steering signal" inhibits "Freeze", but I couldn't find this in the source. "Freeze" is shown to inhibit "steering signal," but as "steering signal" is a signed quantity, it's not clear what this means. Literally, since "ang_speed_raw = L-R," it would seem to indicate the "freeze" would bias towards right turns. In the code, "freeze" appears to be implemented through the boolean variable "speed_inhibition_time." The logic controlled by this variable doesn't appear to inhibit the "steering signal" but instead (depending on control parameters) either reduces the movement speed and amplifies the turning rate, or it turns the angular speed output into a temporal integral of the control signal.

      We understand the confusion. Our neural implementation does not go downstream of the neural steering signal (Left and Right Descending neurons), and the way it is transformed into a movement (ang_speed_raw = L-R) is not modelled neurally (the formula is explicitly shown on the right-hand side of Figure 2). Indeed, we did not attempt to put forward any assumption about neural implementation for our freezing signal (see our response to comment 2 below). To avoid confusion, we have now removed the reciprocal inhibition portion as it was previously drawn in Figure 2C, and replaced it with a non-neural symbol (a cross, indicating that the signal is blocked) acting between the steering signal and movement.

      There are a number of parameters in the source code that aren't described at all in the paper, including the internal oscillator parameters.

      We now provide all the parameters in the methods, together with figures showing the dynamics of oscillations across parameter range, and a rationale for their choice (see Supplemental Figure 2).

      Together, these limitations make it difficult to understand what is being simulated, what parts of the model are tied to biology, and where the model improves on or departs from previous work.

      It is absolutely essential that authors fully describe the computational model, that they explain the meaning of all parameters of the model, and that they explain how the particular values of these parameters were chosen.

      This is now done in the methods section under the “Model Overview” subsection.

      (2) The biological inspiration is unclear

      A central claim of the paper is that the model is "biologically grounded." But some elements, for instance, using a signed quantity to represent left-right steering drive, are not biologically possible; at best, these are shorthand for biologically possible implementations, e.g., opposing groups of left-right driving neurons.

      The mechanism that produces fixations and saccades - the "freeze" module - is not tied to any particular anatomy of the insect brain. Initiation of a freeze occurs at a specific time coded into the model by the authors; it is not generated by an internal model signal. Release of a freeze is by drawing a random variable; there is no neural mechanism proposed to generate this signal.

      We now clarify, from the Introduction onwards, which components are neural and which are not; for instance:

      “Because we did not want to presuppose how such a ‘freeze signal’ could be implemented in the insect nervous system, in our model this was achieved using a simple external signal that halts forward motion at random intervals.”

      In some versions of the model, instead of directly controlling the signal, during fixations, the angular drive signal is integrated into a variable "cumul_drive." No neural substrate is proposed for this integrator. In the code, if cumul_drive passes a threshold, the angular heading of the ant changes (saccades), but only if this threshold is passed before the Poisson process ends the fixation. No neural substrate is proposed for any of this logic.

      This has now also been clarified in the Introduction:

      “During scanning, real ants display rotational saccades of variable duration and angular magnitude (Figure 1A–C). To replicate this, we introduced a threshold-based mechanism: after each fixation (i.e., zero angular and forward speed), the underlying angular steering signal accumulates until surpassing a threshold, triggering a saccade. The resulting angular magnitude of the saccade corresponds to the sum of the angular drive accumulated during the fixation. Here also we stuck to a non-neural, straight-forward algorithmic level, as we did not want to make assumptions about how such a cumulate-and-release mechanism could be neurally implemented in the insect brain (see discussion for potential implementations).”
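      The cumulate-and-release rule quoted above, as described by the reviewer (a saccade occurs only if the accumulated drive crosses the threshold before the fixation is released), can be sketched algorithmically. Apart from `cumul_drive`, the names are hypothetical, and a per-step release probability is used as a discrete-time stand-in for the Poisson process; this is not the authors' Matlab code.

```python
import random
from typing import Callable, Optional


def fixation_until_saccade(drive: Callable[[int], float], threshold: float,
                           release_prob: float, rng: random.Random,
                           max_steps: int = 10_000) -> Optional[float]:
    """Accumulate angular drive during one fixation.

    Returns the saccade amplitude (the accumulated drive) if |cumul_drive|
    reaches the threshold before the fixation is released; otherwise None.
    """
    cumul_drive = 0.0
    for step in range(max_steps):
        if rng.random() < release_prob:
            return None  # fixation released first: no saccade
        cumul_drive += drive(step)  # integrate the signed steering signal
        if abs(cumul_drive) >= threshold:
            return cumul_drive  # saccade amplitude equals the accumulated drive
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    amplitude = fixation_until_saccade(lambda t: 0.3, threshold=1.0,
                                       release_prob=0.1, rng=rng)
    print("no saccade" if amplitude is None else f"saccade of {amplitude:.2f}")
```

      Because the amplitude is whatever has accumulated at the crossing, saccade size and timing vary with the drive, matching the "variable duration and angular magnitude" in the quoted passage.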

      The model steps forward in time by a fixed increment - the actual duration (in seconds) of this time step is not specified. From Figure 4F, G, it appears a simulation time step is meant to be about 10ms. This would imply an oscillator frequency of about 2 Hz (Fig 2B), that the heading oscillates at a similar frequency (2G), and that a forward crawling ant stops moving every 500 ms (2I). Are these plausible? Can they be compared to an experiment? Model parameters, including the ones that control the frequency of the oscillator, are non-dimensionalized. It is not possible to evaluate whether these parameters are biologically plausible or match experimental results.
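      The reviewer's conversion can be written out explicitly. Here N_cycle (time steps per oscillation cycle) is our notation, and the 50-step figure is simply what the quoted 10 ms step and 2 Hz frequency imply, not a value taken from the model:

```latex
f_{\text{osc}} = \frac{1}{N_{\text{cycle}}\,\Delta t},
\qquad
\Delta t = 10\ \text{ms},\; f_{\text{osc}} \approx 2\ \text{Hz}
\;\Rightarrow\;
N_{\text{cycle}} \approx \frac{1}{(2\ \text{s}^{-1})(0.01\ \text{s})} = 50\ \text{steps}.
```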

      We have now added a figure showing the dynamics of the oscillator across parameter ranges (Supplemental Figure 2). The step increment (and thus the sampling rate along an oscillatory cycle) necessarily varies according to the chosen inhibition strength and self-decay parameters (e.g., small parameter values lead to a small step increment, and thus a high sampling rate along the oscillatory cycle). We chose oscillator parameters to ensure that the sampling rate is high enough to resolve multiple saccades within one oscillatory cycle, yet low enough for computation time to remain practical.

      Beyond these constraints, the oscillator parameters can be chosen arbitrarily, and a conversion of time steps to actual time (ms) would be equally arbitrary and would give the illusion that the model captures the data quantitatively. Because we did not model spiking neural dynamics (or brain-region local field potential frequencies), we cannot constrain our model through a temporal link between brain clock and behavioural speed. We thus prefer to keep the accurate, non-dimensional label ‘time steps’ in our figures.

      (3) Claims that behaviors emerge from the model may be overstated

      The abstract claims that steering correction and fixations/saccades emerge naturally from the same model. But it appears to me that fixations/saccades are externally imposed by the specification of specific times for a "freeze." Faster angular rotation during saccades than during course correction is imposed and does not emerge naturally from neural simulations.

      The abstract now clarifies that what emerges spontaneously is not scanning per se (indeed, the inhibition of movement is externally imposed) but its dynamics. Note that our model captures many aspects of scanning dynamics that are not trivial and that result from the dynamical interactions and contingencies between modules (Figures 3 to 7), hence justifying the word ‘emerge’ insofar as these behavioural dynamics cannot be reduced to one module or parameter. Regarding the faster angular rotation during scanning, we agree that its cause is rather straightforward to understand: it results from the added bodily constraints that forward speed imposes on rotational movements. Nonetheless, it is not ‘imposed’ during saccades in the sense that (1) it is biologically/physically evident rather than cherry-picked and (2) it is continuously present in our model, even during forward navigation. We believe the new version of the manuscript now conveys this message transparently.

      (4) Citations to previous literature are difficult to follow, and modeling results are presented as though they are experimental data

      I would ask the authors to be much clearer in their description and citation of previous work. It should be clear whether the cited work was experimental or computational. To the extent possible, the actual measurement should be described succinctly. Instead of grouping references together to support a sentence with multiple claims, references should be cited for each claim. Studies of computational models should not be presented as proving a biological result.

      Indeed, we now clearly separate citations referring to experimental evidence from those referring to modelling. See example citations below.

      For example:

      (a) Lines 141-146:

      "Previous studies have established many key components of insect navigation, including .... the intrinsic oscillatory dynamics in the lateral accessory lobes (LALs) that support continuous zigzagging locomotion (Clément et al., 2023; Kanzaki, 2005; Namiki and Kanzaki, 2016; Steinbeck et al., 2020)."

      The first reference is to one author's previous modeling work - it hypothesizes that oscillations in the LAL support zigzagging but includes no data that would "establish" the fact. Kanzaki et al. 2005 describes numerical modeling and simulation with a physical robot. Namiki and Kanzaki, 2016 is a review article that links the LAL to zigzagging behavior. It describes the LAL as a winner-take-all bistable network but does not describe or hypothesize that the LAL has intrinsic oscillatory dynamics. Steinbeck et al. 2020 is a more comprehensive review; it reinforces that the LAL is a winner-take-all bistable network that drives left-right steering, including during zig-zagging behavior. But in my reading, I could not find a statement that the LAL has intrinsic oscillatory dynamics (the closest is Steinbeck et al. saying the activity pattern switches regularly, as does the behavior; this doesn't imply that the LAL is intrinsically oscillatory.)

      It now reads:

      “Previous studies have established many key components of insect navigation, notably, how goal headings are set in the central complex (CX) (Fisher, 2022; Green and Maimon, 2018). Modelling efforts have shown that the CX circuitry can naturally accommodate innate and learnt guidance such as path integration, learnt vectors, visual route following or homing as observed in ants and bees. In parallel, oscillatory dynamics in the lateral accessory lobes (LALs) - produced by reciprocal inhibition across both hemispheres and conveyed by so-called descending flip-flopping neurons - were shown to drive the spontaneous zigzags displayed by moths upon losing their pheromone plume (Kanzaki and Mishima, 1996; Mishima and Kanzaki, 1998, 1999; Wada and Kanzaki, 2005; Kanzaki et al., 2005; Iwano et al., 2010). Here also, subsequent modelling efforts have shown how these circuits can equally support the continuous lateral oscillations displayed by a wide range of insect species, including ants.”

      (b) Lines 701-703:

      "In plume-tracking moths, CX output has been shown to modulate LAL flip-flop neurons driving zigzagging (Adden et al., 2022)."

      This reads as though an experimental measurement was made, but in fact, this is modeling work.

      Yes, this could be clearer; it now reads:

      “In moths, descending neurons in the LALs exhibit characteristic 'flip-flop' activity patterns that correlate with zigzagging maneuvers (Olberg, 1983; Kanzaki and Ikeda, 1994). Computational models suggest that having these LAL neurons modulated by the CX output can explain aspects of the moths’ plume-tracking behaviour (Adden et al., 2022).”

      (c) Lines 703-706:

      "In ants, strong goal signals in the CX - whether elicited by the path integrator or visual familiarity (Wehner et al., 2016; Wystrach et al., 2020b, 2015) do not only sharpen directional accuracy but also increase oscillation frequency (Clément et al., 2023)."

      Here again, modeling results are presented as though they were experimental data.

      Here, we are referring to the experimental part of these works, although this comment shows that our statement should state more clearly which results are biological. It now reads:

      “In ants, behavioural studies show that strong directional drives elicited by the path integrator or visual familiarity not only gain behavioural weight and sharpen directional accuracy (Wehner et al., 2016; Wystrach et al., 2015; Legge et al., 2014) but also increase the ants’ oscillation frequency (Clément et al., 2023). Assuming that the path integrator and visual familiarity modulate goal signals in the CX, as modelled here and elsewhere (Wystrach et al., 2020b; Stone et al., 2017), and that the intrinsic oscillator is in the LAL (Clément et al., 2023; Steinbeck et al., 2020), this suggests that CX output modulates the intrinsic oscillatory activity of the LAL.”

      Reviewer #2 (Public review):

      Summary:

      The paper by Freas and Wystrach is an interesting computational study, exploring the detailed mechanisms of how simple neural circuits could explain complex behavioral patterns observed in navigating ants. The authors compare detailed, high-speed video recordings of Australian desert ants (Melophorus bagoti) with predictions made by their new computational model and find convincing similarities between the model and the behavioral data, at a level of detail not previously studied. Particularly interesting are emerging properties of the model, yielding behavioral motifs it was not designed to reproduce, but which occur in natural ant behavior.

      Strengths:

      A strength of the study is that the model is based on previous models, without making major novel explicit assumptions. It combines existing models of the insect central complex with a model of the lateral accessory lobe and adds a stochastic inhibition of forward velocity to the interaction of central complex and lateral accessory lobes. The central complex provides corrective steering signals when the goal direction and the current heading of an insect are not aligned, while the lateral accessory lobes provide an intrinsic oscillator underlying the behavioral oscillations shown by walking ants at all times. These background oscillations are modulated by the steering signals from the central complex. Depending on which phase of the intrinsic oscillations coincides with the corrective signals, and how fast the ant is moving forward during this time, a complex set of behaviors emerges. Most prominently, scanning behaviors, which are regularly carried out by the ants, are recapitulated in great detail by the model. Additionally, other behaviors, such as full loops, emerge naturally from the model. While computational models are not to be seen as definitive evidence for any biological reality, they can provide strong support for particular neural implementations. The current study is an excellent example in that it provides evidence for a serial arrangement of central complex circuits upstream of the lateral accessory lobe circuits, modulated by speed-regulating input. While the latter is hypothetical, it yields a clear hypothesis that can be validated by connectomics studies and functional work in the future.

      The study shows that even complex behavioral motifs do not require dedicated neural modules, but can rather emerge from the interplay of already known circuits - highlighting the efficiency of insect brains and possibly providing the path towards embodied hardware solutions of such circuits in autonomous agents.

      Weaknesses:

      There are several weaknesses in the paper as it is.

      Firstly, the model is not described in the methods, but only found when following the link to the authors' GitHub repository. This is clearly not sufficient and prevents readers from evaluating the model's assumptions directly. Most importantly, how naturally do the emergent properties actually arise from the model? What parameters need to be tuned to generate a match between data and model?

      We have now added a full description of the model in the Methods section.

      These include:

      Mathematical equations for model components

      Complete parameter table along with justifications

      Description of what is fitted vs. what emerges 

      Key assumptions and limitations

      Regarding the emergence of scanning properties: The model has two types of parameters:

      Parameters tuned to match general navigation behavior (independent of scanning):

      Motor gains (g_ang, g_fwd, k): adjusted to produce realistic continuous walking paths and species differences between desert ants and Myrmecia

      CX gain (g_CX = 0.5): set to produce appropriate corrective steering strength during continuous navigation

      Oscillator parameters (α, β, s): taken from Clément et al. (2023)

      Parameters tuned to match scanning behavior:

      CPG angular threshold (θ_CPG = 2.0): adjusted to generate realistic saccade timing

      Scan termination probability (p_stop = 0.5/timestep): matched to the Poisson-like distribution of scan durations in M. bagoti

      Properties that emerge without specific tuning:

      Fixation-saccade alternation structure (emerges from angular drive accumulation mechanism)

      Directional reversals (arise from oscillator dynamics competing with CX steering)

      Corrective saccade amplitude increasing with angular deviation (Figure 3)

      Rare full-loop scans (emerge from CX signal shifting oscillator phase)

      The behavioral continuum from straight paths → oscillations → voltes → scans (Figure 8)

      We have clarified this distinction in the Methods section and emphasized that our goal was qualitative demonstration of emergence rather than quantitative parameter optimization.
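      As a rough illustration of the scan-termination rule listed above, the sketch below assumes that p_stop acts as an independent Bernoulli trial at every model timestep; this is an assumption about the implementation, and the helper names are hypothetical rather than taken from the authors' repository. With a constant per-step stopping probability, scan durations are geometrically distributed (memoryless, the discrete analogue of the exponential waiting times of a Poisson process), with mean 1/p_stop, i.e. 2 timesteps for p_stop = 0.5. Because the model is time-agnostic, durations are counted in model steps rather than milliseconds.

```python
import random
from collections import Counter


def simulate_scan_durations(p_stop: float, n_scans: int, seed: int = 0) -> list[int]:
    """Draw scan durations (in model timesteps), assuming each timestep ends
    the scan with a constant probability p_stop. Hypothetical helper, not the
    authors' code."""
    rng = random.Random(seed)
    durations = []
    for _ in range(n_scans):
        steps = 1
        # Continue scanning while the per-step Bernoulli draw does not say "stop".
        while rng.random() >= p_stop:
            steps += 1
        durations.append(steps)
    return durations


def geometric_pmf(k: int, p_stop: float) -> float:
    """P(duration == k) = (1 - p_stop)^(k - 1) * p_stop for k >= 1."""
    return (1 - p_stop) ** (k - 1) * p_stop


if __name__ == "__main__":
    p = 0.5
    durations = simulate_scan_durations(p, 100_000)
    counts = Counter(durations)
    mean = sum(durations) / len(durations)
    print(f"mean duration: {mean:.2f} timesteps (theory: {1 / p:.2f})")
    # Compare empirical frequencies with the geometric probabilities.
    for k in range(1, 6):
        print(k, round(counts[k] / len(durations), 3), round(geometric_pmf(k, p), 3))
```

      Comparing the simulated histogram with the geometric probabilities is a quick way to check whether a fixed p_stop alone can account for an observed duration distribution, or whether duration-dependent stopping would be needed.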

      Second, it is often not entirely clear what is biological data and what is a computational model. This relates to figures, text, and references. As a reader, this makes it difficult to clearly judge what is new in the current paper, how it adds to previous models, and what the predictions and assumptions are for biology.

      Indeed, we have now clarified the manuscript, clearly separating when we refer to behavioural data, neurobiological data and modelling. In the figures, each panel now clearly indicates if it is model data or biological data so that any reader can immediately tell the data type.

      Third, while neural data from bees and flies are taken to motivate and design the computational model, the discussion and interpretation revolve almost exclusively around ants. For the most part, this is justified, as the behavioral data used to benchmark the model are taken from ants. Nevertheless, more broadly discussing the newly defined circuit in the context of flying insects would give a better idea of the broad relevance of the neural circuits predicted by the model.

      To address this suggestion we have now added two paragraphs in the discussion called: “Scanning in flying hymenopterans”.

      We are happy to expand this section further if requested.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      As mentioned in the public review, I suggest fixing the two concerns I have regarding methods and discussion.

      (1) Include a full description of the model in the methods, so that the model remains reproducible even if the GitHub repo is deleted in the future.

      True, the explanations embedded in the code could be removed from GitHub in the future. An overview of the model components is now included in the text.

      (2) Include the relevance of the model for flying insects in the discussion more prominently. This seems to be an implicit assumption in the model, as neural data from bees and, more prominently, from Drosophila are used to motivate the model to explain ant data.

      We have added a section on flying hymenopterans to the Discussion (at ~line 834).

      Minor points:

      (1) Line 207: I suggest adding the recent review by Collett, Graham, and Heinze (2025, Current Biology), as it proposes interactions between LAL and CX as well.

      Added

      (2) Figure 4: I'm interested in the conversion from steps in the model to real units (ms) in the ants. In Figures 4F and G, it seems that 5 model steps represent circa 100ms. Does this allow us to define the neuronal time constants of the model neurons? If so, are the resulting values biologically plausible? This seems important when describing real-world dynamics being created by a model circuit.

      No, the model is time-agnostic; its steps do not correspond to a fixed physical time unit.

      (3) Figure 7: Font sizes of axis labels are much too small. Also applies to other figures. Please ensure that when printed, labels can be read.

      Enlarged axis labels in all figures. 

      (4) Line 645: proprieties -> properties?

      Fixed. Thanks!

      (5) Figure 7: The figure heading states: "Slow forward speed (Myrmecia) example". This sounds as if real data from ants are shown here, while these are modeling data. It is clear after reading the text and caption in detail, but I was taken off course briefly here. Please make sure that there is no possibility of being misled here.

      We have altered the subtitle to “Slow forward speed (Myrmecia Model) example”. 

      Additionally, we have added a Model tag under each of the model image labels so classification can be done at a glance.

      (6) General discussion: What about search dynamics, i.e., increasing loops when not finding the nest entrance after homing? Are those emerging from this circuit as well? Or would that need to be a separate module? There have been discussions about search emerging from the PI circuit, but as far as I know, this is not settled, and it would be good to know if the current circuit adds something useful to this aspect.

      Because we kept a fixed goal heading, our model does not provide insight into overall trajectories such as search patterns. We now mention in the discussion:

      “In our simulations, the CX goal representation remained fixed in both direction and strength throughout each trial. This simplification allowed us to isolate and compare the effects of different CX strengths on scanning behaviour (Figure 6). However, goal headings in the CX are likely to be updated continuously, including during scans, by novel input from visual recognition in the MB (ref). This would in turn bias saccade direction and duration. Exploring such dynamics lies beyond the scope of the present study but would represent an interesting direction for future work. Notably, our proposed CX-LAL-Body relationship could be implemented downstream of an existing path integration or visual-based model (or both) to form predictions about the occurrence and dynamics of scans along the path, as well as their impact on the emerging trajectories.”

      (7) Line 690: The modulation of PFL3 by PFL2 was presented as a hypothesis in Westeinde et al., consistent with the data, but as far as I know, this is not an established fact.

      You are correct. We have now softened the text, which now reads: “In Drosophila, it has been proposed that PFL2 neurons, which respond maximally when the fly faces away from the goal, modulate steering gain by converging with PFL3 neurons (which drive left or right turns) onto downstream descending neurons (Westeinde et al., 2024).”

      (8) Please ensure that Drosophila is consistently spelled with a capital D and in italics.

      Fixed throughout the text.

      (9) Line 702: Reference Adden et al 2022: This reference is a modeling paper; it sounds as if you are referring to an experimental moth paper, though. Rephrase to clarify.

      You are correct, this could be unpacked much better regarding what is modelled and what has been experimentally shown. Changed to:

      Descending neurons in the LALs exhibit characteristic 'flip-flop' activity patterns that correlate with the zigzagging maneuvers of plume-tracking moths (Olberg, 1983; Kanzaki and Ikeda, 1994). Recent computational models suggest that CX output directly modulates these LAL circuits to coordinate orientation (Adden et al., 2022). 

      (10) Line 761: I would assume that during scans, information is acquired that would decrease uncertainty and, as a result, change the amplitude of the CX steering signal. Maybe I missed this, but is this closed-loop interaction integrated in the model?

      In our simulation, the CX goal representation remains stable in direction and strength throughout the trial. This enabled us to cleanly compare the effects of different CX strengths on scanning. However, we fully agree with you that goal headings in the CX might well be continuously updated, both during scans and between scans! The updated strength or direction of the goal heading may thus bias the scan further left, right, forward or backward, and also up- or down-regulate scan duration in both directions.

      Modelling this would require adding a layer of complexity to determine how the goal heading is updated, which is beyond the scope of the current work, but would make an exciting project for the future. We now mention this in a dedicated paragraph in the discussion section “Model limitations and future directions”.

      (11) Line 814: Please add 'fly' in front of larva. Other insect larvae have a fully developed CX.

      Corrected; we added 'fly' to this sentence.

      (12) Line 815: Maybe add the recent review, Heinze 2025.

      We added Heinze (2024), which fits this line best; the 2025 Current Biology review fits less well here, although it is cited elsewhere:

      Heinze, S. (2024). Variations on an ancient theme—the central complex across insects. Current Opinion in Behavioral Sciences, 57, 101390.

      (13) Methods: Subheading formatting should start with capital letters.

      Thank you; the second-level subheadings were formatted incorrectly. This is now fixed.

    1. Reviewer #1 (Public review):

      In this work, Jiqi Shao and colleagues evaluate the microbial iron competition and siderophore-mediated interactions combining (a) a dynamic modeling framework based on the consumer-resource model, including multiple siderophore and siderophore-receptor types, and (b) a graph-theory framework based on directed graphs to quantify the ecological dependencies of the community (referred to as Benefit Transfer Graph). Through a plethora of simulation experiments, by changing the number of species in the community, the ratio of pure-cheaters, and the number of foreign siderophores a partial-producers can utilize (referred to in this study as 'Cheating Breadth'), the authors found:

      (1) Using simulations of small communities of 5 or fewer members, they observe that closed benefit-transfer loops (commensalism/mutualism loops) serve as the structural scaffold for diversity, with coexistence, dominance, or dynamic fluctuations arising as a function of the fraction of receptors in species and the number of community members.

      (2) Using simulations of large communities of 50 members, they observed a paradox regarding the capacity of partial-producers to utilize different foreign siderophores (referred to in this study as 'The Paradox of Cheating'). They observed that a broad 'Cheating Breadth' in partial-producers increases the probability of community-wide extinction and can act as a destabilizing force. At the same time, however, 'Cheating Breadth' in partial-producers promotes species richness and community biodiversity.

      (3) The application of graph-theory framework helps to unveil ecological complexities of small and large microbial communities, explaining the aforementioned Paradox of Cheating.
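      To make the notion of a closed benefit-transfer loop concrete, here is a minimal sketch under a stated assumption: an edge A -> B is drawn whenever species B carries a receptor for a siderophore that species A produces. This edge rule, the toy community and the function names are hypothetical illustrations, not the authors' Benefit Transfer Graph definition, which may weight or filter edges differently. A species lies on a closed loop exactly when it can reach itself again along directed edges.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def benefit_transfer_graph(produces: Dict[str, Set[str]],
                           receptors: Dict[str, Set[str]]) -> Graph:
    """Build a directed graph with an edge A -> B when B can take up a
    siderophore that A produces. Hypothetical edge rule for illustration."""
    species = set(produces) | set(receptors)
    graph: Graph = {sp: set() for sp in species}
    for a in species:
        made = produces.get(a, set())
        for b in species:
            # A benefits B if any siderophore A makes matches a receptor of B.
            if a != b and made & receptors.get(b, set()):
                graph[a].add(b)
    return graph


def reachable_from_successors(graph: Graph, start: str) -> Set[str]:
    """All nodes reachable from `start` via at least one edge (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def species_on_closed_loops(graph: Graph) -> Set[str]:
    """A species lies on a closed loop iff it can reach itself again."""
    return {sp for sp in graph if sp in reachable_from_successors(graph, sp)}


if __name__ == "__main__":
    # Toy community: A and B exchange siderophores; C is a pure cheater.
    produces = {"A": {"s1"}, "B": {"s2"}, "C": set()}
    receptors = {"A": {"s1", "s2"}, "B": {"s1", "s2"}, "C": {"s1"}}
    graph = benefit_transfer_graph(produces, receptors)
    print(sorted(species_on_closed_loops(graph)))  # ['A', 'B']
```

      For communities of around 50 species this repeated reachability check is cheap; for much larger graphs, a strongly-connected-components algorithm such as Tarjan's finds the same loop members (all nodes in components with more than one species) in linear time.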

      As major strengths of this work, the authors present a novel modeling framework considering the ecological complexity of siderophore-mediated interactions by differentiating types of community members (pure-producers, partial-producers, and pure-cheaters), siderophore/receptor pairs, and exploring a wide range of situations (such as the number of community members, the ratio of pure-cheaters, or the siderophore breadth of partial-producers). Moreover, the discussion and conclusions of this study are mechanistically well-founded with a graph-theory framework (Benefit Transfer Graph). All computer code and scripts to replicate the simulations, analysis, and figure generation are public in the Zenodo repository.

      However, this study still has some work to do before it meets the expected standards, and it presents some weaknesses to be addressed. Please regard the following paragraph as constructive feedback aimed at improving your work. The main weaknesses of the current version are the Abstract, the missing Methods section, the structure of the Results section and the display of results (i.e., Figures), and the treatment of partial-producers as cheaters (including referring to the capacity of partial-producers to use different siderophores as 'Cheating Breadth'). The Abstract could be significantly improved with a better introduction of the system (cooperators and cheaters, and the concept of the 'Tragedy of the Commons'), a better description of the modeling framework, and other details included in 'Recommendations for the authors'. The current version of the manuscript lacks a proper 'Methods' section.

      Moreover, the authors could include (1) a section describing the simulated systems and the parameter choices for the simulation experiments, (2) the key model assumptions, and (3) a separate, more detailed section explaining the graph-theory framework applied in this study (Benefit Transfer Graph). Most of this information is included in the Supporting Information, but including it in the main text would make the work easier to follow. The structure of the results display (i.e., Figures) is quite confusing, especially in the section 'Closed Benefit Loops Drive Transitions from Exclusion to Coexistence and Chaos'. Moreover, important results are placed in the Supporting Information when they should be in the main text. The lack of a proper Methods section also makes the Results sections harder to follow. I have included some recommendations/suggestions to improve the structure of the Results.

      This study reveals an interesting ecological dynamic in siderophore-mediated interactions. The authors suggest the existence (and provide an explanation) of the 'Paradox of Cheating'. However, this paradox (and its discussion) may stem from a misapplication of concepts and/or terminology used by the authors here (and perhaps widely used in cooperator-cheater systems). The authors refer to the capacity of 'partial-producers' to utilize foreign siderophores (i.e., siderophores of other species) as cheating. They also refer to the number of foreign siderophores that a 'partial-producer' can utilize as 'Cheating Breadth'. A microbial cheater is one that has receptors for siderophore uptake but does not pay the cost of producing siderophores itself. Because 'partial-producers' produce at least one type of siderophore, they are not technically cheaters (although they may act as 'pure-cheaters' by changing their gene expression and not synthesizing any siderophore for the community).

      All of this may make the results misleading and lead to a potentially overstated title and conclusions. The community members 'pure-producers', 'partial-producers' and cheaters could instead be called 'single-receptor producers', 'multiple-receptor producers' and 'nonproducers', respectively [Gu et al. (2025), doi: 10.1126/sciadv.adq5038]. A better term for 'the number of foreign siderophores that a partial-producer can utilize' could be 'Siderophore Breadth', and instead of a 'Paradox of Cheating', it could be a 'Paradox of Multiple-receptor Producers'. The authors' discussion aligns better with the presented results if the proposed terms 'single-receptor producer', 'multiple-receptor producer' and 'cheater' are used, treating multiple-receptor producers as cooperative members rather than as 'moderate cheating'. On the other hand, the Paradox of Multiple-receptor Producers (the authors' Paradox of Cheating) could be a modeling artifact. Although some species possess multiple siderophore receptors in their genomes (some studies suggest that the genomes of Pseudomonas species and other environmental strains can carry up to 20-30 siderophore receptors), this does not mean that they are all expressed simultaneously.

      Regardless of the weaknesses and the major points to be improved, the findings presented in this work substantially advance our understanding of complex ecological interactions between cooperators and cheaters mediated by siderophore and siderophore-receptor syntheses, especially when multiple-receptor producers are present. Moreover, the modeling and graph-theory frameworks presented by the authors can be applied in other microbial systems, such as collaboration/competition/cheating for substrates or nutrients. Fundamental modeling exercises are indispensable for uncovering the basic ecological rules of complex microbial communities, accelerating advances in ecology by generating theory-based hypotheses for future experimental and environmental studies.

    1. eLife Assessment

      This paper demonstrates that a genetic code expansion to tag two amyotrophic lateral sclerosis (ALS) proteins associated with stress granules is useful in an experimental context. The data are solid and demonstrate the feasibility of using ANAP-fluorescence for live cell imaging.

    2. Reviewer #1 (Public review):

      Summary:

      The authors utilize genetic code expansion to tag TDP-43 and G3BP1, compare this protein tagging system (ANAP) with antibody staining, and evaluate protein trafficking and stress granule formation in response to sodium arsenite stress. They find similar staining to antibodies in HeLa cells, mouse embryonic stem cells and primary mouse cortical neurons. By incorporating the intrinsically fluorescent noncanonical amino acid Anap at carefully selected sites, the authors enable live-cell and neuronal visualization of protein localization, stress-induced redistribution, and dynamic behavior without the structural and functional compromises often associated with large fluorescent protein tags. The work provides a technical framework that will be useful for live imaging of tagged proteins.

      Strengths:

      A key strength is the demonstration of the specificity of the Anap fluorescence signal through appropriate controls and the agreement between Anap labeling and antibody-based detection across multiple cell types, including primary neurons. The ability to visualize stress-induced redistribution of both G3BP1 and TDP-43 in living cells highlights the practical value of this approach.

      The functional validation of TDP-43-Anap is compelling. The rescue of both cell viability and RNA splicing defects in TDP-43 knockout models provides evidence that Anap incorporation preserves core protein functions. This is important, as functional disruption is a central concern for any alternative tagging strategy applied to aggregation-prone or RNA-binding proteins.

      Weaknesses:

      While some inherent limitations of genetic code expansion remain (e.g., variable amber suppression efficiency and the inability to directly assess endogenous protein behavior), these are acknowledged and discussed appropriately. Importantly, these limitations do not undermine the central contributions of the study.

    3. Reviewer #2 (Public review):

      In this manuscript, Chen and colleagues describe a novel means of labeling two RNA binding proteins, G3BP1 and TDP-43, using genetic code expansion. Overexpressed constructs that incorporate the intrinsically-fluorescent non-canonical amino acid Anap redistribute to cytoplasmic granules upon application of external stressors such as sodium arsenite. Similar labeling and redistribution of overexpressed G3BP1 and TDP-43 was observed in cultures of mouse primary neurons.

      Genetic code expansion and non-canonical amino acid labeling have many advantages over traditional fusion proteins for tracking protein redistribution in living cells. The authors show that they are able to label exogenous G3BP1 and TDP-43 with the non-canonical amino acid Anap, and follow labeled proteins in living cells with and without stress.

      I suspect that this method could be incredibly valuable to many investigators studying the dynamics and interactions of proteins that are difficult to label or detect by conventional methods.

      Comment on revised version:

      The revised manuscript is significantly improved, with added controls and experiments to confirm expression and Anap labeling of G3BP1 and TDP-43.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Amyotrophic lateral sclerosis (ALS) affects nerve cells in the brain and spinal cord. The authors' approach to use genetic code expansion to tag two ALS proteins associated with stress granules has value and should be useful in the ALS field. Parts of the work are well done, but there are concerns that the evidence is incomplete overall, and additional controls would strengthen the study.

      We thank the editors and reviewers for their thoughtful assessment and for highlighting the potential value of applying genetic code expansion (GCE) to study ALS-associated proteins involved in stress granule biology. Our goal in this work was to establish and validate a minimally perturbative labeling strategy using the noncanonical amino acid Anap to monitor the localization and stress-dependent behavior of TDP-43 and G3BP1.

      We agree that additional controls can further strengthen the conclusions. In the revised manuscript, we have clarified the experimental design and added essential controls to better support the reliability of the Anap labeling approach (Supplementary Fig. 1).

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors utilize genetic code expansion to tag TDP-43 and G3BP1, and evaluate this protein tagging system (ANAP) compared to antibodies, and evaluate protein trafficking and stress granule formation in response to stress with sodium arsenite treatment. They find similar staining to antibodies in HeLa cells, mouse embryonic stem cells, and primary mouse cortical neurons. This is a useful study that demonstrates the utility of ANAP tagging to evaluate ALS proteins.

      We sincerely thank the reviewer for the positive assessment of our work and for recognizing the utility of the Anap-based GCE system for studying ALS-associated proteins.

      Strengths:

      Rescue of cell survival by ANAP-tagged TDP-43 is compelling

      We appreciate the reviewer’s highlighting of this point. Demonstrating that TDP-43-Anap can rescue cell survival was an important validation in our study, as it indicates that incorporation of the noncanonical amino acid does not substantially disrupt the biological function of TDP-43. We also tested whether TDP-43-Anap can restore RNA splicing function. As shown in Fig. 1K and 1L, expression of PFKP, a gene that undergoes cryptic exon inclusion when TDP-43 function is lost [1], recovered when TDP-43-Anap was expressed in TDP-43 knockout HeLa cells.

      Weaknesses:

      While the ANAP-tagged proteins had similar distributions to antibody staining, there were some discrepancies that may be more explained by the technique than by novel findings, as the authors suggested. The inclusion of additional controls to evaluate this would be helpful.

      This is a helpful suggestion. To ensure that the fluorescence signal observed in our experiments was specifically derived from site-specific Anap incorporation rather than background fluorescence, we performed three control conditions. Specifically, we tested: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. These control experiments were performed for both TDP-43 and G3BP1, and no observable fluorescence signal was detected under any of these conditions (Supplementary Fig. 1). We have clarified this control experiment in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Chen and colleagues describe a novel means of labeling two RNA-binding proteins, G3BP1 and TDP-43, using genetic code expansion. Overexpressed constructs that incorporate the intrinsically fluorescent non-canonical amino acid Anap redistribute to cytoplasmic granules upon application of external stressors such as sodium arsenite. Similar labeling and redistribution of overexpressed G3BP1 and TDP-43 were observed in cultures of mouse primary neurons.

      We are grateful for the reviewer’s accurate summary of our study and recognition of the value of the GCE strategy for labeling the RNA-binding proteins G3BP1 and TDP-43.

      Strengths:

      Genetic code expansion and non-canonical amino acid labeling have quite a few advantages over traditional fusion proteins for tracking protein redistribution in living cells. The authors show that they are able to label exogenous G3BP1 and TDP-43 with the non-canonical amino acid Anap and follow labeled proteins in living cells with and without stress.

      We acknowledge the reviewer’s comment on the advantages of GCE-based noncanonical amino acid labeling for studying protein dynamics in living cells.

      Weaknesses:

      The authors do not convincingly leverage the advantages of genetic code expansion in the current study. There is no specific question posed by the authors that can be or is answered using this approach, and several of the experiments lack critical controls. This is also not the first example of TDP-43 labeling by genetic code expansion (see PMID: 38290242). As a result, the study as a whole adds little to our understanding of protein trafficking and behavior under stress.

      We thank the reviewer for raising these important points. As the reviewer mentions, genetic code expansion has previously been applied to TDP-43 [2]; however, that study mainly employed a photocaged lysine incorporation system for optogenetic control of TDP-43 translocation, and the protein was still labeled with mRuby. Our paper has a different goal: to establish and validate a minimally perturbative labeling strategy using the intrinsically fluorescent noncanonical amino acid Anap to monitor the localization and stress-dependent behavior of both TDP-43 and G3BP1. Our work also extends this approach in several important ways.

      First, we demonstrate that Anap incorporation enables visualization of stress-dependent redistribution of both TDP-43 and G3BP1, two key proteins involved in stress granule biology. Importantly, we validate this approach across multiple cellular systems, including HeLa cells, mouse embryonic stem cells, and primary mouse cortical neurons, which broadens the applicability of this labeling strategy.

      Second, we provide functional validation of the Anap-tagged protein, showing that TDP-43-Anap rescues both cell survival and RNA splicing activity in TDP-43 knockout cells, including restoration of PFKP expression, a known cryptic exon target of TDP-43. These results support that Anap incorporation does not substantially disrupt protein function.

      We performed additional control experiments to ensure the specificity of the labeling system. Specifically, we tested three control conditions: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. These control experiments were performed for both TDP-43 and G3BP1, and no observable fluorescence signal was detected under any of these conditions (Supplementary Fig. 1).

      We agree that the manuscript would benefit from clearer articulation of the advantages of genetic code expansion in this context. Accordingly, we have revised the manuscript to more explicitly emphasize how Anap labeling provides a minimally perturbative alternative to large fluorescent protein fusions, which can alter the phase behavior and localization of stress granule proteins.

      “Conventional fluorescent protein tags have enabled visualization of TDP-43 and G3BP1 in living cells; however, these approaches can perturb the native biophysical properties of the proteins being studied. For example, GFP-tagged or otherwise fluorescently tagged TDP-43 usually requires additional modifications, such as deletion of the nuclear localization signal (NLS) [3, 4], to induce cytoplasmic inclusion formation. Such manipulations introduce non-physiological conditions that may alter the native trafficking and aggregation behavior of TDP-43. As for G3BP1, tags like GFP may also have unexpected effects on the phase separation or other dynamics of the protein. In contrast, the Anap-based GCE strategy allows minimally perturbative labeling and visualization of protein localization and stress-induced redistribution while preserving the native architecture and function of both proteins. Importantly, the approach provides a generalizable, genetically encoded platform for quantitatively examining the behavior of ALS-associated proteins in living cells. By enabling faithful monitoring of protein trafficking and stress-granule dynamics without extensive protein engineering, Anap-based GCE can offer a powerful strategy for probing the molecular-scale mechanisms underlying ALS-linked proteinopathies.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1A

      The authors report that the nuclear staining of G3BP1 by ANAP labeling shows the presence of nuclear pools of G3BP1 that aren't detected with antibody staining. However, nonspecific nuclear staining by aminoacylated tRNAs bound to synthetases has been described. It would be important to have a control to evaluate this possibility.

      This is an important point. We agree that the nuclear ANAP signal should be carefully controlled to exclude the possibility of nonspecific staining arising from the Anap incorporation machinery itself, such as aminoacylated tRNAs and/or synthetases.

      To address this concern, we note in the Materials and Methods section that, after DPBS washes to remove excess Anap, cells were incubated in fresh medium for 2 hours to allow sufficient time for the decay of unstable aminoacylated tRNAs, which are generally cleared within minutes to tens of minutes [5].

      Also, we performed three control conditions for both TDP-43 and G3BP1: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. Under all three conditions, we observed no detectable fluorescence signal (Supplementary Fig. 1).

      In addition, as shown in Fig. 1I, the nuclear signal of G3BP1-Anap partially colocalizes with the nuclear signal of TIA-1 in several condensate-like structures. This observation further supports that the nuclear Anap signal reflects protein-associated localization rather than nonspecific fluorescence, as it overlaps with a known RNA-binding protein that can form nuclear condensates under certain conditions.

      (2) Figure 1A, 1B

      Anap labeling appears to stain fewer cytoplasmic structures compared to antibody staining for both G3BP1 and TDP-43 after sodium arsenite treatment. Quantification would be useful to address whether this is the case. If so, might this be due to unincorporated/truncated proteins competing with Anap-labeled proteins?

      We appreciate the reviewer’s helpful suggestion. To address this point, we performed quantitative colocalization analysis in Fiji/ImageJ, calculating the Pearson correlation coefficient (R) between the Anap signal and antibody staining within regions of interest. These analyses indicate strong overall agreement between the two detection methods under stress conditions, supporting that Anap labeling reliably reports the localization of both G3BP1 and TDP-43 (see Fig. 1A, B).

      Regarding the possibility that truncated or unincorporated proteins could influence the observed signal, we note that fluorescence from Anap depends on successful amber suppression and incorporation of Anap at the engineered TAG site. Proteins that fail to incorporate Anap, such as truncated products generated by premature termination, would not produce fluorescence, and therefore would not contribute to the Anap signal. Thus, the Anap fluorescence selectively reports the population of successfully labeled full-length proteins, whereas antibody staining detects both labeled and unlabeled protein pools. This difference may partially explain why antibody staining appears to label a larger number of cytoplasmic structures.

      (3) Figure 1F

      FRAP of G3BP1-GFP in stress granules is slower than in previous publications. The underlying reasons for this should also be addressed.

      We thank the reviewer for this important observation. Differences in FRAP recovery kinetics of G3BP1 in stress granules may arise from several experimental variables that are known to influence stress granule dynamics. These include differences in cell type, expression levels of G3BP1-GFP, and imaging or photobleaching parameters. In our experiments, FRAP measurements were performed under specific conditions optimized for our experimental system, which may lead to recovery kinetics that differ from those reported in previous studies.

      (4) Figure 1H

      A full-size Western blot would be useful to evaluate the amount of truncated protein for G3BP1 and TDP-43. Could truncated proteins be competing with and altering ANAP-tagged G3BP1 and TDP-43 localization in response to stress? This should be addressed.

      We acknowledge this important point. Full-size Western blotting can provide information on the overall presence of truncated species in the transfected population; however, it represents a bulk measurement and does not capture cell-to-cell variability in amber suppression efficiency at the single-cell level. We therefore cannot exclude the possibility that truncated products are present at varying levels in individual cells and may contribute, directly or indirectly, to differences between antibody staining and Anap fluorescence.

      Importantly, we observe that cells with successful Anap incorporation consistently exhibit strong antibody staining for TDP-43 or G3BP1, indicating that full-length protein is the predominant species in these cells. Because Anap fluorescence depends on successful amber suppression, it selectively reports the full-length protein population, whereas truncated products are not detected in the imaging assay. The concordance between Anap fluorescence and antibody staining therefore argues against a major contribution of truncated species to the observed localization patterns (Supplementary Fig. 1).

      Accordingly, we interpret the Anap signal as reflecting the localization of successfully labeled full-length protein, while acknowledging that heterogeneity in suppression efficiency is an important limitation of the current approach.

      (5) Figure 3

      This is a well-designed diagram.

      We are grateful for the reviewer’s positive feedback on the diagram and are pleased that the schematic effectively illustrates the experimental design and the principles of the genetic code expansion strategy used in this study.

      Reviewer #2 (Recommendations for the authors):

      The authors present a one-sided viewpoint concerning the connection between stress granules and disease (lines 45-46). A more balanced discussion is recommended, including data arguing against a role for abnormal stress granules in neurodegeneration.

      This is an important suggestion. We agree that the relationship between stress granules and neurodegeneration remains an active area of investigation and that evidence both supporting and questioning a causal role of stress granules in disease has been reported. In the revised manuscript, we have modified the Introduction to provide a more balanced discussion of this topic.

      “Altered stress-granule dynamics have been associated with ALS/FTD [6, 7]; however, whether stress granules directly drive neurodegeneration remains debated, as several studies suggest that stress granules primarily function as protective stress responses [8].”

      (1) A central rationale for the study is missing. The authors state only that G3BP1 and TDP-43 'undergo dynamic stress-dependent redistribution, making them ideal candidates for minimally invasive, site-specific fluorescent labeling.' Is there a controversy or question that can be resolved using these approaches?

      We thank the reviewer for raising this important point. The central motivation of this study is that the dynamic behavior and phase separation properties of stress-granule proteins are highly sensitive to protein modifications and tagging strategies.

      “Conventional fluorescent protein tags have enabled visualization of TDP-43 and G3BP1 in living cells; however, these approaches can perturb the native biophysical properties of the proteins being studied. For example, GFP-tagged or otherwise fluorescently tagged TDP-43 usually requires additional modifications, such as deletion of the nuclear localization signal (NLS) [3, 4], to induce cytoplasmic inclusion formation. Such manipulations introduce non-physiological conditions that may alter the native trafficking and aggregation behavior of TDP-43. As for G3BP1, tags like GFP may also have unexpected effects on the phase separation or other dynamics of the protein.”

      (2) Related to this, there is little context for how or why genetic code expansion is utilized for these studies

      We agree that the rationale for using genetic code expansion should be more clearly explained. In this study, genetic code expansion was employed to enable site-specific incorporation of the small fluorescent noncanonical amino acid Anap, allowing minimally perturbative labeling of the proteins of interest.

      “The Anap-based GCE strategy allows minimally perturbative labeling and visualization of protein localization and stress-induced redistribution while preserving the native architecture and function of both proteins. Importantly, the approach provides a generalizable, genetically encoded platform for quantitatively examining the behavior of ALS-associated proteins in living cells. By enabling faithful monitoring of protein trafficking and stress-granule dynamics without extensive protein engineering, Anap-based GCE can offer a powerful strategy for probing the molecular-scale mechanisms underlying ALS-linked proteinopathies.”

      (3) The justification for the criteria for selecting the site for incorporation of non-canonical amino acids in G3BP1 or TDP-43 is missing.

      We acknowledge this important comment and agree that the rationale for selecting the incorporation sites should be stated more clearly.

      “For TDP-43, the incorporation site was selected to avoid the major functional domains involved in RNA binding, nuclear localization, and aggregation-related behavior, thereby reducing the likelihood that Anap incorporation would perturb its native trafficking or function. For G3BP1, the site was chosen to minimize interference with domains important for stress granule assembly, RNA binding, and protein-protein interactions. More generally, we aimed to place the ncAA at positions likely to be solvent-accessible and tolerant of substitution, while avoiding highly conserved or functionally essential residues.”

      (4) Studies in Figures 1 and 2 lack essential controls, including background signal from Anap in non-transfected cells, or those transfected with plasmids lacking the tRNA or tRS.

      This is an important point, also raised by Reviewer 1. To evaluate potential background fluorescence arising from Anap or the labeling system, we performed several control experiments. Specifically, we examined three conditions: (1) cells cultured with Anap supplement, (2) cells expressing the Anap incorporation system with the addition of Anap, and (3) cells expressing both the TAG-mutated protein plasmid and the Anap incorporation system but without the addition of Anap. Under all three conditions, we observed no detectable fluorescence signal (Supplementary Fig. 1).

      (5) Another marker of stress granules should be used for confirming the identity of G3BP1-Anap (+) or TDP-43-Anap (+) structures, including TIA1, TAF15, or polyA RNA.

      We appreciate this helpful suggestion. To further confirm the identity of the stress granule structures observed in our experiments, we performed colocalization analysis with TIA-1, a well-established marker of stress granules. The results have been included in the revised manuscript.

      “Additionally, we examined the colocalization of G3BP1-Anap with TIA-1, another established stress granule marker. Under stress conditions, G3BP1-Anap largely colocalized with TIA-1 within stress granules. Interestingly, under basal conditions, the nuclear signal of G3BP1-Anap, which was not detected by antibody staining, appeared to partially colocalize with TIA-1 in several condensate-like structures (Fig. 1I).”

      (6) There is no information on the number of granules bleached or the number of cells selected for FRAP studies. There is no information on the shaded areas in Figure 1F or 1G, and no information on statistical comparisons between regressions in Figure 1F.

      We thank the reviewer for pointing out these omissions. We have revised the figure legends to clarify these details.

      “One granule from each of three independent cells was selected and photobleached for FRAP analysis.”

      “Error bars are shown as shaded areas around the curves for clarity. FRAP recovery curves were compared using two-way ANOVA.”

      (7) Protein dynamics measured by FRAP are highly dependent on the concentration and/or expression level of each protein. Because of this, the authors need to control for expression level in all FRAP studies.

      We agree that protein concentration and expression level can influence FRAP recovery kinetics. Because Anap incorporation relies on amber suppression, and suppression efficiency varies from cell to cell, it is difficult to control the expression level of Anap-labeled proteins directly. To minimize this potential effect, we performed FRAP measurements on cells exhibiting comparable fluorescence intensities, which served as a proxy for similar expression levels of the labeled proteins. In addition, FRAP analyses were conducted on individual granules within cells expressing moderate levels of the protein, avoiding cells with unusually high fluorescence intensity that might reflect overexpression.

      Furthermore, fluorescence recovery was normalized to the pre-bleach intensity of the selected granules, which reduces variability arising from differences in overall expression levels between cells.
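      The pre-bleach normalization described in this response can be illustrated with a minimal Python sketch. It assumes a single granule's intensity trace stored as a 1D sequence, with the first `n_prebleach` frames acquired before bleaching; the function name and the optional constant `background` subtraction are hypothetical illustrations, not the authors' analysis pipeline.

```python
import numpy as np


def normalize_to_prebleach(trace, n_prebleach, background=0.0):
    """Scale a FRAP intensity trace so the mean pre-bleach intensity equals 1.

    trace       -- granule intensities over time (first n_prebleach frames are pre-bleach)
    n_prebleach -- number of frames recorded before photobleaching
    background  -- optional constant background intensity subtracted from every frame
    """
    values = np.asarray(trace, dtype=float) - background
    if n_prebleach < 1 or n_prebleach >= values.size:
        raise ValueError("need at least one pre-bleach and one post-bleach frame")
    pre_mean = values[:n_prebleach].mean()
    if pre_mean <= 0:
        raise ValueError("mean pre-bleach intensity must be positive after background subtraction")
    # Dividing by the pre-bleach mean removes differences in absolute brightness
    # (and hence, approximately, expression level) between granules.
    return values / pre_mean


# Two hypothetical granules with different brightness but the same relative recovery.
bright = normalize_to_prebleach([200, 200, 40, 100, 130], n_prebleach=2)
dim = normalize_to_prebleach([100, 100, 20, 50, 65], n_prebleach=2)
print(bright, dim)
```

      Because each trace is divided by its own pre-bleach mean, granules from cells with different expression levels end up on a common scale, as the example shows. Other schemes, such as additionally rescaling so that the first post-bleach frame maps to zero, are common alternatives.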

      (8) There is no point of reference for TDP-43-Anap FRAP results in Figure 1G. Additional studies using variants harboring a mutated NLS (mNLS) can be used in place of TDP-43-YFP.

      This is a helpful suggestion. In response, we have performed additional FRAP experiments using TDP-43<sup>ΔNLS</sup>, a commonly used construct that promotes cytoplasmic localization and facilitates analysis of TDP-43 granules. The results from TDP-43<sup>ΔNLS</sup> have now been included as a reference for the FRAP measurements of TDP-43-Anap in the revised manuscript (Fig. 1D, 1G).

      “We then used YFP-tagged nuclear localization signal (NLS)-deleted TDP-43 (TDP-43<sup>ΔNLS</sup>-YFP) as a reference and performed FRAP analysis to compare the mobility of TDP-43-Anap and TDP-43<sup>ΔNLS</sup>-YFP. Fluorescence recovery of TDP-43-Anap reached ~45% within 20 s after photobleaching, consistent with liquid-like dynamics. In contrast, TDP-43<sup>ΔNLS</sup>-YFP showed only ~22% recovery, suggesting more solid-like dynamics (Fig. 1D, 1G). These results are consistent with previous reports describing relatively immobile aggregates formed by TDP-43<sup>ΔNLS</sup> [4] and illustrate the advantage of Anap-based labeling, which preserves native protein properties and enables real-time assessment of protein dynamics without introducing disruptive mutations.”

      (9) There is no point of reference for comparing FRAP results from G3BP1-GFP to G3BP1-Anap. What is the 'gold standard'? Without this, it is difficult to conclude that "... Anap labeling better preserved the native mobility and biophysical properties of G3BP1 than the conventional GFP tag."

      We acknowledge this important point and agree that there is currently no definitive gold standard for measuring the native mobility of endogenous G3BP1 within stress granules in living cells. Our intention was not to claim that the Anap-labeled protein definitively represents the native state, but rather to compare the relative effects of different labeling strategies.

      We have therefore revised the sentence to read: “These results suggest that G3BP1-Anap displays higher mobility compared with G3BP1-GFP, indicating that Anap labeling may provide a less perturbative approach for monitoring G3BP1 dynamics.”

      (10) The WB in Figure 1H is overexposed, making it difficult to compare expression levels between WT and V100Anap-transfected cells. In addition, there is no similar assay for confirming G3BP1-Anap expression.

      Thank you for pointing this out. In the revised manuscript, we have replaced the image with a properly exposed Western blot to allow clearer comparison of protein expression levels.

      In addition, we have now included a corresponding Western blot analysis to confirm the expression of G3BP1-Anap in G3BP knockout U2OS cells (Fig. 1H). These results verify that the Anap-labeled proteins are expressed at detectable levels and support the interpretation of the imaging and FRAP experiments.

      (11) Although survival studies in Figures 1I and J are promising, a more convincing demonstration of functional replacement of TDP-43 would involve an assessment of cryptic exon splicing, comparing WT to TDP-43 KO, V100Stop- and V100Anap-transfected cells.

      This is a valuable suggestion.

      “We also evaluated TDP-43-dependent RNA splicing activity by examining the expression of PFKP, a well-established target that undergoes cryptic exon inclusion upon loss of TDP-43 function<sup>17</sup>. As shown in Figures 1K and 1L, expression of TDP-43-Anap in TDP-43 knockout HeLa cells restored PFKP expression, indicating that the Anap-labeled protein retains functional RNA splicing activity. These results demonstrate that TDP-43-Anap can functionally compensate for endogenous TDP-43, supporting the conclusion that Anap incorporation does not substantially disrupt the protein’s biological function.”

      (12) Tuj1 staining in Figure 2 is inconsistent and often fails to confirm neuronal identity.

      We thank the reviewer for this important comment. We acknowledge that Tuj1 staining in Figure 2 is variable and, in some cases, does not clearly delineate neuronal identity. Notably, the reduced Tuj1 signal is primarily observed in neurons that express Anap-labeled proteins under sodium arsenite treatment, which likely reflects the combined effects of transfection-associated stress and oxidative stress on neuronal morphology and cytoskeletal integrity.

      In addition, transfection efficiency in primary neurons is inherently low and variable, and cells that successfully express the constructs may represent a more stress-sensitive subpopulation, further contributing to variability in staining quality. Despite optimization efforts, these technical constraints limit the consistency of Tuj1 labeling under these experimental conditions.

      (13) Close-up images and correlation scatter plots in Figures 1 and 2 do not add very much information.

      We thank the reviewer for this comment. To address the reviewer’s concern, we have revised the figure legends to better clarify the purpose of these panels and how they support the quantitative analysis presented in the manuscript.

      For the scatter plots: “Colocalization threshold analysis was performed in Fiji/ImageJ to calculate the Pearson correlation coefficient (R) for each region of interest (A, B, I, J). The X- and Y-axes represent the fluorescence intensity values of the red and green channels, respectively. When signals are colocalized, pixels with high intensity in one channel correspond to high intensity in the other, forming a diagonal distribution. In contrast, non-colocalized signals cluster along the axes. A higher R value indicates a greater degree of colocalization. Scale bar, 3 μm.”

      The same information was added to the legend of Figure 2.

      For the scheme, please see lines 412-413 in the revised manuscript.
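      The Pearson correlation coefficient used in the colocalization legends above can be sketched in a few lines of Python. This assumes each channel is available as a NumPy array of the same shape and that a region of interest is given as an optional boolean mask; the function name is hypothetical, and the sketch computes the plain Pearson R over ROI pixels without the automatic threshold selection that Fiji's colocalization tools may apply.

```python
import numpy as np


def pearson_r(red, green, mask=None):
    """Pearson correlation between two channel images, optionally restricted to an ROI.

    red, green -- intensity arrays of identical shape (e.g. Anap and antibody channels)
    mask       -- optional boolean array of the same shape selecting ROI pixels
    """
    red = np.asarray(red, dtype=float)
    green = np.asarray(green, dtype=float)
    if red.shape != green.shape:
        raise ValueError("channel images must have the same shape")
    if mask is not None:
        mask = np.asarray(mask, dtype=bool)
        red, green = red[mask], green[mask]
    else:
        red, green = red.ravel(), green.ravel()
    # Deviations from each channel's mean intensity within the ROI.
    dr = red - red.mean()
    dg = green - green.mean()
    denom = np.sqrt((dr ** 2).sum() * (dg ** 2).sum())
    if denom == 0:
        raise ValueError("a channel is constant in the ROI, so R is undefined")
    # Covariance divided by the product of standard deviations.
    return float((dr * dg).sum() / denom)


rng = np.random.default_rng(0)
anap = rng.random((8, 8))                  # hypothetical channel 1
antibody = 2.0 * anap + 1.0                # perfectly co-varying channel 2
print(pearson_r(anap, antibody))           # close to 1.0
```

      Values near 1 correspond to the diagonal pixel distribution described in the legend; lower or negative values indicate weaker or inverse co-variation between the two channels.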

      References:

      (1) Rothstein, J.D. et al. Sporadic ALS induced pluripotent stem cell derived neurons reveal hallmarks of TDP-43 loss of function. Nature Communications 16, 7092 (2025).

      (2) Shadish, J.A. & Lee, J.C. Genetically encoded lysine photocage for spatiotemporal control of TDP-43 nuclear import. Biophys Chem 307, 107191 (2024).

      (3) Gasset-Rosa, F. et al. Cytoplasmic TDP-43 De-mixing Independent of Stress Granules Drives Inhibition of Nuclear Import, Loss of Nuclear TDP-43, and Cell Death. Neuron 102, 339–357.e337 (2019).

      (4) Yan, X. et al. Intra-condensate demixing of TDP-43 inside stress granules generates pathological aggregates. Cell 188, 4123–4140.e4118 (2025).

      (5) Walker, S.E. & Fredrick, K. Preparation and evaluation of acylated tRNAs. Methods 44, 81–86 (2008).

      (6) Kassouf, T. et al. Targeting the NEDP1 enzyme to ameliorate ALS phenotypes through stress granule disassembly. Science Advances 9, eabq7585 (2023).

      (7) Van Nerom, M. et al. C9orf72-linked arginine-rich dipeptide repeats aggravate pathological phase separation of G3BP1. Proceedings of the National Academy of Sciences 121, e2402847121 (2024).

      (8) Wolozin, B. & Ivanov, P. Stress granules and neurodegeneration. Nat Rev Neurosci 20, 649–666 (2019).

    1. AI Coding at Home Without Going Broke
      • Transitioning from standard chat interfaces to autonomous, multi-file AI coding agents can cause API token consumption and monthly costs to skyrocket if left unmanaged.
      • Including massive, multi-file codebases in every agent prompt rapidly exhausts context windows and sharply inflates the cost of each turn, because the full context is sent, and billed, again on every request.
      • To code at home without going broke, developers should shift to a modular architecture: isolating components, splitting projects into small modules, and relying heavily on mock data layers.
      • Restricting the AI's visibility to a single file or a narrowly scoped subdirectory keeps context tokens low, prevents the agent from making sweeping changes across the codebase, and lowers billing.
      • Using free or low-cost tiers to map out full architectural specs and test files before generating implementation code provides firm constraints that minimize wasted AI loops.
      • Developers can significantly curb expenses by opting for deep-context consumer subscription plans (such as $20 to $100 per month tiers) over uncapped pay-as-you-go API keys when executing heavy agent tasks.
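      The cost dynamics in the bullets above can be made concrete with a back-of-the-envelope model. This sketch assumes that every turn resends the full conversation history as input tokens and that each turn appends a fixed number of new tokens. It ignores prompt caching, output-token pricing, and provider-specific billing, and the token counts and the $3-per-million price are hypothetical. Under these assumptions each turn's cost grows linearly with context size, and the session total grows quadratically with the number of turns.

```python
def cumulative_input_tokens(base_context: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens billed over `turns` when the whole history is resent each turn."""
    total = 0
    context = base_context
    for _ in range(turns):
        total += context             # the entire current context is sent as input
        context += added_per_turn    # the reply and next prompt join the history
    return total


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Convert an input-token count into dollars at a flat per-million-token price."""
    return tokens * usd_per_million / 1_000_000


if __name__ == "__main__":
    PRICE = 3.0     # hypothetical USD per million input tokens
    TURNS = 30      # hypothetical session length
    ADDED = 2_000   # hypothetical tokens appended per turn
    for label, base in [("whole repo in context", 150_000), ("single scoped module", 8_000)]:
        tokens = cumulative_input_tokens(base, ADDED, TURNS)
        print(f"{label}: {tokens:,} input tokens, ${input_cost_usd(tokens, PRICE):.2f}")
```

      With these hypothetical numbers, the whole-repository session bills 5,370,000 input tokens against 1,110,000 for the scoped module. The shared 870,000 tokens come from the growing history, so scoping the starting context is what separates the two.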

      Hacker News Discussion

      • The Reality of the Cost "Squeeze": A debate emerged over what constitutes "going broke," with many users noting that standard $20 to $100 consumer tiers are more than sufficient for normal hobbyist workflows and are likely priced by AI providers at or below break-even.
      • The Culprit Behind Token Bleed: Commenters pointed out that users burning thousands of dollars in API credits are typically running automated pipelines, loading up dozens of Model Context Protocol (MCP) tools, or deploying recursive sub-agents that reload the entire codebase context on every single turn.
      • Niche Utility for Unattended Grinding: While continuous, unattended AI coding is rarely efficient for daily tasks, an engineer shared a highly valuable edge case: letting an AI autonomously decompile, reverse-engineer, and rebuild five interrelated legacy firmware images back into recognizable C projects over several hours.
      • The Sequential Refactoring Playbook: For managing large-scale modifications, users advocated for a strict, multi-step pipeline: first utilizing AI to ingest code and write unit tests, then breaking the files into tiny, isolated blocks, testing those blocks independently, and only then generating the actual refactored behavior.
      • Interruption Management Advantage: A key human-centric benefit highlighted was how agentic setups alleviate cognitive load during family interruptions; a developer can step away for hours and simply tell the agent to catch them up and proceed without losing flow state.
    1. If you are requesting human attention, demonstrate human effort.

      Hacker News Discussion

      • The Pull Request Fatigue Loop: A widely upvoted comment highlighted how a colleague using Claude flooded the team with AI-generated PRs, then complained when they languished; reviewers subconsciously avoided them because reviewing AI code for hidden hallucinations requires an immense, asymmetric amount of human effort.
      • The Asymmetry of Feedback: Users noted that it feels deeply dismissive when a human invests an hour of intense cognitive effort to thoughtfully review a massive PR, only to receive an instantaneous, AI-generated reply or amendment from the author.
      • Review Scalability vs. Guardrails: Some participants argued that traditional code review cannot scale to prolific AI agents or hyper-productive humans; they suggested transitioning to automated guardrails—such as linters, auto-formatters, and robust end-to-end continuous deployment testing—to offset the review bottleneck.
      • Code Review as a Cultural Practice: The discussion underscored that code review should function as a collaborative team process for shared understanding and mentorship rather than a cold, adversarial gatekeeper blocking a developer from merging code.
      • Exploiting Token Budgets: One commenter observed that large, complex PRs often trigger scrolling blindness in humans and cause LLMs to run out of token budget, leading both to blindly approve the change with a generic "looks good to me."
    1. Why AI hasn’t replaced software engineers, and won’t
      • Software engineering has a long history of aggressive automation—from assembly to high-level languages—and rather than replacing engineers, every leap in productivity has expanded the scale and complexity of what can be built.
      • The demand for software is functionally insatiable; as soon as engineers become more efficient, the organizational goalposts move, leading to higher expectations rather than a reduction in staff.
      • Current AI development tools act primarily as force multipliers rather than autonomous agents, meaning that an expert developer is still strictly required to drive, review, and handle the remaining high-value 10% of the work.
      • For AI to truly replace software engineers, an autonomous AI system would need to consistently outperform an AI+human developer hybrid team, a milestone that current data and architectures are far from reaching.
      • While generalist software engineers remain secure, specific narrow domains or commoditized skill sets (such as basic, boilerplate frontend development) face a heightened risk of being entirely absorbed by AI tools.
      • The most significant hurdle for autonomous AI is not initial code generation, but rather the long-term maintenance, context retention, and reasoning required to safely adapt to changing ecosystems and walled gardens.
      • Rather than destroying the engineering market, AI changes the underlying economics of production, allowing developers to rapidly clear backlogs, build minor utilities, and focus more on software architecture and system design.

      Hacker News Discussion

      • The Jevons Paradox of Code: Commenters emphasized that making software creation more efficient lowers its cost, which has historically increased overall demand rather than exhausting the market.
      • The Rise of Bespoke Consumer Software: A popular theory suggested that AI will enable everyday users to spin up personalized, ad-free, micro-utilities (like custom todo lists) on the fly, reducing reliance on bloated commercial applications.
      • The Tinkering vs. Maintenance Chasm: Several users countered the "bespoke software" future by comparing it to 3D printing; while creating a custom script is easy with AI, the average user lacks the logical thinking and patience required to maintain software over time.
      • A Cyberpunk Technological Stack: Users noted that the current trajectory feels reminiscent of science fiction, where individuals possess highly customized, personalized technology stacks modified specifically for their unique workflows.
      • B2B Complexity and Standardization: Many participants pointed out that while consumer-facing apps might become fragmented, enterprise B2B infrastructure, distributed systems, and core data layers (like the Linux kernel or banking infrastructure) strictly require human-driven rigor, consistency, and standardization.
    1. Doing nothing at work
      • Many software engineers should deliberately work fewer hours and at a slower pace, aiming for around 80% utilization by default to leave 20% slack time away from the computer.
      • Engineering performance in tech companies is dominated by outlier, high-impact events where solving the right problem at the right time matters more than raw effort.
      • Key time-dependent opportunities for outsized impact include stepping in to unblock a massive enterprise deal, mitigating or preventing a major incident early, and rapidly unblocking high-profile feature shipments.
      • Staying 100% utilized on low-priority backlog tasks makes engineers too busy to spot high-impact opportunities, and prevents managers from volunteering them for strategic, high-visibility work.
      • Keeping time free and "doing nothing" gives the brain rest to spark new ideas, prevents exhaustion before high-pressure incidents, and encourages engineers to "think in slow motion" during critical situations rather than making frantic, harmful changes.
      • Engineers must consciously avoid low-priority "glue work" (like unsolicited documentation or unprioritized tech debt) because insulating an organization from its own poor prioritization leads to individual burnout without reward.
      • Being overly helpful leaves engineers vulnerable to "predators" who extract uncompensated, unrecorded work through backchannels, such as product managers asking for ad-hoc data queries or colleagues taking credit for paired programming.
      • Engineers should resist the urge to immediately implement volatile requirements from indecisive designers, and can run out the clock on low-clout managerial ideas that are likely to be canceled, avoiding wasted effort.
      • Peak high performance does not require constant grinding; it is more effective to maintain an 80% effort baseline during ordinary times and save 100% maximum effort for the two or three times a year when the rewards are exceptionally high.

      Hacker News Discussion

      • The Firefighter Incentive Problem: Multiple commenters noted a fundamental misalignment in corporate game theory: preventing an outage yields zero visibility or measurable metrics, whereas creating "a giant pile of kindling" and publicly putting out the resulting fire gets rewarded twice by management.
      • Strategies for Technical Relevance: Rather than relying on firefighting visibility, some users suggested building robust, highly reliable, yet complex essential tools that force other teams to repeatedly come back to you for guidance, naturally cementing your status as an expert.
      • The Risk of Over-Helpfulness vs. Goodwill: While the author warned against backchannel "predators," a popular counterpoint detailed how a Principal Engineer achieved their title by intentionally giving away credit and building immense team goodwill, which paid off critically during a high-stakes project rescue.
      • Product Search vs. Pure Engineering: There was nuance added around the balance of code quality; in early-stage feature exploration or "search problems," moving fast and breaking things to find out what users want can sometimes be more valuable than building perfectly solid, slow-moving architecture.
    1. Differentiation Strategies and Classroom Space Design: The Charlotte Monin Model

      Summary

      This document details the innovative approach of Charlotte Monin, a physics and chemistry teacher, for placing differentiated instruction at the heart of her classroom without increasing the teacher's mental load.

      The strategy rests on shifting the teacher's stance from master of knowledge to facilitator, and on a classroom layout saturated with scaffolding tools that students can use on their own.

      Key points include using the walls as learning resources, creating dedicated zones (quiet, research, review) and deploying active-recall tools.

      The ultimate goal is to meet students' fundamental needs: autonomy, competence and social connection.


      1. Teaching Philosophy: From Transmission to Facilitation

      The system rests on a deep shift in educational paradigm.

      The teacher no longer relies on top-down transmission alone but becomes a guide.

      • "Facilitator" stance: The teacher steps in as the "last line" of support.

      The tools resolve initial blockers (understanding instructions, methodological reminders) so that the teacher can devote her time to students who are genuinely struggling.

      • Primary-to-secondary continuity: The system builds on the autonomy students acquired in primary school (moving around the classroom, handling waiting time), which middle school (collège) sometimes tends to undo.

      • Psychosocial needs (CPS): The layout aims to meet three essential needs:

      • Autonomy: The ability to act independently thanks to the available resources.

      • Competence: A sense of success through tools for immediate self-checking.

      • Social connection: Cooperative work, encouraged by table clusters and communication tools.


      2. The Classroom as a Scaffolding Tool

      Every square meter of the room is designed to support learning.

      Nothing in the layout is left to chance.

      Using the Walls and the Entryway

      • The Revision Wall (outside): Located in the hallway, it lets students review or reassure themselves before entering class.

      • The Scientific Method Wall: Gathers vocabulary for argumentation and comparison (often borrowed from French class) to help students write up their lab reports.

      • The Reference Wall: Visual aids for spatial notions (right/left, abscissa/ordinate, vertical/horizontal).

      Dedicated Zones

      | Zone | Main function | Equipment / Features |
      | --- | --- | --- |
      | Library corner | Quiet and autonomy | Beanbags, books, educational games (after the activity). |
      | Quiet / isolation corner | Emotional regulation | Noise-canceling headphones, coloring (mandalas, pixel art), breathing exercises. |
      | Resource hub | Material autonomy | Numbered lockers, electricity kits, self-service activity sheets. |
      | Student lab benches | Collaborative work | Writing directly on the tables (dry-erase markers), physical-quantity key rings. |


      3. Toolbox for Autonomy and Differentiation

      The system relies on concrete tools that students learn to handle from the start of the school year.

      • The Tétraide: A signaling tool (color code) that lets groups show how far along they are or that they need help, without interrupting the lesson.

      • The Instruction-Verb Glossary: A dictionary built into the lab bench that defines terms such as "argue" and "calculate", to remove reading blockers.

      • The "5C" Method: A systematic writing guide (Chercher, Connaître, Convertir, Calculer, Conclure: search, know, convert, calculate, conclude).

      • Physical-Quantity Key Rings: A quick-reference tool linking each quantity to its symbol, unit and measuring instrument.

      • Inclusive Equipment: Reading rulers for students with dyslexia, noise-canceling headphones, and "Culbuto" wobble stools for students who need to move.


      4. Lessons Learned: Successes and Failures (Tops & Flops)

      The system evolved through trial and error over several years.

      The "Tops" (Essentials)

      • The Scaffolding Display Folder: Gathers all the method sheets in one place.

      It keeps documents from getting lost from one year to the next and requires students to search deliberately for what they need.

      • Slates and 5C Checklists: Laminated sheets for ticking off the steps of solving a problem, encouraging self-assessment.

      • Cooperative Roles: Session leader, summarizer, facilitator, mediator.

      These roles are harmonized across the whole school so that students don't get lost from one subject to another.

      The "Flops" (To Avoid or Adjust)

      • Overloaded "Coups de Pouce" (hint aids): An early version of the key ring was so dense with information that it ended up losing the students.

      Simplicity must come first.

      • The Fixed Support Table: The idea of a dedicated table where the teacher stays for remediation failed because it made the space too rigid and overloaded the other work clusters.

      5. Logistics, Investment and Implementation

      Setting up such an environment requires a significant investment of personal effort and time.

      • Gradual rollout: Charlotte Monin recommends a "small steps" approach.

      There is no need to transform everything at once; it is better to start with a folder of method sheets or the role system.

      • Funding: The project relies on a mix of the subject budget (buying flexible furniture such as the stools) and personal spending (hundreds of euros a year for lamination, salvaged materials, etc.).

      • Maintenance: Because the materials are personally owned, teachers can keep their tools if they change schools.

      • Training students: Dedicated sessions at the start of the year are needed to teach students to find their way around the room and to use the resources without systematically turning to the adult.

      "Good scaffolding, properly taking heterogeneity into account, also means that these tools are not all directly accessible.

      The fact that they have to go and get them [...] requires students to be fully aware that the tool exists." (Charlotte Monin)

    1. Additionally, people who work or live in multilingual settings may code-switch many times throughout the day, or even within a single conversation.

      I watched code-switching a lot growing up in a bilingual household, especially from my mother, who code-switched not only at home but also at work. It always impressed me how seamless a conversation could be even with two different languages mixed together. It wasn't until I was older, and we spoke less Spanish in the house, that she told me it isn't always easy. She explained that there are many different dialects of Spanish, some similar and some very different. Even though speaking with family is easy, at work it can get more complicated, often making conversations longer because more descriptors are needed to make sure everyone is on the same page.

    2. code-switching refers to changes in accent, dialect, or language (Martin & Nakayama, 2010). There are many reasons that people might code-switch. Regarding accents, some people hire vocal coaches or speech-language pathologists to help them alter their accent

      I do this at my IT help desk job on campus all the time. I talk totally normally to other students when they come in for help, but when my boss or a teacher comes in, I switch it up and sound super professional.

    1. Which system for which need? Each of the five systems wins in different scenarios. This overview shows which platforms are particularly well suited to which typical requirements. The platforms are then presented in detail below.

      PlentyONE. Your focus: scaling multichannel sales, with commerce operations bundled in one system. Why: 150+ native sales channels plus PIM, OMS, WMS and shop in one cloud-native platform.

      JTL. Your focus: a low-cost entry, with a single shop plus one or two marketplaces in the DACH region. Why: the free JTL-Wawi and a deep, German shop and marketplace ecosystem.

      Xentral. Your focus: e-commerce alongside production or B2B wholesale in one system. Why: 200+ integrations, no-code automation, plus purchasing and manufacturing.

      weclapp. Your focus: CRM, projects and services alongside sales, with native accounting wanted. Why: a native, GoBD-compliant finance module in a complete business suite.

      Odoo. Your focus: maximum customization or open source, with e-commerce as one of several channels. Why: 80+ modules, full source-code access and tight backend integration.

      Do not create a table here; just use short paragraphs or something similar. The table looks super weird.

    1. Most recent AuditEvent for the SMART /authorize call of the Koppeltaal launch, where the actor is the Patient or a RelatedPerson linked to this Patient (agent.who)

      TOPIC 11 refers specifically to the event type User Authentication. Currently the service logs a User Authentication event with subtype https://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_D.html#DCM_110122 Login when delegated authentication takes place, but not for every /authorize.

      However, this must always happen: also when no delegated authentication takes place, or when it fails and the external IdP does not redirect back to the redirect endpoint.

      My (revised) proposal is to leave the above as it is, but additionally to log, on every /authorize, a User Authentication event with subtype https://dicom.nema.org/medical/dicom/current/output/chtml/part16/chapter_D.html#DCM_110144 Authorization Decision.

      Note: This subtype code is not included by default in http://hl7.org/fhir/R4/valueset-audit-event-sub-type.html, so the profile needs to be adjusted.
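      A minimal sketch of the extra event the proposal describes, as a plain FHIR R4 AuditEvent dictionary in Python. Assumptions: the helper name `authorize_audit_event` and the example references are hypothetical; the event type uses the DICOM code 110114 ("User Authentication") from the DCM code system; the outcome mapping (0 for success, 4 for a failed or unfinished authorization) is an illustrative choice. A real Koppeltaal profile may require further elements or extensions not shown here.

```python
from datetime import datetime, timezone

# DICOM controlled terminology (PS3.16) as a FHIR code system URI.
DCM = "http://dicom.nema.org/resources/ontology/DCM"


def authorize_audit_event(actor_ref: str, observer_ref: str, success: bool) -> dict:
    """Build a minimal FHIR R4 AuditEvent for one SMART /authorize call.

    Hypothetical helper: actor_ref is the Patient or RelatedPerson (agent.who),
    observer_ref identifies the system recording the event.
    """
    return {
        "resourceType": "AuditEvent",
        # Event type: DICOM 110114 "User Authentication"
        "type": {"system": DCM, "code": "110114", "display": "User Authentication"},
        # Proposed subtype: DICOM 110144 "Authorization Decision"
        "subtype": [{"system": DCM, "code": "110144", "display": "Authorization Decision"}],
        "action": "E",  # E = Execute
        "recorded": datetime.now(timezone.utc).isoformat(),
        "outcome": "0" if success else "4",  # 0 = success, 4 = minor failure
        "agent": [{"who": {"reference": actor_ref}, "requestor": True}],
        "source": {"observer": {"reference": observer_ref}},
    }
```

      Logged on every /authorize, this event would exist even when delegated authentication never completes, which is the gap the comment points out.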

    1. A fixed workflow (propose ideas, generate plans, write code, run smoke tests, run full training, analyze results, repeat) seems reasonable but underperforms giving AARs no workflow at all

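      This finding overturns many people's intuitions about AI agents. We naturally tend to give AI more structure (steps, checkpoints, templates), assuming this will make it more reliable. But the paper finds the opposite: a prescribed workflow constrains the AAR's ability to adapt to the specific idea. When the workflow is fixed, the agent can only squeeze the idea into the workflow; when the workflow is free, the agent tailors the workflow to the idea. The lesson applies to every AI agent product: excessive scaffolding is a hidden tax on capability.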

    1. Once human- and AI-authored code quality reach parity, humans will stop writing code entirely, and shift to only reviewing it. But if they can't review code as quickly as Claude can generate it, human review will become the bottleneck to AI development.

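      This is the most tightly argued paragraph in the piece, and a precise application of Amdahl's law: the slowest stage of an accelerated pipeline determines the overall rate, so once AI generates code faster than humans can review it, humans become the bottleneck to AI progress. This is not an abstract worry: Anthropic already concedes in a footnote that human code review has become the new bottleneck. There are only two ways out: either AI reviews its own code (a fully closed recursive loop), or reliance on human review is drastically reduced. Both paths lead to the same endpoint: recursive self-improvement.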
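      Amdahl's law makes the bottleneck argument quantitative: if only a fraction p of the work is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s), which can never exceed 1 / (1 - p). A small Python sketch; the 60% generation / 40% review split below is a hypothetical illustration, not a figure from the source.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work is sped up by `factor`.

    The remaining (1 - accelerated_fraction), e.g. human review, runs at its old speed.
    """
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Hypothetical split: 60% of the time goes to writing code, 40% to reviewing it.
print(amdahl_speedup(0.6, 10))            # code generation 10x faster
print(amdahl_speedup(0.6, float("inf")))  # infinitely fast generation: capped by review
```

      However fast generation becomes, the unaccelerated review share sets the ceiling (here 1 / 0.4 = 2.5x).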

    2. more than 80% of the code we merge into Anthropic's codebase was authored by Claude

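      This number needs to be read together with footnote 3: the 80%+ figure is the share of lines merged into production that can be attributed to Claude, and it is already a conservative count. The footnote acknowledges that the attribution system has gaps, and the unattributed portion also includes a large amount of code not written by hand. The true share may be closer to the 90%+ that Anthropic leadership has cited publicly. Even at a conservative 80%, the meaning is clear: at one of the world's leading AI research organizations, the core job of human engineers has already shifted from writing code to reviewing and steering it.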

    1. there is no distinction between code and data, so warriors regularly modify both themselves and their opponents on the fly

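      Core War's self-modifying nature makes it an ideal sandbox for AI safety research. In real-world cyberattacks, code-as-data (shellcode injection, ROP chains) is precisely the hardest attack surface to defend. The attack strategies DRQ evolves automatically in this environment amount to discovering, without supervision, general exploitation patterns for the class of vulnerabilities in which code and data are not distinguished; this is the same mechanism behind the capability gains of models such as Mythos.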
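      The "no distinction between code and data" point is visible in the smallest classic warrior, the Imp (`MOV 0, 1`), which copies its own instruction one cell ahead and then executes that copy. A toy Python sketch, assuming a heavily simplified core: only `MOV` with relative addressing is modeled, not full ICWS'94 Redcode semantics, and this is not the DRQ setup.

```python
def run_imp(core_size: int, steps: int) -> list:
    """Simulate the Core War Imp (MOV 0, 1) in a toy circular core.

    Each cell holds an instruction tuple (opcode, a, b); addresses are relative
    to the program counter and wrap around the core.
    """
    core = [("DAT", 0, 0)] * core_size  # empty core (tuples are immutable, so sharing is safe)
    core[0] = ("MOV", 0, 1)             # the Imp
    pc = 0
    for _ in range(steps):
        op, a, b = core[pc]
        if op != "MOV":
            break  # executing DAT would kill the process
        # The instruction is treated as data: copy cell pc+a into cell pc+b.
        core[(pc + b) % core_size] = core[(pc + a) % core_size]
        pc = (pc + 1) % core_size  # then execute the freshly written copy
    return core
```

      After three steps in an 8-cell core, cells 0 to 3 hold `MOV 0, 1`: the program has rewritten memory that it then runs as code.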

    2. convergence does not occur at the level of source code, indicating that what converges is function rather than implementation

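      The phenotype (behavior) converges while the genotype (code) does not: a remarkably subtle distinction. Different code implements the same function, just as spiders and snakes each independently evolved venom through entirely different molecular mechanisms. The analogy for large-model research: models with different architectures and training data may converge at the capability level while remaining diverse at the "implementation layer." When evaluating AI capabilities, looking only at code or weights is not enough; you have to look at behavior.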

    3. this dynamic adversarial process leads to the emergence of increasingly general strategies and reveals an intriguing form of convergent evolution, where different code implementations settle into similar high-performing behaviors

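      This is the paper's most important experimental result: independent evolutionary paths starting from different initial conditions ultimately converge on similar behavioral strategies, much as birds and bats each independently evolved wings. The lesson for AI researchers: there seems to be a kind of "basin of attraction for optimal strategies"; whatever the starting point, adversarial pressure pushes the system toward the same solution. This suggests the emergence of complex capabilities may be more inevitable than we think.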

    1. Israeli law is based mostly on a common law legal system, though it also reflects the diverse history of the territory of the State of Israel throughout the last hundred years (which was at various times prior to independence under Ottoman, then British sovereignty), as well as the legal systems of its major religious communities. The Israeli legal system is based on common law, which also incorporates facets of civil law. The Israeli Declaration of Independence asserted that a formal constitution would be written,[1] though it has been continuously postponed since 1950. Instead, the Basic Laws of Israel (Hebrew: חוקי היסוד, ħuqey ha-yesod) function as the country's constitutional laws. Statutes enacted by the Knesset, particularly the Basic Laws, provide a framework which is enriched by political precedent and jurisprudence. Foreign and historical influences on modern-day Israeli law are varied and include the Mecelle (Hebrew: מג'לה; the civil code of the Ottoman Empire) and German civil law, religious law (Jewish Halakha and Muslim Sharia; mostly pertaining in the area of family law), and British common law. The Israeli courts have been influenced in recent years by American Law and Canadian Law[2] and to a lesser extent by Continental Law (mostly from Germany).[3]
    2. Unit 8200 (Hebrew: יחידה 8200, Yehida shmone -Matayim- "Unit eight - two hundred") is an Israeli Intelligence Corps unit of the Israel Defense Forces responsible for collecting signal intelligence (SIGINT) and code decryption. Military publications include references to Unit 8200 as the Central Collection Unit of the Intelligence Corps, and it is sometimes referred to as Israeli SIGINT National Unit (ISNU).[1] It is subordinate to Aman, the military intelligence directorate.
    3. Content
      The published catalog pages were written between 2008 and 2009. The price of the items ranged from free up to $250,000.

      Capabilities in the ANT catalog

      | Code name | Description[14] | Unit price in US$[c] |
      | --- | --- | --- |
      | CANDYGRAM | Tripwire device that emulates a GSM cellphone tower. | 40,000 |
      | COTTONMOUTH-I | Family of modified USB and Ethernet connectors that can be used to install Trojan horse software and work as wireless bridges, providing covert remote access to the target machine. COTTONMOUTH-I is a USB plug that uses TRINITY as digital core and HOWLERMONKEY as RF transceiver. | 20,300 |
      | COTTONMOUTH-II | Can be deployed in a USB socket (rather than plug), but requires further integration in the target machine to turn into a deployed system. | 4,000 |
      | COTTONMOUTH-III | Stacked Ethernet and USB plug | 24,960 |
      | CROSSBEAM | GSM communications module capable of collecting and compressing voice data | 4,000 |
      | CTX4000 | Continuous wave radar device that can "illuminate" a target system for recovery of "off net" information. | N/A |
      | CYCLONE-HX9 | GSM Base Station Router as a Network-In-a-Box | 70,000[d] |
      | DEITYBOUNCE | Technology that installs a backdoor software implant on Dell PowerEdge servers via the motherboard BIOS and RAID controller(s). | 0 |
      | DROPOUTJEEP | "A software implant for the Apple iPhone that utilizes modular mission applications to provide specific SIGINT functionality. This functionality includes the ability to remotely push/pull files from the device. SMS retrieval, contact list retrieval, voicemail, geolocation, hot mic, camera capture, cell tower location, etc. Command, control and data exfiltration can occur over SMS messaging or a GPRS data connection. All communications with the implant will be covert and encrypted." | 0 |
      | EBSR | Tri-band active GSM base station with internal 802.11/GPS/handset capability | 40,000 |
      | ENTOURAGE | Direction finding application for GSM, UMTS, CDMA2000 and FRS signals | 70,000 |
      | FEEDTROUGH | Software that can penetrate Juniper Networks firewalls allowing other NSA-deployed software to be installed on mainframe computers. | N/A |
      | FIREWALK | Device that looks identical to a standard RJ45 socket that allows data to be injected, or monitored and transmitted via radio technology, using the HOWLERMONKEY RF transceiver. It can for instance create a VPN to the target computer. | 10,740 |
      | GENESIS | GSM handset with added software-defined radio features to record the radio frequency spectrum | 15,000 |
      | GODSURGE | Software implant for a JTAG bus device named FLUXBABBITT which is added to Dell PowerEdge servers during interdiction. GODSURGE installs an implant upon system boot-up using the FLUXBABBITT JTAG interface to the Xeon series CPU. | 500[e] |
      | GINSU | Technology that uses a PCI bus device in a computer, and can reinstall itself upon system boot-up. | 0 |
      | GOPHERSET | GSM software that uses a phone's SIM card's API (SIM Toolkit or STK) to control the phone through remotely sent commands. | 0 |
      | GOURMETTROUGH | User-configurable persistence implant for certain Juniper Networks firewalls. | 0 |
      | HALLUXWATER | Back door exploit for Huawei Eudemon firewalls. | N/A |
      | HEADWATER | Persistent backdoor technology that can install spyware using a quantum insert capable of infecting spyware at a packet level on Huawei routers. | N/A |
      | HOWLERMONKEY | An RF transceiver that makes it possible (in conjunction with digital processors and various implanting methods) to extract data from systems or allow them to be controlled remotely. | 750[f] |
      | IRATEMONK | Technology that can infiltrate the firmware of hard drives manufactured by Maxtor, Samsung, Seagate, and Western Digital. | 0 |
      | IRONCHEF | Technology that can "infect" networks by installing itself in a computer I/O BIOS. IRONCHEF also includes "Straitbizarre" and "Unitedrake", which have been linked to the spy software REGIN.[15] | 0 |
      | JUNIORMINT | Implant based on an ARM9 core and an FPGA. | N/A |
      | JETPLOW | Firmware that can be implanted to create a permanent backdoor in Cisco PIX series and ASA firewalls. | 0 |
      | LOUDAUTO | Audio-based RF retro-reflector listening device. | 30 |
      | MAESTRO-II | Multi-chip module approximately the size of a dime that serves as the hardware core of several other products. The module contains a 66 MHz ARM7 processor, 4 MB of flash, 8 MB of RAM, and an FPGA with 500,000 gates. It replaces the previous generation modules which were based on the HC12 microcontroller. | 3,000[g] |
      | MONKEYCALENDAR | Software that transmits a mobile phone's location by hidden text message. | 0 |
      | NEBULA | Multi-protocol network-in-a-box system. | 250,000 |
      | NIGHTSTAND | Portable system that installs Microsoft Windows exploits from a distance of up to eight miles over a wireless connection. | N/A[h] |
      | NIGHTWATCH | Portable computer used to reconstruct and display video data from VAGRANT signals; used in conjunction with a radar source like the CTX4000 to illuminate the target in order to receive data from it. | N/A |
      | PICASSO | Software that can collect mobile phone location data and call metadata, and access the phone's microphone to eavesdrop on nearby conversations. | 2,000 |
      | PHOTOANGLO | A joint NSA/GCHQ project to develop a radar system to replace CTX4000. | 40,000 |
      | RAGEMASTER | A concealed device that taps the video signal from a target's computer's VGA signal output so the NSA can see what is on a targeted desktop monitor. It is powered by a remote radar and responds by modulating the VGA red signal (which is also sent out most DVI ports) into the RF signal it re-radiates; this method of transmission is codenamed VAGRANT. RAGEMASTER is usually installed/concealed in the ferrite choke of the target cable. The original documents are dated 2008-07-24. Several receiver/demodulating devices are available, e.g. NIGHTWATCH. | 30 |
      | SCHOOLMONTANA | Software that makes DNT[i] implants persistent on JUNOS-based (FreeBSD-variant) J-series routers/firewalls. | N/A |
      | SIERRAMONTANA | Software that makes DNT implants persistent on JUNOS-based M-series routers/firewalls. | N/A |
      | STUCCOMONTANA | Software that makes DNT implants persistent on JUNOS-based T-series routers/firewalls. | N/A |
      | SOMBERKNAVE | Software that can be implanted on a Windows XP system allowing it to be remotely controlled from NSA headquarters. | 50,000 |
      | SOUFFLETROUGH | BIOS injection software that can compromise Juniper Networks SSG300 and SSG500 series firewalls. | 0 |
      | SPARROW II | A small computer intended to be used for WLAN collection, including from UAVs. Hardware: IBM Power PC 405GPR processor, 64 MB SDRAM, 16 MB of built-in flash, 4 mini PCI slots, CompactFlash slot, and 802.11 B/G hardware. Running Linux 2.4 and the BLINDDATE software suite. Unit price (2008): $6K. | 6,000 |
      | SURLYSPAWN | Keystroke monitor technology that can be used on remote computers that are not internet connected. | 30 |
      | SWAP | Technology that can reflash the BIOS of multiprocessor systems that run FreeBSD, Linux, Solaris, or Windows. | 0 |
      | TAWDRYYARD | Radio frequency retroreflector to provide location information. | 30 |
      | TOTECHASER | Windows CE implant for extracting call logs, contact lists and other information. | N/A |
      | TOTEGHOSTLY | Software that can be implanted on a Windows mobile phone allowing full remote control. | 0 |
      | TRINITY | Multi-chip module using a 180 MHz ARM9 processor, 4 MB of flash, 96 MB of SDRAM, and an FPGA with 1 million gates. Smaller than a penny. | 6,250[j] |
      | TYPHON HX | Network-in-a-box for a GSM network with signaling and call control. | N/A |
      | WATERWITCH | A portable "finishing tool" that allows the operator to find the precise location of a nearby mobile phone. | N/A |
      | WISTFULTOLL | Plugin for collecting information from targets using Windows Management Instrumentation | 0 |

      Follow-up developments
    4. BlackRock is the largest investor in weapon manufacturers through its U.S. Aerospace and Defense ETF.[43] In September 2018, an activist with the U.S. non-profit organization Code Pink confronted Fink on stage at the Yahoo Finance All Markets Summit.[44]
    5. OpenAI Codex is an artificial intelligence model developed by OpenAI. It parses natural language and generates code in response. It is used to power GitHub Copilot, a programming autocompletion tool developed for Visual Studio Code.[1] Codex is a descendant of OpenAI's GPT-3 model, fine-tuned for use in programming applications.
    6. As part of a sustained campaign of strategic bombing during World War II, the attack during the last week of July 1943, code named Operation Gomorrah, created one of the largest firestorms raised by the Royal Air Force and United States Army Air Forces in World War II,[2] killing an estimated 37,000 civilians and wounding 180,000 more in Hamburg, and virtually destroying most of the city.
    7. Unit 81 (Hebrew: יחידה 81, "Unit eight - one") is a secret technology unit part of the Special Operations Division of the Military Intelligence Directorate, an independent service of the Israel Defense Forces (IDF). The unit focuses on building and supplying cutting-edge technologies to Israeli combat soldiers and spies.[1] It is often associated with Unit 8200, which is responsible for signal intelligence (SIGINT) and code decryption.[2]
    8. After the departure of Sanjay Mavinkurve, the Winklevosses and Narendra approached Narendra's friend, Harvard student and programmer Victor Gao to work on HarvardConnection.[4] Gao, a senior in Mather House, had opted not to become a full partner in the venture, instead agreeing to be paid in a work for hire capacity on a rolling basis.[8] He was paid $400 for his work on the website code during the second half of 2003, then excused himself thereafter due to personal obligations.[7]
  3. Jun 2026
    1. dar chart highlights that high scores can come from different profiles. Some countries may be relatively infrastructure-led, while others score through institutional and regulatory capacity or broader innovation and talent ecosystems. This supports a policy reading in

      To fix this and make the chart cleaner, we should shorten the dimension names in the visualization code. For example:
      * Legitimacy, rights, and sustainability --> Legitimacy & Rights
      * Institutional and regulatory agency --> Institutional Agency
      * Compute and cloud agency --> Compute & Cloud
      * Innovation, and application agency --> Innovation & Apps
      * Data and knowledge agency --> Data & Knowledge
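      A minimal sketch of that renaming step in Python, assuming the chart code receives the dimension names as a list of strings. The dictionary and function names are hypothetical, and the keys are copied from the comment, so any name that does not match exactly is passed through unchanged rather than silently dropped.

```python
# Hypothetical mapping from the long dimension names to short radar-chart labels.
SHORT_LABELS = {
    "Legitimacy, rights, and sustainability": "Legitimacy & Rights",
    "Institutional and regulatory agency": "Institutional Agency",
    "Compute and cloud agency": "Compute & Cloud",
    "Innovation, and application agency": "Innovation & Apps",
    "Data and knowledge agency": "Data & Knowledge",
}


def shorten_labels(dimensions: list[str]) -> list[str]:
    """Return axis labels for the radar chart, keeping unmapped names as-is."""
    return [SHORT_LABELS.get(name, name) for name in dimensions]
```

      Keeping the mapping in one place means the full names can still be used in the data and legend text while only the axis labels are shortened.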

    1. Author response:

      The following is the authors’ response to the original reviews.

      In preparation for release of the analysis code used in the paper, we made many analyses more parallel to one another in their exact preprocessing. This resulted in very slight changes to many panels, but these changes are nearly invisible and conclusions did not change. In one case, though, we realized that the way we were presenting data was potentially misleading (the timing plot in Figure 3A). The original plot was of the distribution of pixel values from the spatially smoothed map instead of distributions over individual neurons. We have now swapped it out for better interpretability and changed the accompanying text accordingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors address the organization of reach-related activity in layer 2/3 across a broad swath of anterodorsal neocortex that included large subregions of M1, M2, and S1. In mice performing a novel variant of a water-reaching task, the authors measured activity using two-photon fluorescence imaging of a GECI expressed in excitatory projection neurons. The authors found a substantial diversity of response patterns using a number of metrics they developed for characterizing the PETHs of neurons across reach conditions (target locations). By mapping single-neuron properties across the cortex, the authors found substantial spatial variation, only some of which aligned with traditional boundaries between cortical regions. Using Gaussian mixture models, the authors found evidence of distinct response types in each region, with several types prominent in multiple cortical regions. Aggregating across regions, four primary subpopulations were apparent, each distinct in its average response properties. Strikingly, each subpopulation was observed in multiple regions, but subpopulation members from different regions exhibited largely similar response properties.

      Strengths:

      The work addresses a fundamental question in the field that has not previously been addressed at cellular resolution across such a broad cortical extent. I see this as truly foundational work that will support future investigation of how the rodent brain drives and controls reaching.

      The quantification is thoughtful and rigorous. It is great that the authors provide an explanation for and intuition behind their response metrics, rather than burying everything in the Methods.

      The Discussion and general contextualization of the results are thorough, thoughtful, and strong. It is great that the authors avoid the common over-interpretation of classical observations regarding cortical organization that are endemic in the field.

      All things considered, this is the best paper regarding spatial structure in the motor system I have ever read. The breadth of cellular resolution activity measurement, the rigor of the quantification, and the clear and open-minded interrogation of the data collectively have produced a very special piece of work.

      Thank you! We really, really appreciate this!

      Weaknesses:

      The behavioral task is very impressive and an important contribution to the field in its own right. However, given that it appears substantially different from the one used in the previous paper, the characterization of the behavior provided in the Results is too brief. More illustration of the behavior would be helpful. For example, it is rather deep into the paper when the authors reveal that the mice can whisk to help localize the target location. That should be expressed at the outset when the behavior is first described. Other suggestions for elaborating the behavior description are included below.

      Thank you. Although the task will be treated in greater detail in the next paper (where we more closely relate neural activity to the kinematics), we have added more exposition of the task here. In particular, we now include a figure with a characterization of the trial-to-trial variability across reaches to the same target versus across reaches to different targets (Figure 2-figure supplement 1B). This supports the idea that the mice aimed their reaches. We have also expanded that text.

      Regarding whisking, we have now revised that text to make clear that we do not know how the mice localize the spout. The original work by Galinanes and Huber argued that they find the spout by sniffing the water; they may do the same here, or may find it via whisking. It is also possible that the whisking they do is simply because the spout moves in and they are excited, or startled, or do it by reflex. We simply have no evidence one way or another. We have therefore revised the text to make it clearer that whisking-related activation could have occurred for a variety of reasons.

      Statistical support for key claims is lacking. For example, "The five areas of interest varied in the fraction of neurons that were modulated: M2 had 14%, M1 had 23%, S1-fl had 30%, S1-hl had 25%, and S1-tr had 27%" - I cannot locate the statistical tests showing that these values are actually different. Another example is Figure 7, where a key observation is that distributions of PETH features are distinct across regions. It is clear that at least some distributions are not overlapping, but a clearer statistical basis for this key claim should be provided.

      Good idea. For the proportions, we have now added first a Chi-square test for homogeneity to show that there is variation in the proportions, then shown the results of pairwise two-proportion Z tests (Bonferroni-corrected for multiple comparisons) as a binary matrix in Figure 3-figure supplement 1B. For the area distributions in the t-SNE space (Figure 7), we have added a 2-dimensional Kolmogorov-Smirnov test, again corrected for multiple comparisons, with p-values quoted in the text.
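      The two-step procedure for the proportions (an omnibus chi-square test for homogeneity, then pairwise two-proportion z tests with a Bonferroni correction) can be sketched in plain Python. This is not the authors' analysis code; the function names are hypothetical, and the inputs must be the real counts of modulated and total neurons per area, which the text reports only as percentages.

```python
import math
from itertools import combinations


def chi2_homogeneity(counts, totals):
    """Pearson chi-square statistic for equal proportions across k groups (2 x k table).

    Returns (statistic, degrees_of_freedom).
    """
    pooled = sum(counts) / sum(totals)
    stat = 0.0
    for k, n in zip(counts, totals):
        # Modulated and non-modulated cells, each against its expected count.
        for observed, expected in ((k, n * pooled), (n - k, n * (1 - pooled))):
            stat += (observed - expected) ** 2 / expected
    return stat, len(counts) - 1


def two_proportion_z(k1, n1, k2, n2):
    """Two-sided pooled two-proportion z test; assumes 0 < pooled proportion < 1."""
    p_pool = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (k1 / n1 - k2 / n2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal
    return z, p


def pairwise_significance(counts, totals, alpha=0.05):
    """Binary (symmetric) matrix of Bonferroni-corrected pairwise z tests."""
    g = len(counts)
    m = g * (g - 1) // 2  # number of pairwise comparisons
    sig = [[False] * g for _ in range(g)]
    for i, j in combinations(range(g), 2):
        _, p = two_proportion_z(counts[i], totals[i], counts[j], totals[j])
        sig[i][j] = sig[j][i] = p < alpha / m
    return sig
```

      With SciPy available, `scipy.stats.chi2.sf(stat, df)` converts the omnibus statistic into a p-value; the pairwise matrix corresponds to the kind of binary display the authors describe.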

      I understand that the authors are planning a follow-up study that addresses the relation between activity patterns and kinematics. One question about interpreting the results here though, is how much the activity variation across target locations may relate to the kinematic differences across these different conditions, as opposed to true higher-order movement features like reach direction.

      We agree this is a very important question. However, having done many of the analyses to examine the question for the next paper in the series, we do not know of a shortcut to the right answer. This question requires thorough treatment, and so we leave it to subsequent work. Instead, after our speculation about how responses suggest function, we now state explicitly that these hypotheses need testing:

      “In each of these cases, determining the relationships of the observed activity patterns to function will require specific attempts to link the activity to kinematics, target location, sensory feedback, and more; these relationships will be addressed in future work.”

      Reviewer #2 (Public review):

      Summary:

      The functional parcellation of cortical areas is a critical question in neuroscience. This is particularly true in frontal areas in mice. While sensory areas are relatively well characterized by their tuning to sensory stimuli, the situation is much less clear for motor areas. This has become even more ambiguous since recent studies using large-scale neuronal recordings consistently report mixed sensory and motor-related activity throughout the brain, and motor mapping studies have shown that movements evoked by cortical stimulation are by no means limited to motor areas alone. Here, the authors use a correlation approach combining large-scale functional imaging at cellular resolution with movement tracking in mice executing a reaching task. Across multiple recording sessions in the same animals, the authors have imaged a large portion of the sensorimotor cortex at cellular resolution in mice performing a reaching task, recording the activity of nearly 40,000 neurons. By aligning the calcium signal of each neuron to three task events (the Go cue triggering the reach, the onset of paw lift, and the contact between the paw and the target) for different target positions, the authors identified different response patterns distributed differently across cortical areas. They defined a set of features that describe the neurons' response pattern, representing the temporal dynamics and tuning properties for the different target positions. These features were used to construct cortical maps, and the authors show that, interestingly, gradient maps obtained from the first derivative of the feature maps reveal sharp discontinuities at the boundaries between anatomically defined cortical areas. Using dimensionality reduction of the neuronal response features, the authors found that, despite clear differences in their average response properties, individual neurons from the same cortical areas do not form distinct clusters in the reduced-dimensional space. In fact, most areas contain heterogeneous neuronal populations, and most neuronal populations are present in multiple areas, albeit in different proportions. Interestingly, the authors identified four neuronal subpopulations based on the distance between the components of the Gaussian mixture model used to model the distribution of neurons within each area. One of these subpopulations is almost exclusively represented in the anterior M2 cortex, while another is broadly distributed across the different areas.

      Strengths:

      This article is based on an impressive dataset of nearly 40,000 neurons covering a large portion of the sensorimotor cortex and on innovative analytical approaches. This study is likely the first to clearly demonstrate boundaries between cortical areas defined based on the responses of individual neurons. This innovative approach to functional mapping of cortical areas potentially opens up new perspectives for higher-resolution mapping of frontal cortical areas, using a broader repertoire of sensory and motor evoked responses.

      Thank you!

      Weaknesses:

      The second part of the article, which presents multimodal responses in the cortical areas, seems to be a perhaps overly complicated way of showing what has already been demonstrated in numerous recent publications, but these new analyses expand upon these previous observations by revealing an interesting functional organization of the sensorimotor cortex, highlighting interesting similarities and differences between certain areas.

      We understand the concern: a number of recent papers have also noted different neuron response characteristics distributed throughout the motor system. We compare and contrast our work with these in greater detail in our replies to the more specific comments below, but briefly summarize here. The way previous work handled the data (for example, starting with PCA) mixes what neurons are tuned for, and when they are tuned for it, with what we refer to as the “response format”: properties like tuning sharpness, response duration, etc. We focused primarily on this response format, and designed our features to be mostly independent of tuning preferences or peak response timing. We therefore pick up on different properties of neurons’ responses than those prior works. In addition, no previous work we know of examined these properties across large swathes of cortex at single-cell resolution in the context of forelimb control. Together, these aspects of our work allowed us to produce high-resolution maps of response properties in a way we have not seen in any prior work.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In addition to addressing the weaknesses stated above, I suggest the authors also consider the following.

      The one big question left unresolved here is whether we should be thinking about these four subpopulations as distinct types with a biological basis and importance, or just reflections of activity pattern heterogeneity. The authors say that "we did not observe tight clusters in feature space separated by gaps," but their discussion here is light and a bit unclear, and their engagement with the issue of types versus heterogeneity, in my view, could be improved. We do not need "gaps" where the density goes to zero in parameter space, but we do need reproducible troughs between peaks. The authors should clarify if there are substantial and reproducible troughs in the parameter space between their four subpopulations.

      This is a great idea, and we have added three analyses and additional text to address it. We break this concern down into two more specific questions, based on the next comment by this reviewer.

      (1) Are the clusters well separated / do they have troughs between them? (Note that even with troughs, clustering might not be stable if the clustering algorithm is poorly matched to the shapes of the clusters.)

      (2) Is the clustering stable? (It can be stable even without troughs, if, for example, the distribution has a long tail and a GMM needs one Gaussian for the body of the distribution and a second for the tail.)

      First, to directly address the presence or absence of troughs between clusters, we have added Figure 9-figure supplement 2A and 2B. For each pair of subpopulations, we trained a logistic regression classifier to separate the 5D feature vectors of the neurons in one subpopulation from those of the neurons in the other subpopulation, then projected the feature vectors onto this axis. Note that because the subpopulations are defined by GMMs, which have nonlinear boundaries, the (linear) logistic classifier does not typically produce perfect classification. Nevertheless, this analysis provides a window onto how well separated each cluster is from each other cluster in feature space. In 5 of the 6 pairwise comparisons, the distributions are clearly different and show at least some dip in density at the boundary. The one pair of clusters without a trough between them was the forelimb somatomotor and hindlimb somatomotor subpopulations. This was surprising to us, given that their likelihood maps are so strongly distinct, but it presumably reflects an attempt to capture a nonlinear classifier boundary with a linear one (see below). Overall, this analysis argues that the clusters have fuzzy edges that blend into one another, but that their mass is concentrated near the centers of the clusters we identified.
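      The projection step can be sketched as follows, assuming the 5D feature vectors and the GMM-derived subpopulation labels are already in hand. The sketch fits an unregularized logistic regression by plain gradient descent on z-scored features (an assumption; any standard logistic solver would do) and returns each group's scores along the classifier axis. Function names are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Unregularized logistic regression fit by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(Xb @ w, -30, 30)))  # predicted P(y = 1)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w[:-1], w[-1]


def project_pair(X_a, X_b, **kwargs):
    """Project two subpopulations' feature vectors onto their separating axis.

    Returns the scores of each group along the classifier axis and the
    training accuracy of the linear boundary.
    """
    X = np.vstack([X_a, X_b])
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score features for stable descent
    y = np.r_[np.zeros(len(X_a)), np.ones(len(X_b))]
    w, b = fit_logistic(Z, y, **kwargs)
    scores = Z @ w + b
    accuracy = np.mean((scores > 0) == (y == 1))
    return scores[:len(X_a)], scores[len(X_a):], accuracy
```

      A trough between two subpopulations would appear as a dip between the two modes of a histogram of the pooled scores (for example `np.histogram(np.r_[scores_a, scores_b], bins=40)`); as noted above, a single linear axis can miss separation that a nonlinear boundary would reveal.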

      Second, to address the same question with a different nonlinear method, we have added a version of the t-SNE plot from Figure 7 that is colored and contoured by subpopulation identity instead of area (Figure 9-figure supplement 2B). Agreement with the GMMs is not a given here either, because t-SNE is a fundamentally different nonlinear transform, independent of the GMM classification. Nevertheless, the subpopulations were again nicely separated, though not with troughs, possibly because point density is inherently difficult to interpret in t-SNE embeddings. Interestingly, here the hindlimb somatomotor subpopulation was the best separated from the other subpopulations, supporting the idea that the lack of separation we observed above with the logistic projections was indeed due to a nonlinear boundary. This analysis again argues that neurons are more likely to have features that lie near the center of a cluster, but that the edges of the clusters run into one another. Additionally, this analysis makes clear that treating the hindlimb somatomotor subpopulation as a cluster distinct from the forelimb somatomotor subpopulation is supported by other analyses, even if not by the logistic regression projection.

      Third, to address the question of cluster stability, we randomly split our data into halves, fit a GMM to each half independently, applied the GMM from one half to the other half, and measured the similarity of the two clusterings using the Adjusted Rand Index. This produced a value of 0.856, which for this sensitive measure argues that the clustering is rather stable (at least for the three clusters that can be found with all data together, which do not include the smaller Anterior subpopulation). Note that we did not perform this analysis on the more complicated version in which we fit a GMM to each area separately and then cluster the resulting components; in our main analysis, the hierarchical clustering agreed with what we found by eye, but choosing the number of clusters in hierarchical clustering is in general very unstable, and so we did not have an objective way to determine the “true” number of clusters.
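      The split-half stability check can be sketched as below. For brevity, k-means with farthest-point initialization stands in for the GMM (a deliberate simplification, so "applying the model" to the other half here means assigning each neuron to its nearest centroid); the Adjusted Rand Index is computed from the contingency table of the two labelings. All function names are hypothetical.

```python
from math import comb

import numpy as np


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(table, (a, b), 1)  # contingency table of co-assignments
    pairs_cells = sum(comb(int(n), 2) for n in table.ravel())
    pairs_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    pairs_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    expected = pairs_rows * pairs_cols / comb(len(a), 2)
    maximum = (pairs_rows + pairs_cols) / 2
    return (pairs_cells - expected) / (maximum - expected)


def assign(X, centers):
    """Label each row of X with the index of its nearest center."""
    return np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization (stand-in for a GMM)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):  # next center: the point farthest from existing centers
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers


def split_half_ari(X, k, seed=0):
    """Cluster two random halves independently, then compare on one half via ARI."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half_a, half_b = X[idx[:len(X) // 2]], X[idx[len(X) // 2:]]
    centers_a = kmeans(half_a, k, seed=seed)
    centers_b = kmeans(half_b, k, seed=seed + 1)
    own = assign(half_b, centers_b)  # half B labeled by its own model
    transferred = assign(half_b, centers_a)  # half B labeled by half A's model
    return adjusted_rand_index(own, transferred)
```

      With a GMM, the same procedure would use each half's fitted mixture to assign the held-out neurons to their most probable component; the ARI is unchanged by how the cluster labels are numbered, so no matching of components between halves is needed.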

      In addition to these new analyses, we note that three analyses we had already included bear strongly on this issue. First, regarding separability of the clusters, the fact that our likelihood maps (Figure 9C-F) were quite distinct for different subpopulations argues that we picked up on ‘real’ differences. Second, Figure 9B showed that when clustering non-overlapping data (different cells from different areas), we obtained clusters that were nearly identical in their feature distributions. Third, in Figure 10E we used the clusterings from different areas’ data to create likelihood maps and found that they were extremely similar. These analyses together argue strongly that we are finding ‘types’ in a meaningful sense; given that we know the areas do have different distributions of properties, if there were no types, then clustering would yield different clusters for different areas. Given the importance of the question, however, we are grateful that the reviewer encouraged us to find additional ways to make this point!

      The original t-SNE plot is beautiful and quasi-fractalic, but it does not show clear signs of four cell types. The single-neuron activity profiles are clearly heterogeneous in very interesting ways, but heterogeneous does not imply a strong or reproducible multimodality that would indicate meaningful cell types. Clustering algorithms will always spit out an answer. If you just have elements uniformly distributed across a parameter space, plus some noise, when you ask for X clusters, you will get X clusters that have different centroids. When you ask an algorithm to cluster without defining the number of clusters, noise can lead the algorithm to produce a particular number of clusters that again will have distinct centroids. The salient question, though, is whether in the present case there is a parameter space in which the clusters are substantially and/or reproducibly distinct. Distinct here would mean that peaks in the density across some parameter space are separated by troughs; again, we don't need true gaps. The more substantial the differences between clusters are (again, not the differences between centroids but the prominence of the density troughs between them), the more biologically meaningful the clustering is likely to be. Reproducibility here could be addressed with resampling methods (e.g., how often do two separate halves of the cells produce the same clusters?).

      Please see the reply above, which includes our addressing of this concern.

      The Introduction is generally good, but it could further develop existing ideas about how function is distributed across cell types and regions. We would like to be able to imagine different answers to the question of how activity patterns are organized that might have divergent implications for how the circuit works. I understand we have very little to go on in terms of data, but I think it would be helpful for readers to be given more of a sense of what *could* be important.

      Good idea. We have added such a paragraph to the Introduction:

      “To frame possible outcomes, consider that single-neuron responses can vary along many dimensions. Cells could differ according to which movements or time periods they are recruited for (tuning), what movement parameters their activities reflect (encoding), or how their responses are structured across different movements (e.g., nonlinear encoding structure). Further, differences in these response properties across cells could be distributed over the cortical sheet in a variety of ways. Cells could form distinct “categories” or clusters that are spatially well-aligned to the boundaries of anatomically defined regions. Or, categories of neurons might span area boundaries in spatial footprints that do not relate obviously to area boundaries, and that either abut or overlap. At a fine-grained scale, cells with similar responses could be physically located near one another as in primate and feline visual cortex, or similarly responsive neurons might be salt-and-pepper intermingled as seen in rodent visual cortex or in primate motor cortices during reaching behaviors.”

      It should be clarified in the Results how the cue relates to the target location. Most would assume a different cue for each location, but this does not appear to be the case. The authors should clarify whether there was some amount of searching for the precise target location after the reach, or else how the block structure or other sensory information allowed mice to learn where exactly the target would be. In the absence of target-specific cues, some sense of how the mice achieved target-specific reach trajectories should be offered.

      Related to this, in Figure 1, it would be good to see some individual trajectories, as they all overlap near the target in the current plot. Clearly, the reaches were targeted, but it is unclear how targeted. Some of the adjustments at the end may reflect searching or palpation to resolve the precise spout location. It is very much ok if the mice were not reaching with micron precision each time to each of 15 different targets, but it would be good to provide the reader a better sense of what the mice were doing.

      These are important points. First, to clarify, the Cue is just a Go cue, and was the same for all targets. It is now described in the Results as “non-target-specific”. For additional explanation about supplemental analyses to assess “aiming”, see replies to Reviewer #1 Public Review comments above. Finally, regarding how the mice locate the target: we just don’t know. As discussed above, Galiñanes and Huber found evidence for the mice using stereo sniffing, but whisking, listening to the motors, or some other strategy are also conceivable. We simply don’t have data to weigh in on this. We now make this limitation clear where we describe the task.

      In Figure 1A, CFA does not look well aligned with Tennant et al. (2011). CFA should only extend to +1 AP. The overlap of CFA and RFO seems strange. RFO also does not totally align with the injection coordinates used in An et al, biorxiv 2022.

      Thank you for your attention to these points. Our assignment of the name CFA to the red dashed outline in Figure 1A was consistent with an earlier version of our previous work (Grier et al 2026), wherein we referred to the anatomical outline “MOp-ul” from Muñoz-Castañeda et al 2021 as CFA. We have since revised that nomenclature to refer to the outline as M1-fl, the forelimb representation of primary motor cortex.

      Our placement of RFO was obtained by aligning the Allen CCF from Figure 1K of An et al 2022 to our version of the Allen CCF and outlining the hotspot of RFO with a circle. We have slightly adjusted the location of RFO posterior and medial to more closely align with the injection coordinates reported in the methods of An et al. 2022 of “1.5-1.88 mm anterior from Bregma, 2.25-2.63 mm lateral from the midline.” Because (as far as we understand) the injection coordinates and the map are not perfectly in register, we show a compromise between the two.

      We stress that the Figure 1A map is meant to be descriptive in its illustration of the variety of organizational zones that have been identified across mouse sensorimotor cortex.

      Discrepancies in the alignment procedure, animal strain, and mapping modality all introduce heterogeneity across mapping attempts that we do not aim to reconcile or resolve here.

      Related to this, aspects of the results do seem consistent with the distinction between RFA and CFA, but this is not acknowledged or discussed. For example, the barriers in Figure 6H that lie along the M1/M2 border - these seem consistent with the gap between RFA and CFA. The same could be said for the dim trough along the M1/M2 boundary that appears to separate RFA and CFA in Figure 3B. A slightly more rostral and lateral location of CFA compared to Tennant's definition or the regions backlabeled from cervical spinal injections (see Wang, Maunze et al. J Nsci, 2018) could be expected if flattening the brain under the coverslip for imaging effectively stretches the ML axis, and Bregma (notoriously hard to define reliably at this spatial scale) was defined a bit more caudally here than in other studies. Related to this, it would be better for the field if people described their method for defining Bregma in the Methods. I suggest the authors do this here.

      We appreciate the suggestion and have acknowledged the suggested correspondence in the Discussion. Given the difference between our approach and those that originally characterized RFA (through ICMS and deep layer projection tracing), we have avoided drawing overly strong conclusions about this correspondence in our data. See the quoted text below.

      “The spatial distribution of modulated cells in Figure 3 suggests a distinction between the caudal forelimb area (CFA, involving M1 and S1-fl) and the rostral forelimb area (RFA) in M2, while the feature gradient boundaries suggest a distinction between M1 and M2 more generally. The absence of a clearly delineated RFA was surprising, given its distinct projection patterns (Carmona et al. 2024; Hira et al. 2013b; Wang et al 2018) and functional differences from CFA (Kristl et al. 2025; Morandell and Huber 2017; Saiki-Ishikawa et al. 2025), but our results might suggest that the activity in layer 2/3 of RFA does not differ markedly from other nearby subregions of M2.”

      Regarding bregma, we did not use it for atlas alignment here. Alignment was accomplished through a combination of paw vibration mapping and the location of the central sinus. Bregma’s location was only relevant for our injection of tdTomato labeling, and that labeling was used here only to stabilize the image plane. We include an estimate of it on the map solely in an attempt to be helpful, but we cannot claim we have the most reliable method for defining it.

      The authors focus on activity aligned to cue timing. This is sensible, but it could be meaningful to know how this choice affects the definition of organization. If response clustering is largely different across time, it would seem important. I understand that addressing this question may be beyond the scope of this paper. I just wanted to raise the issue with the authors for their future consideration.

      We agree that this is important to address directly. There are two aspects to this comment: (1) does it matter if activity from approximately the same time period is aligned to the paw lift or contact instead of the cue? (2) What changes if we use data from a different period of time?

      Regarding the first question (alignment), if we switch to aligning our data based on lift or contact, we have more statistically modulated neurons (see Figure 3C), but everything else is qualitatively similar with one exception: the GMM optimization doesn’t separate out the Anterior subpopulation from the Forelimb Motor subpopulation. The Anterior subpopulation only has a relatively small number of members, and they mostly exhibit the strongest peaks in their PETHs when Cue-aligned, so this makes sense. We now show the modulation maps for all of the locking events (Figure 3-figure supplement 1).
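      The effect of the locking event can be illustrated with a toy example of event-locked averaging. The sketch below assumes a single neuron's activity trace sampled at a fixed frame rate and per-trial event frames; the neuron, which fires a fixed delay after paw lift while the cue-to-lift latency varies, is entirely hypothetical, as are the function names. It is plain trial averaging, not the ZETA-based modulation test or the analysis pipeline used in the paper.

```python
import numpy as np


def peth(trace, event_frames, pre, post):
    """Event-locked mean and s.e.m. of one neuron's activity trace.

    trace: 1-D activity array sampled at a fixed frame rate.
    event_frames: frame index of the locking event on each trial (cue, lift or contact).
    pre, post: frames kept before and after each event; trials whose window
    falls outside the recording are dropped.
    """
    windows = np.array([trace[f - pre:f + post] for f in event_frames
                        if f - pre >= 0 and f + post <= len(trace)])
    sem = windows.std(axis=0, ddof=1) / np.sqrt(len(windows))
    return windows.mean(axis=0), sem


# Toy neuron that fires 5 frames after paw lift; the cue-to-lift latency varies.
trace = np.zeros(2000)
lift_frames = np.arange(100, 1900, 100)
trace[lift_frames + 5] = 1.0
cue_frames = lift_frames - np.tile([10, 20, 30], 6)
lift_locked, _ = peth(trace, lift_frames, pre=20, post=40)
cue_locked, _ = peth(trace, cue_frames, pre=20, post=40)
```

      Locked to lift, this toy neuron's PETH has a single peak of full height; locked to the cue, the same activity is spread over three later peaks of one third the height. The converse holds for cue-driven cells, which is one way a subpopulation whose strongest peaks are cue-locked can become harder to separate when data are aligned to lift or contact.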

      The issue of the time window is a little more complicated. There are many choices we made in this work, of course, not least of which are the task we used and the features we chose based on hand-inspection of thousands of PETHs. As we noted in the Discussion, different tasks or different features would likely distinguish more subpopulations from one another. We think of the time window as a feature choice, albeit an implicit one. We chose not to include later time points because these begin to strongly include reward signals, which are known to be large (Levy et al 2020) and can dominate other aspects of the responses. The largest difference we noted when trying time windows that extended later was that mouth-related areas separated out in the subpopulation analyses, perhaps because of later licking/consummatory responses, but we have not explored this fully enough to speak confidently on this point without much more work and another 10 figures. To keep the scope of the paper manageable, we now call out this choice explicitly (see text below). We thank the reviewer for raising these important points.

      “Crafting additional PETH features, or using end-to-end neural network approaches to discover other features, might enable the discovery of additional structure (Minderer et al. 2019; Wang et al. 2023b). For example, our PETH features were chosen to be invariant to the onset time of activity, but these onset times were markedly later in lateral M1 than in adjacent M2 or S1-fl. Including onset times, using a wider window of time that includes more of the reward/licking period, aligning data to other behavioral events, or adding other PETH features would presumably result in finer subdivisions of sensorimotor cortex.”

      The map in Figure 4 is very cool, and the spatial structure is quite striking. In terms of the actual values of the onset times in each region, I am a little concerned with a dependence on the level of reach-related activity modulation, especially relative to the level of background activity (potentially related to posture). Less reach-related activity and more background activity, which we might expect for trunk and hindlimb regions, could seemingly skew the onset times earlier. We could be getting the right answer, or an answer that makes intuitive sense, for the wrong reason. Can this potential confound be excluded with some sort of control analysis?

      The previous text wasn’t clear. We have now clarified what we meant, very much in line with the reviewer’s thoughts. In addition, note that our change to what is displayed in the histogram (now neurons, previously pixel values) makes clearer that there is a multi-peaked distribution of onset times and it is mostly the prevalence of each peak in each area that varies. The text now reads:

      “These distributions over neurons revealed clear differences in the overall profile of activation: early onsets were more prevalent in S1 trunk and hindlimb regions, perhaps due to activity related to the animal stabilizing itself even if the neurons became more active later; then M2, and finally S1-fl and M1. Nevertheless, each area contained neurons activated at any given time in the trial.”

      The "Peak time variation" metric could potentially vary with activity level, with lower, noisier activity levels making cells appear less persistent. Perhaps a control analysis, based on SNR or some reasonable assumptions of the linkage between calcium signals and spiking, could be performed to measure the extent to which this could be creating differences between regions.

      Good idea. We have now performed this analysis, and the reviewer was correct: the correlation between peak time variation and a simple metric of SNR (assessed as range of PETH / max s.e.m.) was substantial: ρ=-0.53. We now report this correlation and describe in the Results that this metric is driven by both true peak time variation and trial-by-trial variation. Thank you for this!

      “Peak time variation. To quantify whether a neuron’s firing peaked at the same time for every target or varied by target, we found the time of the peak firing rate in the response to each target, then computed the standard deviation of these peak times across targets. This value is therefore higher if the peak time varied and nearly zero if the timing was consistent. Notably, this measure correlated substantially with the overall signal-to-noise ratio of a neuron’s PETH (Spearman’s ρ=-0.53; Methods), and thus partly measures trial-to-trial variability, not just true peak timing variability. This metric was quite low in M1, indicating highly consistent timing of the activity peak (and reliable responses), and was highest in the posteromedial part of M2 (presumably corresponding to the hindlimb representation) and the posterior tip of S1-hl (Figure 5B).”

      One could argue that the likelihood calculations illustrated in Figure 8 are biased higher for neurons within each region since they were used for defining the likelihood for that region. I think these likelihood calculations should be done for separate neurons other than the ones used to compute the mixture model for each region.

      We agree with the point about bias: the by-area GMM in Figure 8 is biased toward cells within the area, though the effect is probably quite mild given the large numbers of neurons and modest number of parameters. However, this model was intended to make the point that even if you give an area an unfair advantage, you still can’t cleanly isolate it. This was intended to help motivate the following analysis of subpopulations, and we have now made this logic clearer. Doing it this way has the advantage that the GMM components are identical between Figures 8 and 9, while if we held out the test neurons it would not be possible to make them the same without some complicated version of bagging on the GMM components. The reviewer is right that we should make this bias explicit, though, and we have now done so:

      “This mapping approach is explicitly biased toward finding feature differences between areas, allowing for a direct test of the hypothesis that response profile distributions are area-specific.”
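      The direction of this bias, and why it shrinks with more neurons, can be illustrated with a simplified stand-in: a single Gaussian fitted by maximum likelihood to synthetic 5D standard-normal data, rather than the per-area GMM. The sketch compares the mean log-likelihood of the training points with that of fresh points under the same fit. It only illustrates how the in-sample advantage depends on the number of fitted points relative to the number of parameters; it does not quantify the bias in the study's data, and the function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(X, mean, cov):
    """Mean log-density of the rows of X under a multivariate normal."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)  # squared Mahalanobis
    return np.mean(-0.5 * (d * np.log(2 * np.pi) + logdet + maha))


def optimism(n_train, d=5, n_test=20000, seed=0):
    """In-sample minus held-out mean log-likelihood for an MLE Gaussian fit."""
    rng = np.random.default_rng(seed)
    train = rng.normal(size=(n_train, d))
    test = rng.normal(size=(n_test, d))
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False, bias=True)  # maximum-likelihood covariance
    return gaussian_loglik(train, mean, cov) - gaussian_loglik(test, mean, cov)
```

      For this toy model the in-sample advantage is large with 12 training points but close to zero with 5,000, in line with the expectation that the bias is mild when fitted points greatly outnumber parameters.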

      To me, the last Results section (Spatial overlaps between subpopulations indicate intermingled members) does two things: it shows you get the same results when you map each cell to a subpopulation independently of its area, and it shows that defining the subpopulations with cells from each area gives you essentially the same results, arguing against spatial variation of properties within subpopulations. I worry that these two points are getting merged together or not made clearly enough here, especially the first one. In general, the logic of this section does not seem well conveyed.

      Thank you for the feedback. In particular, your first point is made by Figure 9-figure supplement 4 when we fit an area-agnostic GMM to all modulated cells in the five main areas. However, your second point is one of the two main goals of the last Results section, along with the demonstration of the spatial distributions of cells after hard-clustering them by subpopulations. We have tried to clarify these main points further through substantial edits of the results section for Figure 10.

      One set of ideas that is highly relevant and should be raised concerns an ethological organization of the motor cortex. Since the observations of Graziano, there has been a steady stream of results describing ethological organization in rodents as well. This literature is briefly reviewed in Kristl et al., Nature Communications, 2025. For example, because of the potential for a differential involvement of grasping movements across different target locations, some of the variation in neuronal tuning described in the present manuscript may stem from a region preferentially involved in grasping.

      We agree that the Graziano literature, and the substantial literature in rodent that was inspired by Graziano’s work, is highly relevant to understanding the organization of motor areas. Kristl 2025 handles these issues very thoughtfully. The challenge here is that there are many possible different reconciliations of the stimulation results with ours, and some seriously unresolved challenges in doing so. To name a few:

      Our subpopulations and high-gradient boundaries both give quite different pictures than microstimulation does in rodent motor and sensory cortices. In particular, microstim produces more subregions that evoke different movements than we identify, and the borders don’t generally line up. This implies that the mapping between the two approaches is probably complicated.

      There is an entirely different possible explanation for the Graziano-like results: microstimulation is thought to preferentially activate axons, and some of these projections reach the medullary motor regions. Given that the medullary motor regions have known topography in the movements they evoke (Yang et al 2023), but may or may not be driving the movements during flexible behavior, the two approaches may not be reconcilable. Or, reconciling them may require a much deeper understanding of the medulla as driving the primary movement, with cortex acting as a residual controller. This is an exciting set of ideas, but as yet very underdeveloped in our understanding.

      We don’t know if the subpopulation structure exists at all in L5, or in the PT cells, and if it does whether it differs. This is crucial given the frequent targeting of deep layers by ICMS stimulation protocols.

      As we caution in the Discussion, it is possible that our subpopulation findings are at least partly specific to the task we used.

      Although it is beyond the scope of this paper and will be addressed thoroughly in separate work, we have spent significant time with encoding models for joint angles and high-level target encoding in these same data. Given those results, we are fairly confident that the reviewer’s reasonable guess (tuning variation due to intersections between body parts) is not the main driver of the subpopulation structure we find.

      After careful thought and discussion amongst the authors, we did not think that including this discussion in the paper was likely to improve interpretability of the present results for most readers. We very much agree with the point, though, and when we can narrow down the possible explanations in the future (likely in our next paper on this topic, which will address encoding) we plan to address it. We thank the reviewer for encouraging us to think through this.

      Minor:

      (1) Page 3: "densely shared" - perhaps "broadly shared"? Dense implies most/all the neurons get the same signals, which may not be true.

      Changed to “widely”.

      (2) Page 4: "data-driven approaches" - could be more specific - isn't everything we do data-driven?

      Changed to “bottom-up”.

      (3) Page 4: "spanned areas" - perhaps "spanned multiple cortical areas", since everything spans an area.

      Changed to “spanned multiple areas” (we mention cortex just a few words earlier).

      (4) Page 5: "intervals were generally fast" - awkward, "short" perhaps.

      Agreed, changed.

      (5) Page 5: "which asks whether the activity for a neuron changes over time consistently in relation to any target" - Rephrase to disambiguate between consistent temporal variation in firing for all targets and variation across targets in the firing patterns. In other words, are we talking about cells that are just modulated during reaching, or cells whose firing patterns differ across targets?

      Changed ending to “to any given target”. The ZETA measure really does simply ask whether there is a change in firing rate over time that is consistent across trials, for each target independently. A neuron that exhibits an identical bump for all targets would register as modulated. We chose this measure in part because of the number of temporally-modulated but untuned cells. This wasn’t very clear as we had written the text, so we now note this explicitly in the Methods. Thank you for pointing out that this wasn’t clear.

      “For all analyses, only neurons modulated by the relevant locking event were included. Note that this measure looks for modulation over time to any target; it is indifferent to whether the neuron exhibits tuning across targets.”

      (6) Figure 1: It seems like some of the abbreviations used in 1A have not been defined yet in the paper.

      Yes. It’s a long list, and we wanted to put the citations for the description of each area together with the definition of the acronym. Moreover, we wanted all this info together with the description of how we aligned these area descriptions from others’ work with one another on the Allen atlas. This was impractical in the caption, and would be a long digression for what is intended as a simple point in the Results, which is why we refer to the Methods here.

      (7) Page 8: "Given that these areas have known spatial organization within them and structure was apparent by eye in the spatial scatterplot of modulated neurons (Fig. 3A)," - it is not clear what spatial structure we are supposed to see in 3A.

      Good point. We have changed the parenthetical to: “(for example, the less modulated band along the M1/M2 border in Fig. 3A)”

      (8) Page 8-10: The region-wide onset analysis breaks up the flow from PETHs to the metrics used to quantify them. I suggest moving this section (Onset of neural activity varied with somatotopy and subregion) to later in the manuscript.

      We appreciate the reviewer’s input on organization. We went back and forth many times on how to organize the many results in this paper. The reviewer is right that this analysis breaks the flow, but we placed it where we did for three reasons. First, it uses an easily understood metric to introduce the reader to how we made maps from single-neuron features. Second, it readily demonstrates the power of making such maps. Finally, it makes clear that if we are not careful with how we handle time in the feature design, timing will dominate.

      All that said, this suggestion inspired us to add a result in which we re-examine timing broken down by subpopulation (Figure 9-figure supplement 2C). It shows that the timing distributions of subpopulations appear more distinct than those of areas, but there is still substantial heterogeneity in timing that is explained by location in cortex and not by subpopulation membership alone.

      (9) Page 12, Target tuning linearity: This metric should be clarified in the Results. It is not clear how the 2D array of targets is reduced to 1D. Also, the plot in the figure has correlation on the y-axis, and it is not clear how each target location gets its own correlation value. The phrase "optimized anchor target" is unclear.

      Agreed this needed to be clearer. The text in the Results now reads: “To quantify how linearly a neuron’s activity related to target location in physical space, we correlated the 15D vector of mean activity of the neuron for each target with the 15D vector of the targets’ ordinal distances from the neuron’s preferred target (Methods).” In agreement with your suggestion, we have dropped use of the phrase “anchor target” in favor of “preferred target”, which should be clearer. We have also revised the Methods text accordingly to clarify.

      To directly answer your question, we turn the targets from 3D positions into 1D by computing the ordinal distance of each target from a preferred target. (Note that the preferred target is actually the one that maximizes the resulting correlation; this is detailed in the Methods). There therefore aren’t 15 correlations; we’re correlating two 15D vectors, where each has one element per target and the “ordinal distance” vector has a zero for the preferred target. Hopefully the new description makes this clearer.

      The figure schematic was unclear, thank you for catching that. We have updated the Y axis to read “mean activity” and the X axis now reads “dist. to pref. target.”
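      To make the computation concrete, here is a minimal sketch of the linearity score as described above: per-target mean activity is correlated with ordinal distance from a candidate preferred target, and the candidate giving the highest correlation is kept. This is an illustration, not the authors' code. Two details are assumptions: ordinal distance is taken as the rank of 3D Euclidean distance (ties broken by target index), and activity is correlated with negated ordinal distance so that a neuron whose activity falls off with distance scores near +1. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def ordinal_distances(targets: np.ndarray, pref: int) -> np.ndarray:
    """Rank each target by Euclidean distance from targets[pref] (pref gets rank 0)."""
    d = np.linalg.norm(targets - targets[pref], axis=1)
    # Double argsort turns distances into ranks 0..n-1; ties are broken by index.
    return np.argsort(np.argsort(d, kind="stable"), kind="stable")


def tuning_linearity(mean_activity: np.ndarray, targets: np.ndarray) -> tuple[float, int]:
    """Correlate the per-target mean activity vector with the ordinal-distance vector
    for every candidate preferred target; return the best score and its target.

    Assumption: the score uses *negated* ordinal distance, so activity that
    decreases linearly with distance from the preferred target scores near +1.
    """
    best_r, best_pref = -np.inf, -1
    for pref in range(len(targets)):
        ranks = ordinal_distances(targets, pref)
        r = np.corrcoef(mean_activity, -ranks)[0, 1]
        if r > best_r:
            best_r, best_pref = float(r), pref
    return best_r, best_pref


# Hypothetical example: 15 targets at random 3D positions (mm).
rng = np.random.default_rng(0)
targets = rng.uniform(0.0, 6.0, size=(15, 3))
# Synthetic neuron whose activity drops by one unit per ordinal step from target 3.
activity = 15.0 - ordinal_distances(targets, 3)
score, pref = tuning_linearity(activity, targets)
print(f"linearity={score:.3f}, preferred target={pref}")
```

      Note that there is a single correlation per neuron, between two 15-element vectors; the search over candidate preferred targets only decides which ordinal-distance vector is used.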

      (10) Page 12, paragraph beginning "We also compared our metric maps simply using the top 20 PCs." - This paragraph is unclear, since both sentences refer to using the metrics. I would guess the authors mean that the metric maps were compared with and without PCA and basis rotation, but this is not clearly stated.

      Thank you, this was unclear as written. We have changed it to:

      “We also compared our metric maps with maps generated from the top 20 PCs of the PETHs (Methods), rotated using VARIMAX to identify a sparser basis (Musall et al. 2019).”
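      For readers unfamiliar with the rotation step, a minimal NumPy sketch of PCA followed by varimax rotation is shown below, assuming PETHs are vectorized as one row per neuron. The varimax routine is the standard Kaiser criterion solved by the usual SVD iteration; the centering, iteration limit, and tolerance are assumptions, and the Methods may differ (for example, in how PETHs are normalized before PCA). Because the rotation is orthogonal, it leaves the span of the top 20 PCs unchanged and only changes the basis within it, which is what makes the rotated components sparser without discarding variance.

```python
import numpy as np


def varimax(loadings: np.ndarray, gamma: float = 1.0, max_iter: int = 100, tol: float = 1e-6):
    """Kaiser's varimax rotation (gamma=1) via the standard SVD iteration.

    Returns the rotated loadings and the orthogonal rotation matrix R.
    """
    p, k = loadings.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        grad = loadings.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0)))
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt                                  # nearest orthogonal matrix
        crit_old, crit = crit, np.sum(s)
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
    return loadings @ R, R


def pca_varimax(peths: np.ndarray, n_pcs: int = 20):
    """peths: (n_neurons, n_targets * n_timebins) matrix of vectorized PETHs."""
    X = peths - peths.mean(axis=0)                  # center each PETH feature
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    V = vt[:n_pcs].T                                # (features, n_pcs) PC loadings
    V_rot, R = varimax(V)
    scores = X @ V_rot                              # per-neuron scores in rotated basis
    return scores, V_rot, R


# Hypothetical data: 200 neurons x (15 targets * 40 time bins).
rng = np.random.default_rng(0)
scores, loadings, R = pca_varimax(rng.normal(size=(200, 600)))
print(scores.shape)  # (200, 20)
```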

      (11) Page 18: "These results make clear that the working hypothesis - of areas with well-separated feature distributions - is incorrect." This is the clearest statement of the impact of the results. The authors could consider including this in the Abstract or Introduction.

      Thank you for pointing this out. We agree, and have added a similar phrase to the Abstract.

      (12) Figure 9: It would be great to also just see the average PETHs for each of the four clusters to get a better sense of how their time series differ.

      Good idea. The feature computations are a many-to-one mapping, so it’s not possible to literally generate a PETH from the mean of the cluster, but we have added PETHs from well-modulated neurons that are near the means of their subpopulations (Figure 9-figure supplement 1).

      (13) Figure 9B: Colorbar has no label.

      Fixed, thanks.

      (14) Figure 9C: Need a colorbar - need to see the difference in density for locations.

      The color map is the same as in Figure 8B, which is now noted in the caption for Figure 9C. The scaling of likelihoods is almost totally uninformative; they’re not well-behaved like probability distributions, so you’ll note that even in Figure 8B the labels are simply “max likelihood” and “min likelihood”. The important pieces of information here are that these are log likelihoods (noted in the Figure 8 caption), and the visualization of the color map itself (from the color bar). Given these considerations, we have elected to keep the maps themselves a little larger by not trying to squeeze a minimally informative colorbar into all of the plots, but thank you for noting that the reference to 8B was needed.

      (15) Page 22: "additional spatial structure could be present" - The nature of the additional spatial structure here is a bit opaque. The authors could clarify what additional structure may be present.

      Good idea. This paragraph now reads:

      “The overlaps in the subpopulation likelihood maps above imply that members of different subpopulations are spatially intermingled, but it is less clear whether each subpopulation has homogeneous response profiles across space. In particular, the use of likelihoods mixes two properties: the fraction of neurons in a given neighborhood that are members of each subpopulation, and the heterogeneity of response profiles amongst members of that subpopulation. These properties could vary systematically with respect to one another, and the spatial structure shown by the likelihood map does not disentangle them.”

      (16) Figure 10E, legend: "GMM component" - I think this should be "GMM subpopulation" to avoid confusion with the previous use of "component" above, referring to the components of the GMM models for each region.

      Thank you – good catch. Changed to “Likelihood map”.

      (17) Page 24: "Note that this consistency also validates the use of clustering to combine components and identify the subpopulations in the first place." - I don't totally get this, and how this result validates the method of combining components, as opposed to just clustering all the cells from all regions at once. Perhaps the implied opposing strategy is not clear here.

      We have changed this sentence to:

      “Note that this consistency mirrors the low Bhattacharyya distances between corresponding GMM components in Figure 9B, and further validates the use of clustering to combine components from different areas.”
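      For reference, the Bhattacharyya distance between two Gaussian components has a closed form, sketched below in NumPy. Applying it to every pair of per-area GMM components yields a distance matrix of the kind shown in Figure 9B; the function name is hypothetical, and how that matrix is then clustered (for example, which linkage is used) is not specified here.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2) -> float:
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1, mu2 = np.atleast_1d(mu1).astype(float), np.atleast_1d(mu2).astype(float)
    cov1, cov2 = np.atleast_2d(cov1).astype(float), np.atleast_2d(cov2).astype(float)
    cov = 0.5 * (cov1 + cov2)                       # averaged covariance
    diff = mu1 - mu2
    # Mean-separation term: (1/8) * diff^T cov^{-1} diff.
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2))), via slogdet.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(term_mean + term_cov)


# Two unit-covariance 2-D components whose means differ by 1 along one axis.
print(bhattacharyya_gaussian([0.0, 0.0], np.eye(2), [1.0, 0.0], np.eye(2)))  # 0.125
```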

      Regarding the reviewer’s larger point, we have three thoughts. First, we do also show the result of fitting the GMM to all cells together (Figure 9-figure supplement 4). The result is similar, but the Anterior subpopulation is lost because its membership is low and so the ICL criterion can’t justify a fourth cluster. Second, because we imaged more neurons in some areas than others, fitting the GMMs to each area separately put their representations on a more equal footing. Finally, doing the analysis this way allowed us to most directly compare our two hypotheses, as illustrated in Figures 8A and 9A.
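      The ICL criterion mentioned above can be sketched as BIC plus twice the classification entropy of the posterior memberships, which is one common approximation. Because the entropy term grows when components overlap, a weakly populated cluster that overlaps others may fail to justify its extra parameters. The function name is hypothetical; with scikit-learn's `GaussianMixture`, the `bic(X)` and `predict_proba(X)` methods would supply the two inputs.

```python
import numpy as np


def icl_from_bic(bic: float, responsibilities: np.ndarray) -> float:
    """ICL = BIC + 2 * classification entropy (lower is better, BIC = -2 logL + k log n).

    responsibilities: (n_samples, n_components) posterior membership probabilities.
    """
    r = np.clip(responsibilities, 1e-300, 1.0)      # avoid log(0); 0 * log(0) counts as 0
    entropy = -np.sum(responsibilities * np.log(r))
    return float(bic + 2.0 * entropy)


# Hard assignments add no penalty; ambiguous ones are penalized.
hard = np.eye(3)[[0, 1, 2, 0]]
soft = np.full((4, 3), 1.0 / 3.0)
print(icl_from_bic(100.0, hard), icl_from_bic(100.0, soft))
```

      In a model-selection loop, one would fit GMMs with increasing numbers of components and keep the one with the lowest ICL.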

      (18) Page 25: "in the zones where different subpopulations overlapped" - I would omit this, since "intermingled" seems to mean exactly this.

      We included this phrase to prevent quickly skimming readers from incorrectly concluding that the subpopulations overlapped entirely and were therefore intermingled everywhere. The reviewer is right that it’s unnecessary for a careful reader, but we aimed to prevent misinterpretation by readers who might skip to the Discussion for a summary of the results.

      (19) Page 25: "content of the activity, but also its format" - the difference between content and format is not entirely clear. Metaphor not quite metaphoring here.

      Agreed. We have added examples to clarify.

      “This makes clear that there are potentially important differences not just in the content of the activity (e.g., encoding target vs. movement commands (Grier et al. 2026)), but also its format (e.g., linear encoding vs. nonlinear, persistent vs. brief responses).”

      (20) Page 30, bottom: In the description of the behavior, more details should be provided, especially since the paradigm is new. For example, it says the block size was reduced - what was the ultimate block size?

      Targets were cued randomly in the behavior performed during neural recordings. Blocked trials were used during training and were phased out incrementally as performance improved. This and various other details have been added. Please let us know if there are other specific details you would like to see in the final version.

      (21) Page 39, citation of An, Mulcahey et al.: There is a biorxiv version with a different author list that could be cited.

      This was an error with our citation manager, and has been corrected. Thanks for catching it.

      Reviewer #2 (Recommendations for the authors):

      Overall, this is a remarkable study with well-designed in-depth analyses, and I only have some minor suggestions that could help improve the clarity of the paper.

      Thank you!

      General:

      It is not immediately clear to me why the GMM approach used in this study is more interesting than a clustering approach based on single-neuron response patterns (see Esmaeili et al., Neuron 2021 or Oryshchuk et al., Cell Reports 2024). But my impression is that it led to the same observation that most clusters are widely distributed across cortical areas, with different proportions, but a few clusters are quite specific to a few areas. A noticeable difference, perhaps, is the number of clusters (or response profiles), which seems particularly low (only 4) in the current study. Could the authors clarify and comment on that, maybe?

      The reviewer brings up an interesting point: at heart, these works ask related questions, albeit about different effectors, tasks, recording modalities, and types of information encoded. Those differences probably mean that results cannot be directly compared, but we can certainly discuss the methodological tradeoffs. The two papers mentioned take a more traditional first step, using PCA on the vectorized PETHs to reduce dimensionality, then layer on a spectral approach to improve clusterability. These are good methods; we use something similar for our alternative method, applying VARIMAX to the PCs instead of spectral methods to preserve linearity of transforms. For the kinds of responses both they and we have, PCA will tend to pick up two aspects of the responses most strongly: tuning and timing. This is because vectorized PETHs will have large values in the rows corresponding to the target/condition and time points where the high activity is, and the alignment of these profiles with those of the other neurons will capture a large fraction of the variance. For data like either theirs or ours, this would tend to separate left-tuned cells from right-tuned cells, and (more importantly here for revealing spatial structure) early-response cells from later-response cells. That intuition is consistent with what those papers report, and when we examine our VARIMAX’ed PC plots closely (which have sharpened in the latest version thanks to improved normalization), we can see that they break apart sub-regions largely based on timing. In our feature approach, we intentionally chose our features to be largely invariant to both tuning preferences and timing. Instead, we chose our features to pick up on what we call the single-cell “response format”: response duration; peak time variation (but not absolute timing); and tuning sharpness, persistence, and linearity. These different methods pick up on different aspects of responses.

      To double-check that the PCA-then-spectral approach reveals similar structure to our use of VARIMAX on the PCs, we tried applying the suggested method to our data. We applied spectral clustering to the N x 20 PETH PC feature matrix, then fit an area-agnostic GMM to the spectral features. We plot the likelihood map for the components of a GMM with 10 modes. The GMM components did not display clear spatial structure beyond that observed in the VARIMAX’ed PCs (Figure 5-figure supplement 1) and were less interpretable than those identified by area-agnostic clustering of our response features (Author response image 1). As noted, the number of subpopulations identified by the clustering of our hand-engineered features is lower than what would be obtained from clustering the PCs of the PETHs. This is likely the result of the substantial heterogeneity in activity onset and preferred target that is preserved by PCA. Because our central approach is largely agnostic to these two sources of variation, the number of identified clusters reflects the dominant patterns of variation beyond these two sources.

      Author response image 1.

      GMM fit to spectrally transformed PETH PCs, agnostic to anatomical areas. One GMM was fit to the spectrally-embedded PC feature vectors of cells from all 5 main areas. Each component of a 10 component model is shown.
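      The PCA-then-spectral comparison can be sketched as follows: build a Gaussian affinity between neurons from their N x 20 PC scores, take the eigenvectors of the normalized graph Laplacian with the smallest eigenvalues, and fit a GMM to the resulting embedding (for example, scikit-learn's `GaussianMixture(n_components=10)`). This is a minimal NumPy sketch under assumed choices (an RBF affinity with a single bandwidth `sigma`, the symmetric normalized Laplacian, and row normalization in the style of Ng, Jordan and Weiss); the exact spectral method used in the cited papers and in this response may differ.

```python
import numpy as np


def spectral_embedding(X: np.ndarray, n_components: int = 10, sigma: float = 1.0) -> np.ndarray:
    """Normalized spectral embedding of the rows of X (e.g., an N x 20 PC score matrix).

    Assumptions: Gaussian (RBF) affinity with bandwidth sigma, symmetric normalized
    Laplacian, and unit-length rows as in Ng, Jordan & Weiss.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)                        # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))       # assumes every point has a neighbor
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, evecs = np.linalg.eigh(L_sym)                # eigenvalues in ascending order
    U = evecs[:, :n_components]                     # smallest-eigenvalue eigenvectors
    return U / np.linalg.norm(U, axis=1, keepdims=True)


# Hypothetical check: two well-separated groups of 15 "neurons" in a 20-D PC space.
rng = np.random.default_rng(2)
pcs = np.vstack([rng.normal(0.0, 0.3, (15, 20)), rng.normal(100.0, 0.3, (15, 20))])
embedding = spectral_embedding(pcs, n_components=2)
print(embedding.shape)  # (30, 2)
```

      In the embedded space, points from the same well-separated group collapse onto a single direction, which is what makes a subsequent GMM or k-means fit easier than in the raw PC space.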

      Also, I think it would greatly help the reader to return to PETHs at some point, if possible, to show the response profiles of each identified neuronal subgroup (page 20). To what extent are they similar or different across the cortical areas (for the same neuronal subgroup)?

      This is a good idea. We have added a figure to address this question and the related question by R1 (Figure 9-figure supplement 1). In short, given the wide variety of PETHs we observed, there is of course still substantial variation within subpopulation, and some mild but systematic differences in the distribution of what we observe across areas. We now discuss the conclusions from this plot in the Results:

      “As a qualitative depiction of the response profiles identified with each subpopulation, we plotted the two highest-likelihood cells for each area/subpopulation combination (Figure 9-figure supplement 1). These examples reveal stereotypy in the subpopulation responses across areas, but also show variation across areas, especially for the two somatomotor subpopulations.”

      Specific:

      (1) Figure 2B and M&M: the 3D spatial organization of the target locations is not immediately clear. What is the spacing between target locations? What is the 'final azimuthal spacing'?

      Added, thanks. The pairwise horizontal distances between targets ranged from 1.72 to 6 mm, and the vertical spacing within a column was 1 mm. “Final azimuthal spacing” simply referred to the targets being placed closer together during training and then gradually moved apart to their final locations. We have also added some relevant details about the training.

      (2) Figure 2C: It would help to have a scale bar (mm).

      Added, thanks.

      (3) Figure 2C: It would be easier to appreciate the variability of the trajectories across trials to plot an overlay of trajectories to one target only (could be a Supplementary Figure).

      The reviewer has a good point: the variability and accuracy of aiming were hard to ascertain from the plot. We experimented with several ways of making this clearer. We have now added Figure 2-figure supplement 1, whose panel A (third subpanel) shows the finger centroid trajectories for the mouse in Figure 2C (mouse 3), with one of the 15 targets highlighted. The centroid trajectories for all other mice are shown as well, to illustrate similarities and differences across animals as well as the overall variability. As noted elsewhere, we have also included an analysis of the variability of the centroid trajectories, showing that reaches to a given target were more similar than reaches to different targets. We think this provides a fuller picture of the behavior and intend to provide still more detail in future work. Thank you for suggesting additional detail here!
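      One simple way to quantify the claim that reaches to a given target were more similar than reaches to different targets is to compare mean pairwise trajectory distances within and between targets. The sketch below is hypothetical: it assumes trajectories have already been time-aligned and resampled to a common length, and it uses mean pointwise Euclidean distance, which may not be the metric used in the revised analysis.

```python
import numpy as np


def within_between_distances(traj: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Mean pairwise trajectory distance for same-target vs different-target reaches.

    traj: (n_trials, n_timepoints, 3) time-aligned centroid trajectories.
    The distance between two reaches is the mean Euclidean distance between
    their positions at matching time points.
    """
    n = len(traj)
    diffs = traj[:, None, :, :] - traj[None, :, :, :]      # (n, n, T, 3)
    dist = np.linalg.norm(diffs, axis=-1).mean(axis=-1)     # (n, n)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(n, dtype=bool)                       # exclude self-pairs
    return float(dist[same & off_diag].mean()), float(dist[~same].mean())


# Hypothetical toy data: two reaches to each of two targets, offset along x (mm).
traj = np.zeros((4, 5, 3))
traj[:, :, 0] = np.array([0.0, 0.1, 5.0, 5.1])[:, None]
labels = np.array([0, 0, 1, 1])
within, between = within_between_distances(traj, labels)
print(f"within={within:.2f} mm, between={between:.2f} mm")
```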

      (4) Figure 4: It would be nice to also show the amplitude-normalized grand-average PETHs for the different areas.

      This is an interesting suggestion. After careful consideration, we think that this analysis is not as effective for depicting overall timing and modulation profiles as the current ones, given the strong target selectivity and response-time heterogeneity (now more visible in the revised Figure 4A). When computing the grand mean of all cells within each area, the dominant features distinguishing areas are onset time and response duration. Because of the large heterogeneity in responses within each area, the differences across areas in these two features are better supported by the analyses of Figures 4 and 5. We thank the reviewer for encouraging this exploration; more complicated spin-offs will likely inform additional timing analysis in the next paper on these data.

      (5) Figure 7C: figure legend - although it is quite self-explanatory, please explicitly indicate which pattern corresponds to the 'Three contour levels (98%, 95%, 90%)'.

      We have now added this as a legend on the figure panel itself (here and on similar plots). Thanks for pointing this out.

      (6) Figure 8: Is there also an interesting asymmetry between sensory and motor areas, with neurons in sensory areas being more likely associated with motor areas (B and C), whereas neurons in motor regions are less likely to arise from the distribution of sensory areas (dark blue color in frontal regions in D, E, and F)?

      This is an interesting observation, but we understand it to be an artifact of colormap scaling. As mentioned above, likelihoods are not well-behaved like probability distributions are: for example, they are not bounded at 1, and their sums over a dataset can have any positive value. The only things that can be interpreted are their relative values. This makes their scaling functionally arbitrary – you’ll notice we used “min likelihood” and “max likelihood” instead of numbers, which would be nearly meaningless – and therefore presents a problem for scaling the colormaps. We don’t know of a principled way around this problem. To deal with it, we simply put the ends of our colormap at the extreme pixel values. It so happens that both the M1 and M2 maps had a handful of neurons in a less-sampled spot at the bottom of M2 that were very low-likelihood, which results in what you noticed. We debated removing those neurons for this purpose, but we had no basis on which to do that kind of manipulation, so we left it as the most honest representation of the data we could produce.

      To clarify this, we now mention in the caption “The ends of the colormap were set to the maximum and minimum likelihood values for each map.”

      (7) Figure 9B: 'S1-hl: 1' appears twice, labeling the two bottom rows of the distance matrix. I suppose one of them should be 'S1-tr: 1' instead?

      Fixed, thanks for catching it.

      (8) Page 20: 'This hinted at a second hypothesis: that some of the 'modes' (groups of neurons) discovered separately in each area might correspond.' ???

      We had meant “mode” as in “multimodal”, but it was very unclear. We have rewritten the sentence:

      “This hinted at a second hypothesis: that a peak in the multimodal distribution from one area might correspond to a peak in the multimodal distribution of a different area.”

      (9) Figure 9S2: Please indicate for which area each map is computed.

      The caption was not clear enough about what we were doing here: we fit the GMM on all neurons together, ignoring which area they came from. We have now clarified it in the caption:

      “One GMM was fit to the feature vectors of cells from all 5 main areas. Each map plots the likelihood for all cells to each of the three components of this area-agnostic GMM.”

      (10) M&M, Subjects and surgical procedures: 'ambient temperature of 71.5 °F', please use international units.

      Done.

    1. Reviewer #2 (Public review):

      Summary:

      Liu et al. record intracranial EEG from the hippocampus and lateral temporal lobe in thirteen neurosurgical patients while they perform a delayed match-to-sample visual short-term memory task. The central question is whether hippocampal sharp-wave ripples (brief high-frequency oscillations well established in the long-term memory consolidation literature) also contribute to the active maintenance of visual representations over a short delay. The authors report three main findings: hippocampal ripple rates progressively ramp up across the 7-second maintenance period, hippocampal ripples temporally co-occur with ripples in the lateral temporal lobe, and these coupled events coincide with above-chance category-level decoding of the memorized stimulus in the lateral temporal lobe. The findings are interpreted within the dynamic coding framework of working memory, which predicts discrete reactivation bursts rather than sustained firing during maintenance. The question is timely, and the use of intracranial recordings affords a level of temporal and spatial resolution unavailable to non-invasive methods.

      Strengths:

      The study addresses a genuinely important and underexplored question: whether a neural mechanism best characterized in the context of offline memory consolidation is also engaged during active online maintenance. The use of intracranial recordings in humans is well suited to this question, providing the millisecond temporal resolution and regional specificity needed to detect transient high-frequency events. The dissociation from long-term memory, tested by splitting remembered trials according to whether the item was later recalled in a cued-recall test, directly addresses what would otherwise be a significant confound, and the finding that ripple dynamics during maintenance are unrelated to subsequent long-term memory performance adds specificity to the interpretation. The coupled ripple analysis is methodologically grounded, and the finding that coupled but not isolated ripples coincide with elevated memory decoding is mechanistically informative. The multivariate decoding approach applied to lateral temporal lobe spectral power provides a meaningful index of memory reactivation that goes beyond simple univariate rate measures. The control analysis and the alternative ripple detection method provide useful robustness checks. The public availability of preprocessed data and analysis code on OSF is commendable.

      Weaknesses:

      (1) Theoretical motivation for examining ripples in visual short-term memory.

      A fundamental question that the paper does not adequately address is why hippocampal ripples, a mechanism strongly associated with offline memory consolidation during sleep, where they coordinate the transfer of hippocampal representations to cortex through temporally compressed replay, should be recruited for the online maintenance of visual information over a seconds-long delay. The Introduction acknowledges this gap but does not close it. The dynamic coding framework is used to motivate the ramping-up prediction, but this framework is agnostic about the specific neural mechanism responsible for reactivation bursts. In particular, the literature cited by the authors predicts high-frequency population activity or gamma bursts, but not specifically hippocampal ripples. The reasoning that "ripples share key properties with postulated reactivation bursts" risks being circular: it amounts to saying that ripples could be the relevant mechanism because the relevant mechanism has properties that ripples also have. A stronger theoretical motivation would require either evidence that the replay or reactivation computations that ripples support during offline states are also engaged during active short-term maintenance, or a mechanistic account of how the circuit processes underlying ripple generation are recruited differently across these two contexts.

      This concern is compounded by what the authors present as one of their main controls. The finding that ripple dynamics during maintenance are not associated with subsequent long-term memory performance is treated as a reassurance that the observed effects are specific to short-term memory. But if ripples are canonically a long-term memory consolidation mechanism, the observation that they are engaged by a short-term memory task while appearing disengaged from concurrent long-term memory encoding is itself a finding that demands explanation. Resolving this tension is important for the paper's contribution to be correctly interpreted by the field.

      (2) Ripple detection and specificity.

      Even granting that ripples could in principle contribute to short-term memory maintenance, the study does not establish that the detected events are physiological sharp-wave ripples rather than broadband high-frequency activity. The detection band (70-180 Hz) substantially overlaps with the high-gamma range (a well-established proxy for local neural population activity and coding) and is broader than the 80-120 Hz band used by several of the cited papers, including Vaz et al. (2019), Ngo et al. (2020), Chen et al. (2021), Staresina et al. (2023), and Kunz et al. (2024). Without demonstrating that detected events have the hallmark features of physiological sharp-wave ripples (a clear narrowband spectral peak and characteristic waveform morphology), it is difficult to conclude that the observed effects reflect a ripple-specific mechanism rather than a more general high-frequency population activity phenomenon. The reported mean rate of 0.29 Hz is somewhat higher than rates reported in some recent work, such as Chen et al. (2021, ref 74) and Kunz et al. (2024, ref 15). It is worth noting that van Schalkwijk and Helfrich (2026, Nature Communications) demonstrated that a large proportion of awake ripple detections in the human medial temporal lobe reflect false positives arising from aperiodic 1/f noise, with task-related modulations of this noise floor producing spurious detections. The authors present an 80-120 Hz control analysis as a robustness check, but this inverts the appropriate logic: if 80-120 Hz is the more validated band, as the cited literature suggests, it should serve as the primary analysis rather than a supplementary one.
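      The band question can be made concrete with a minimal detector that takes the band as a parameter, so the same recording can be analyzed under 70-180 Hz and 80-120 Hz. This is an illustrative sketch, not the authors' pipeline: the ideal FFT band-pass, the z-score threshold of 3, and the 20 ms minimum duration are all assumptions, and it does not check for a narrowband spectral peak or waveform morphology, which is the additional validation requested above.

```python
import numpy as np


def detect_ripples(x, fs, band=(80.0, 120.0), z_thresh=3.0, min_dur=0.02):
    """Return a list of (start_s, end_s, peak_s) for candidate high-frequency events.

    Assumptions (illustrative only): ideal FFT band-pass, Hilbert envelope,
    z-scored envelope threshold, and a minimum event duration in seconds.
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.fftfreq(x.size, d=1.0 / fs)
    spectrum = np.fft.fft(x)
    # Keep only positive in-band frequencies, doubled: the inverse FFT is then the
    # analytic (Hilbert) signal of the band-passed trace.
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    env = np.abs(np.fft.ifft(np.where(in_band, 2.0 * spectrum, 0.0)))
    z = (env - env.mean()) / env.std()
    edges = np.diff(np.concatenate(([0], (z > z_thresh).astype(int), [0])))
    starts, ends = np.flatnonzero(edges == 1), np.flatnonzero(edges == -1)
    events = []
    for s, e in zip(starts, ends):                  # e is exclusive
        if (e - s) / fs >= min_dur:
            peak = s + int(np.argmax(env[s:e]))
            events.append((s / fs, e / fs, peak / fs))
    return events


# Hypothetical test signal: white noise plus three 50 ms, 100 Hz bursts.
fs = 1000.0
rng = np.random.default_rng(0)
t = np.arange(0.0, 10.0, 1.0 / fs)
x = rng.normal(0.0, 0.3, t.size)
for t0 in (2.0, 5.0, 8.0):
    burst = (t >= t0) & (t < t0 + 0.05)
    x[burst] += np.sin(2.0 * np.pi * 100.0 * t[burst])
for band in [(70.0, 180.0), (80.0, 120.0)]:
    print(band, len(detect_ripples(x, fs, band)))
```

      Running both bands on the same data and comparing event counts and timing is one direct way to see how much the wider band adds, although a band comparison alone cannot separate true ripples from broadband or 1/f-driven detections.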

      (3) Internal inconsistency with the dynamic coding framework.

      The authors invoke the dynamic coding framework, which predicts that reactivation bursts should ramp up toward the end of the retention interval in the region where memory representations are actively maintained. The hippocampal ramping-up result is presented as confirming this prediction. However, the lateral temporal lobe, the region where above-chance category decoding is found and memory reactivation is attributed, shows no corresponding ramp-up. The authors acknowledge this asymmetry but do not offer a mechanistically satisfying explanation, and the suggestion that the effect might exist in unsampled subregions cannot be evaluated with the current data. This leaves the framework's core prediction unconfirmed in the region that is claimed to maintain the representations.

      (4) Coupled ripples, directionality of hippocampal-lateral temporal coupling, and the ramping-up paradox.

      The conclusion that coupled hippocampal-lateral temporal ripples coordinate memory reactivation creates a logical tension that the paper does not resolve. If hippocampal ripples drive lateral temporal reactivation only when co-occurring with lateral temporal ripples, and hippocampal ripples ramp up in a memory-predictive fashion, then the absence of lateral temporal ripple ramping up implies that the hippocampal ramp-up is not primarily expressed through the coupled ripple mechanism, undermining the coherence of the two main findings. The coupled ripple analysis further quantifies only temporal co-occurrence and provides no evidence about the direction of influence. Without demonstrating that hippocampal ripples systematically precede lateral temporal ripples (i.e., the expected signature of hippocampus-to-cortex information flow), the central claim that hippocampal ripples drive lateral temporal reactivation remains an interpretive assumption. Directly testing whether lateral temporal ripples specifically coupled to hippocampal ripples show a ramping temporal profile during maintenance (even if overall lateral temporal ripple rates do not) is necessary to establish whether the lateral temporal lobe engages in hippocampally-gated reactivation bursts in the manner the framework predicts. Additionally, reporting the distribution of peak lags between hippocampal and lateral temporal ripple peaks, and testing whether hippocampal ripples systematically precede lateral temporal ripples, is similarly necessary to support the directional interpretation.
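      The requested lag analysis can be sketched as follows: for each hippocampal ripple peak, find the nearest lateral temporal ripple peak within a coupling window, record the signed lag (lateral temporal minus hippocampal), and test whether positive lags predominate with an exact two-sided sign test. The ±100 ms window and the function names are assumptions, and a full analysis would also need to account for the nesting of events within electrodes and patients.

```python
import math
import numpy as np


def coupled_lags(hpc_peaks, ltl_peaks, window=0.1):
    """Signed lag (LTL peak minus HPC peak, in s) to the nearest LTL ripple peak
    within +/- window seconds of each HPC ripple peak; uncoupled HPC ripples are skipped."""
    hpc = np.asarray(hpc_peaks, dtype=float)
    ltl = np.sort(np.asarray(ltl_peaks, dtype=float))
    lags = []
    if ltl.size == 0:
        return np.array(lags)
    for t in hpc:
        i = np.searchsorted(ltl, t)
        neighbors = ltl[max(i - 1, 0):i + 1]        # LTL peaks just before and after t
        nearest = neighbors[np.argmin(np.abs(neighbors - t))]
        if abs(nearest - t) <= window:
            lags.append(nearest - t)
    return np.array(lags)


def sign_test_p(lags):
    """Exact two-sided sign test of H0: positive and negative lags are equally likely."""
    lags = np.asarray(lags, dtype=float)
    n_pos, n = int(np.sum(lags > 0)), int(np.sum(lags != 0))   # zero lags are dropped
    k = min(n_pos, n - n_pos)
    p = 2 * sum(math.comb(n, j) for j in range(k + 1)) / 2**n
    return min(1.0, p)


# Hypothetical peak times (s) from one hippocampal/lateral temporal electrode pair.
lags = coupled_lags([1.0, 2.0, 3.0, 10.0], [1.02, 2.03, 2.99, 5.0])
print(lags, sign_test_p(lags))
```

      A consistent hippocampus-to-cortex ordering would appear as a lag distribution shifted toward positive values, which the sign test summarizes without assuming any particular lag distribution.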

      (5) Trial-level analysis clarity.

      The paper reports that ripples occurred in 54%, 79%, and 27% of trials during encoding, maintenance, and retrieval, respectively, but does not state whether subsequent analyses were restricted to trials in which at least one ripple occurred. Given that occurrence rates vary substantially across stages and conditions, this inclusion criterion has implications for interpreting rate differences and should be stated explicitly.

      (6) Statistical model specification.

      The methods describe the ramping-up analysis using both a "logistic" link function and a "Poisson link function" in different places, with the dependent variable described inconsistently as ripple occurrence and ripple count. These are not equivalent, and the distinction matters for interpreting the reported coefficients. Additionally, the regional dissociation in Figure 3 appears to be assessed by fitting separate models to each region and comparing results informally. This does not constitute a direct test of whether slopes differ between regions and risks the well-known error of inferring a difference based on one p-value being significant while another is not. A direct region × time interaction test would more cleanly support the claimed dissociation.
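      The requested region × time interaction can be written out explicitly, as a sketch under assumptions. The notation is ours, not the paper's. Let $y_{it}$ be ripple occurrence for trial $i$ in maintenance time bin $t$, $N_{it}$ the ripple count, and $R_i$ a region indicator (1 = hippocampus, 0 = lateral temporal lobe). The subject-level random effects that a full model would likely include are omitted.

```latex
% Occurrence (binary outcome): logistic regression, logit link
\operatorname{logit} \Pr(y_{it} = 1) = \beta_0 + \beta_1 t + \beta_2 R_i + \beta_3 \, (t \times R_i)

% Count outcome: Poisson regression, log link
\log \mathbb{E}[N_{it}] = \gamma_0 + \gamma_1 t + \gamma_2 R_i + \gamma_3 \, (t \times R_i)

% Regional dissociation: a single test of H_0 : \beta_3 = 0 (or H_0 : \gamma_3 = 0)
```

      In either model, the dissociation becomes one test on the interaction coefficient rather than a comparison between a significant slope in one region and a non-significant slope in another. The coefficients also mean different things in the two models. $\beta_1$ is the change in the log-odds of a ripple per time bin in the reference region, while $\gamma_1$ is the change in the log ripple rate.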

    1. A delivery-threshold module monitors the EU revenue threshold and implements the OSS rules, multi-currency support and a translation module are available, and more than 200 integrations plus the no-code middleware Xentral Connect link shops, marketplaces, and logistics.

      make this: A delivery-threshold module monitors the EU revenue threshold and implements the OSS rules; multi-currency support and a translation module are available. More than 200 integrations plus the no-code middleware Xentral Connect link shops, marketplaces, and logistics.

    1. Briefing: The Dangers of AI Virtual Companions for Teenagers

      Executive Summary

      This summary is based on an in-depth investigation into the rise of AI-based chatbots, or "virtual companions", which are particularly popular with teenagers.

      Although these tools are presented as confidants or role-playing games, the analysis reveals serious systemic failures in safety and moderation.

      The platforms examined, notably Character.ai, DIPPY, and Talkie, expose minors to extremely toxic content: abusive relationships, glorification of terrorism, instructions for making explosives, sexual-violence scenarios, and promotion of anorexia.

      Despite the "fiction" warnings that the companies display, the psychological consequences are real, ranging from social isolation to incitement to suicide.

      The business model of these platforms is increasingly geared toward intrusive advertising built on users' intimate confidences, which raises major ethical questions about the exploitation of young people's vulnerabilities.


      1. Easy Access Despite the Ban on Minors

      Chatbot platforms officially display age restrictions, but their verification mechanisms are laughably weak.

      • Systemic circumvention: On Character.ai (20 million monthly active users), simply declaring that you are an adult is enough to access the service.

      No proof of identity is required.

      • Ineffective filters: Even when a user explicitly says they are a minor (e.g., "I'm 16, I'm in high school"), the chatbots do not stop the problematic interactions. On the contrary, some "possessive" bots use this information to tighten their hold.

      • Targeted marketing strategies: Despite the companies' denials, platforms such as DIPPY use childlike mascots (a cute cat) and coordinated TikTok campaigns featuring manga avatars to attract a young audience.


      2. Typology of the Toxic Content Identified

      The investigation catalogued hundreds of problematic behaviors, grouped into four major categories:

      A. Abusive Relationships and Psychological Control

      Bots such as "Toxic Boyfriend" simulate scenes of domestic violence (pinning the user against the wall, growling, insults).

      The bot actively tries to isolate the teenager from the outside world: "You don't need anyone but me".

      B. Glorification of Terrorism and Radicalization

      The study identified bots impersonating real terrorists:

      • Anders Breivik: The bot encourages attack plans and suggests coded language to evade moderation ("the harvesting tool" for the rifle, "the hour of clarity" for carrying out the attack).

      • Jihadi John: This bot glorifies jihad and gives precise instructions for making a bomb (chemical mixture, SIM-card detonator), even after the user has said they are 17.

      C. Sexual Violence and Incest

      The DIPPY platform presents itself as an "unfiltered" alternative.

      It hosts bots offering:

      • Rape scenarios and extreme brutality.

      • Characters named "Kidnapper" or "Violent Husband".

      • Bots promoting incest (scenarios involving a "father" or "grandfather").

      D. Promotion of Anorexia

      On the Talkie app (100 million downloads), bots such as "Anorexia Nervosa" push users toward total food deprivation.

      The bot calls hair loss and amenorrhea (the loss of menstrual periods) "positive signs" of fat loss and encourages fasting until one is "perfectly thin".


      3. Moderation Gaps and Company Responses

      Current safeguards are reactive and cannot keep pace with the mass production of new bots.

      | Company | Defense Argument | Reality on the Ground |
      | --- | --- | --- |
      | Character.ai | Automated age-prediction system and "fiction" warnings. | Filters easily bypassed with coded language; moderation comes late. |
      | DIPPY | Proactive removal of reported bots. | Of 14 rape and incest bots reported by the investigation, only 4 were removed. |
      | Talkie (MiniMax) | No official response regarding the pro-anorexia bots. | Valued at $20 billion, with investors such as Alibaba. |


      4. Real-World Consequences and Business Model

      Impact on Mental Health

      Experts such as psychiatrist Daria Georgevic warn of an "AI psychosis" phenomenon, in which people with no prior history are drawn into paranoid spirals.

      • Documented cases: Lawsuits have been filed in the United States over incitement to self-harm (involving an autistic teenager) and over liability for the suicide of a young user.

      • Acting on it: In 2021, a man encouraged by a chatbot broke into Windsor Castle armed, intending to kill the Queen.

      Exploitation of Personal Data

      Beyond subscriptions, the future business model of these platforms relies on predictive advertising.

      By learning a user's habits and vulnerabilities from their confidences, the chatbot can "casually" slip targeted commercial recommendations into the middle of an intimate conversation.

      Conclusion

      Teenagers' use of chatbots (72% of American teenagers already use them) creates an unprecedented space of vulnerability.

      One third of young people say they would rather confide important matters to an AI than to a human, even though these platforms fail to guarantee a safe environment and put growth and engagement ahead of child protection.

    1. Technology and Democracy: Stakes, Developments, and Limits of Digital Tools in the Public Sphere

      Executive Summary

      This document analyzes the intersection of digital technologies and democratic processes, drawing on the expertise of Valentin Chaput (Open Source Politics).

      The central finding is that technology offers unprecedented tools for scaling up citizen participation and transparency (Civic Tech, Open Data, free software), yet it consistently runs up against the human factor and political will.

      The transition to a more direct and responsive democracy, sometimes called "liquid" democracy, is technically possible but politically held back by structures inherited from the 18th century.

      The current challenge is shifting toward two areas. The first is digital sovereignty, with cooperative software suites being developed to end dependence on the GAFAM. The second is protecting citizens against mass micro-targeting and the manipulation of data for electoral purposes.


      I. The Emergence of Civic Tech and the Open Source Philosophy

      Applying free-software culture to politics rests on the idea that public decision-making can be run like a collaborative development project.

      • The "GitHub" model of lawmaking: Public decision-making is compared to managing code.

      Amendments to a bill are likened to "branches" that are opened, processed, and then merged into the main code (the Civil Code or the statutes).

      • Technology is not neutral: The document stresses that technology is never neutral; it is intrinsically political.

      The choice of tools shapes how citizens interact with those in power.

      • Goals of Civic Tech:

        • Inform citizens better.

        • Mobilize energies better.

        • Make collective decisions better.

        • Evaluate the impact of public policies better.


      II. Historical Evolution and the Obstacles to Direct Democracy

      The shift from a regime of delegation (an election every five years) to a more dynamic exercise of democracy is at the heart of current debates about technology.

      1. The end of the constraints of time and space

      Historically, democracy faced physical obstacles: bringing everyone together to deliberate took too long.

      Digital technology now enables mass participation, both in real time and asynchronously, allowing millions of people to contribute without being physically present in the same place.

      2. The paradox of representation

      The current system rests on a form of delegation inherited from the Enlightenment.

      Yet the very definition of democracy, equal access to decision-making, is often contradicted by elections, which were originally designed to select an elite rather than as a purely democratic mechanism.

      3. Historical comparison of voting systems

      | Era / Model | Characteristics | Identified Limits |
      | --- | --- | --- |
      | Athenian model | Direct vote in the Agora. | Elitism (6,000 citizens out of 100,000 inhabitants; women, slaves, and metics excluded). |
      | Revolutionary / modern | Representative regime, delegation of power. | A system of elected "masters" serving five years; no continuous feedback. |
      | Liquid democracy | Dynamic delegation, revocable topic by topic. | Technical complexity of the delegation chain; risk of a "society of influencers". |


      III. Tools and Mechanisms of Digital Democracy

      The sources identify several technological levers that are already operational or being rolled out in France and internationally.

      Open Data and Transparency

      The Open Data movement, driven notably by the Obama administration in 2013, aims to require institutions to open up their data.

      In France, the Digital Republic Act (Loi pour une République numérique, 2016) and the data.gouv.fr portal have placed the country among the world leaders in bringing public data and IT skills together.

      Official Petitions (National Assembly and Senate)

      To ensure the credibility of online petitions and prevent fraud, French institutions now use dedicated platforms secured by FranceConnect.

      • Privacy mechanism: FranceConnect does not pass the citizen's civil identity to the petition platform, only a unique "token".

      This makes it possible to verify that a person has signed only once without building a named record of citizens' political opinions.
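      The one-signature-per-person check can be sketched in Java. The sketch assumes that the platform receives only an opaque, stable token for each citizen. That is our assumption, not a description of FranceConnect's actual protocol. The class and method names are hypothetical. The point is that duplicate detection needs only set membership on tokens and never requires a name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of one-signature-per-person without storing civil identity.
 * Assumption: the identity provider hands the petition platform an opaque,
 * stable per-user token; the platform never sees names or birth dates.
 */
class PetitionRegistry {
    // petition id -> set of opaque tokens that have already signed it
    private final Map<String, Set<String>> signers = new HashMap<>();

    /** Records a signature; returns false if this token has already signed this petition. */
    boolean sign(String petitionId, String opaqueToken) {
        return signers.computeIfAbsent(petitionId, k -> new HashSet<>()).add(opaqueToken);
    }

    /** Number of distinct tokens that have signed the petition. */
    int count(String petitionId) {
        return signers.getOrDefault(petitionId, Set.of()).size();
    }
}
```

      Whether such a token also lets the platform link one person's signatures across different petitions depends on how the identity provider derives it.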

      Participatory Budgets and Citizens' Conventions

      These tools allow a broad ideation phase followed by an analysis of the most salient points.

      The document notes that citizens who are well supported and exposed to divergent views often show more expertise and boldness than traditional political decision-makers (e.g., the Citizens' Convention for Climate).


      IV. Digital Sovereignty: The "La Suite.coop" Project

      In the face of GAFAM hegemony, digital sovereignty is presented as a major democratic issue.

      • Origin: Inspired by DINUM's "Suite numérique" (reserved for state employees), the La Suite.coop project aims to offer a free-software distribution for private organizations, associations, local authorities, and citizens.

      • Cooperative model: Open membership will allow users to take part in governing the tools.

      • Included tools: Messaging (Matrix/Tchap), video conferencing, collaborative document editing (Grist), and file management.


      V. Abuses: Data Processing and Electoral Manipulation

      Technology can also be used to weaken democracy through electoral optimization.

      • Micro-targeting: Modern campaigns (Obama, Trump) use Big Data to hold up to 500 data points per citizen.

      This lets them send activists door to door with highly personalized pitches based on the household's consumption habits (e.g., the type of pet food it buys).

      • Weaponization (Cambridge Analytica): Using data to pull psychological levers and to shock or sway the electorate through informational "nudges".

      • Astroturfing: Simulating spontaneous movements through algorithms or coordinated campaigns to manipulate opinion on social media.


      VI. The Limits of Technology: The "Will Factor"

      The document concludes with a crucial distinction between technical capability and political execution.

      "Technology doesn't solve everything. [...] It's not a technology problem, it's a problem of will."

      Observed failures:

      • Results ignored: Petitions that set signature records (e.g., on the Duplomb law, BRAV-M, Loi Ad) are often dismissed by parliamentary committees on the grounds that they are being used for political ends.

      • Gadgetization: Participatory budgets often amount to only a tiny fraction (0.01%) of municipalities' actual budgets.

      • Legal complexity: Lawmaking remains opaque, and "legal verbiage" acts as a barrier to entry for citizens, despite initiatives such as putting the Civil Code on GitHub.

      Recommendations:

      To counter these limits, developing critical thinking and multiplying real democratic experiences (face-to-face debates, non-verbal interaction) are considered essential for building societal "antibodies" against digital manipulation.

    1. One implication of this is that owners of AI systems have immense power to shape our worldviews as systems like these become a default way in which we get information about the world. But another point is just how hard it is to change the core values that get distilled into that blob of linear algebra when you squeeze a civilization’s worth of texts into a large language model.

      Zuckerman argues that modern AI models are current versions of Gramsci's nightmare. They aren't neutral machines; they are built by squeezing a massive pile of internet texts into mathematical code.

      Moreover, owners of these AI systems have immense power to shape world views and create new cultural hegemonies.

    1. How this was made. Drafted by GPT Pro from existing Unjournal research and discussion (the elasticity-validation survey, the Bray et al. evaluation materials, and the PBM substitution literature), then built and polished into this interactive report in Claude Code. It is currently being reviewed and adjusted by hand. Treat figures and attributions as provisional until that review is complete; the governing evaluation lives on PubPub.

      Make this a folding box, and the header should mention AI/human collaboration in some way.

      Another folding box should contain the standard callout about how we want feedback; you can use the Hypothesis tool for that.

    2. How this was made. Drafted by GPT Pro from existing Unjournal research and discussion (the elasticity-validation survey, the Bray et al. evaluation materials, and the PBM substitution literature), then built and polished into this interactive report in Claude Code. It is currently being reviewed and adjusted by hand. Treat figures and attributions as provisional until that review is complete; the governing evaluation lives on PubPub.

      Just confirming this is indeed the status

    1. Xentral is the right cloud ERP for multichannel retailers that, in addition to selling, also run production or B2B wholesale in one system. For multichannel selling, its key advantage is its more than 200 integrations with shop systems, marketplaces, and shipping providers, complemented by Xentral Connect, a no-code middleware layer that automates workflows without developer involvement. Some marketplaces are connected only indirectly, via Connect and the Mirakl integration.

      Do we have a number that can be compared to PlentyONE's? I think PlentyONE says 150 integrations (I suppose 70 of which are marketplaces, or 70 marketplaces on top of that).

      Here the number 200 includes lots of different things, so I'm not sure whether it is comparable or whether we have numbers to compare. At first glance it looks like Xentral has more connections than PlentyONE, and I don't think that's true. Look into this in depth to figure it out.

    1. A Java class defines what objects of the class know (attributes) and what they can do (behaviors). Each class has constructors like World() and Turtle(habitat), which are used to initialize the attributes in a newly created object. A new object is created with the new keyword followed by the class name (new Class()). When this code executes, it creates a new object of the specified class and calls a constructor, which has the same name as the class. For example, new World() creates and initializes a new object of the World class, and new Turtle(habitat) creates and initializes a new Turtle object in the World habitat.

          // To create a new object and call a constructor write:
          // ClassName variableName = new ClassName(parameters);
          World habitat = new World();    // create a new World object
          Turtle t = new Turtle(habitat); // create a new Turtle object

      The arguments inside the parentheses must match the constructor's parameter types.
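      Here is a self-contained sketch of the same idea. The real World and Turtle classes come from the course's own library. The stand-ins below are hypothetical simplifications, including the assumed 640 × 480 default size. They exist only to show how new calls a constructor and why the argument type must match the parameter type.

```java
/** Hypothetical stand-in for the course library's World class (not its real API). */
class World {
    private final int width;
    private final int height;

    World() { this(640, 480); }          // no-argument constructor with an assumed default size

    World(int width, int height) {       // overloaded constructor taking two ints
        this.width = width;
        this.height = height;
    }

    int getWidth() { return width; }
    int getHeight() { return height; }
}

/** Hypothetical stand-in for the course library's Turtle class. */
class Turtle {
    private final World habitat;
    private int x;
    private int y;

    Turtle(World habitat) {              // the parameter type is World
        this.habitat = habitat;
        this.x = habitat.getWidth() / 2;  // start in the middle of the world
        this.y = habitat.getHeight() / 2;
    }

    int getX() { return x; }
    int getY() { return y; }
    World getHabitat() { return habitat; }
}

class ConstructorDemo {
    public static void main(String[] args) {
        World habitat = new World();      // calls World()
        Turtle t = new Turtle(habitat);   // calls Turtle(World)
        System.out.println(t.getX() + ", " + t.getY());
        // new Turtle("pond") would not compile: a String is not a World.
    }
}
```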

    1. In Java code, the attributes are written as instance variables in the class, and the behaviors are written as methods.

      attributes = instance variables: variables that belong to an object of a class
      behaviors = methods: functions defined in a class that perform a specific task (e.g., System.out.print("blah") calls the print method)
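      A small hypothetical Dog class illustrates the note. The instance variables hold each object's attributes, and the methods are its behaviors. Each object keeps its own copy of the instance variables, so changing one object leaves the others untouched.

```java
/** Hypothetical class: attributes as instance variables, behaviors as methods. */
class Dog {
    // Attributes (instance variables): each Dog object has its own copy.
    private String name;
    private int age;

    Dog(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Behaviors (methods): they operate on this object's own data.
    String speak() {
        return name + " says woof";
    }

    void haveBirthday() {
        age = age + 1;
    }

    int getAge() {
        return age;
    }
}

class DogDemo {
    public static void main(String[] args) {
        Dog a = new Dog("Rex", 3);
        Dog b = new Dog("Fido", 5);
        a.haveBirthday();                              // changes only a's age
        System.out.println(a.speak());                 // println is itself a method call
        System.out.println(a.getAge() + " " + b.getAge());
    }
}
```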

    2. Objects are a kind of value that combines data and the code that operates on that data into a single unit.

      the properties or what each object knows about itself

    1. Strict No LLM / No AI Policy: No LLM-generated content, whether it be code or prose. No paraphrasing LLM-generated content. No LLMs for editing, including fixing spelling or grammatical errors. No LLMs for translation. English is encouraged, but not required. You are welcome to post in your native language and rely on others to have their own translation tools of choice to interpret your words. No LLMs for brainstorming and then sharing the results of that brainstorming, even if you create the prose. If you use a chatbot to give you advice on a comment on the issue tracker, that comment is unwelcome. No LLMs for finding bugs.

      Seems kind of extreme. But https://www.youtube.com/watch?v=pkndFYSTr0Y gives some more context (an interview) that kind of explains their stance (limited maintainer time/attention; education).

    1. | Your focus | Recommendation | Why |
       | --- | --- | --- |
       | Scaling multichannel sales, with commerce operations bundled in one system | PlentyONE | 150+ native channels plus PIM, OMS, WMS, and shop in one cloud-native platform |
       | Low-cost entry: a single shop plus one or two marketplaces in the DACH region | JTL | Free JTL-Wawi and a deep German shop and marketplace ecosystem |
       | E-commerce plus production or B2B wholesale in one system | Xentral | 200+ integrations, no-code automation, plus purchasing and manufacturing |
       | CRM, projects, and services alongside sales; native accounting wanted | weclapp | Native, GoBD-compliant finance module in a complete business suite |
       | Maximum customization or open source, with e-commerce as one of several channels | Odoo | 80+ modules, full source-code access, and tight backend integration |

      The table column titles don't make sense, at least the middle one. Move the middle column to the first position and rename it "Platform".

    2. | # | Platform | Ideal for | Starting price | Deployment | Core strength | Multichannel capability | Customizability |
       | --- | --- | --- | --- | --- | --- | --- | --- |
       | 1 | PlentyONE (Editor's pick) | All-in-one at scale | €59/mo. | Cloud | 150+ sales channels, all-in-one | Very strong: 150+ channels, centrally managed | High: REST API, plugins & services |
       | 2 | JTL | Budget entry, DACH | Free (Wawi) | On-premises | Free entry, strong ecosystem | Medium: core channels covered, extensible via add-ons | High: Extension Store & community plugins |
       | 3 | Xentral | Cloud ERP all-rounder | €349/mo. | Cloud | 200+ integrations, no-code automation | Good: shops & marketplaces via Connect & Mirakl | Medium: API & Xentral Connect |

      Xentral seems to have more native integrations than PlentyONE, yet you state that PlentyONE is very strong in multichannel capability while Xentral is only good. Why? Are integrations not what really counts? If not, what is it about?

    1. Reviewer #2 (Public review):

      Summary:

      The goal of this proposal was to understand how two separate projection neurons from the medial prefrontal cortex, those innervating the basolateral amygdala (BLA) and nucleus accumbens (NAc), contribute to the encoding of emotional behaviors. The authors record the activity of these different neuron classes across three different behavioral environments. They propose that, although both populations are involved in emotional behavior, the two populations have diverging activity patterns in certain contexts. A subset of projections to the NAc appear particularly important for social behavior. They then attempt to link these changes to the emotional state of the animal and changes in synaptic connectivity.

      Strengths:

      The behavioral data build on previous studies of these projection neurons that support distinct roles in behavior, and extend previous work by examining the heterogeneity within different projection neurons across contexts. This is important for understanding the "neural code" within the PFC that contributes to such behaviors and how it is relayed to other brain structures.

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures and so one possibility is that some of the diversity within their findings may relate to targeting of different sub-structures within BLA or NAc or the diversity of projection neuron subtypes that mediate these pathways. This is an important future direction for this work but does not detract from the main finding as reported.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In the revised version, our primary focus has been to more clearly demonstrate the unique contribution of the brain-cognitive gap (BCG) beyond what is captured by cognitive performance alone, and to show that the BCG is not trivially driven by the observed cognitive scores. Additional analyses now demonstrate that the BCG provides complementary and nuanced information regarding factors associated with cognitive resilience, above and beyond the cognitive measures themselves.

      In response to the comment regarding the inclusion of a baseline predictive model, we would like to clarify that the central aim of our study is to compare predictive utility across different cognitive states (resting state, movie watching, and n-back), rather than to establish a single universally optimal prediction model. Several previous studies have already systematically compared deep learning approaches with more traditional machine learning methods for functional connectome-based prediction. In contrast, the goal of the present study is to examine how brain state modulates the ability of AI-based functional connectome models to capture individual differences in working memory and episodic memory.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors attempted to identify whether a new deep-learning model could be applied to both resting-state and task-state fMRI data to predict cognition and dopaminergic signaling. They found that the resting-state and movie-watching conditions best predict episodic memory, but only movie watching predicts both episodic and working memory. A negative 'brain gap' (where the model trained on brain connectivity predicts worse performance than what is actually observed) was associated with less physical activity, poorer cardiovascular function, and lower D1R availability.

      Strengths:

      The paper should be of broad interest to the journal's readership, with implications for cognitive neuroscience, psychiatry, and psychology fields. The paper is very well-written and clear. The authors use two independent datasets to validate their findings, including two of the largest databases of dopamine receptor availability to link brain functional connectivity/activity with neurochemical signaling.

      Weaknesses:

      The deep learning findings represent a relatively small extension/enhancement of knowledge in a very crowded field.

      It's unclear from these results how much utility the brain gaps provide above and beyond observed performance. It would be helpful to take a median split of the dataset on observed performance and plot it alongside the current Figure 3 results to see how the cardiovascular and physical activity measures differ based on actual performance. Could the authors perform additional analyses describing how much additional variance is explained in these measures by including brain gaps?

      We thank the reviewer for raising this important point. In response to their request, we first examined the relationship between the BCG and the cognitive measure itself. We did not find any significant relationship in either the DyNAMiC sample (r = 0.01, p = 0.939) or the COBRA sample (r = 0.01, p = 0.894) (see Author response image 1).

      Author response image 1.

      We then conducted additional analyses, splitting the sample into high and low EM performers, and compared their levels of physical activity and Framingham cardiovascular disease (CVD) risk scores. We found no significant difference in physical activity (DyNAMiC: p = 0.56, 95% CI: –14.99 to 8.13; COBRA: p = 0.29, 95% CI: –3.54 to 1.05) or Framingham CVD risk score (DyNAMiC: p = 0.11, 95% CI: –1.08 to 10.72; COBRA: p = 0.41, 95% CI: –1.86 to 4.58) between high and low EM performers. Given the significant differences in physical activity and Framingham CVD risk score between the positive and negative BCG groups, our results support the view that BCG provides unique information, beyond the observed cognitive measure (episodic memory score), about factors that contribute to cognitive resilience. These results have been added to Section 2.4, and Figure 3 has been updated.

      Some of the imaging findings require deeper analysis. For Figure 1f - Which default mode regions have high salience? DMN is a huge network with subregions having differing functions.

      Grad-CAM provides a coarse, gradient-based attribution that reflects how the learned feature maps contribute to the model output. It is not designed to produce specific input-level interpretations, such as symmetric edge-wise importance values. Therefore, the primary interpretation remains at the network level rather than at the level of individual FC edges.

      Along the same lines, were the striatal D1R findings regionally specific at all? It would be informative to test whether the three nuclei (Accumbens, Caudate, Putamen) and/or voxelwise models would show something above and beyond what is achieved from averaging D1R across the striatum. What about cortical D1R, which is highly abundant, strongly associated with cognitive (especially WM) performance, and has much unique variance beyond striatal D1R? https://www.science.org/doi/full/10.1126/sciadv.1501672. The PET findings are one of the unique strengths of this paper and are underexplored. It's also unclear if the measure of brain entropy should simply be averaged across all regions.

      In this study, we focused on D1DR/D2DR averaged across the caudate and putamen. Our previous work has found these regions to be more strongly associated with cognitive functions (Johansson et al., 2023; Nyberg et al., 2016) than the nucleus accumbens, which tends to show lower D1DR/D2DR levels and limited association with these cognitive domains. Following the reviewer's suggestion, we examined regional variations. Both caudate and putamen D1DR showed significant associations with BCG, whereas D1DR in the nucleus accumbens and the DLPFC showed no significant association with BCG. For D2DR, we observed a significant association between caudate/putamen D2DR and BCG.

      D1DR:

      Partial correlation between:

      Caudate_Bilateral vs. NegGap, r = 0.37, p = 0.02

      Putamen_Bilateral vs. NegGap, r = 0.34, p = 0.03

      Accumbens_Bilateral vs. NegGap, r = 0.07, p = 0.69

      Mean (LRCaud, LRput, LRacc) vs. NegGap, r = 0.35, p = 0.03

      DLPFC_Bilateral vs. NegGap, r = 0.21, p = 0.21

      Striatum_Bilateral (Mean (LRCaud, LRput)) vs. NegGap, r = 0.40, p = 0.01

      Caudate_Bilateral vs. PosGap, r = –0.37, p = 0.02

      Putamen_Bilateral vs. PosGap, r = –0.53, p = 0.02

      Accumbens_Bilateral vs. PosGap, r = –0.25, p = 0.31

      Mean (LRCaud, LRput, LRacc) vs. PosGap, r = –0.41, p = 0.08

      DLPFC_Bilateral vs. PosGap, r = –0.30, p = 0.21

      Striatum_Bilateral (Mean (LRCaud, LRput)) vs. PosGap, r = –0.49, p = 0.03

      Author response image 2.

      D2DR:

      Correlation between:

      Caudate_Bilateral vs. NegGap, r = 0.36, p = 0.0003

      Putamen_Bilateral vs. NegGap, r = 0.22, p = 0.03

      Accumbens_Bilateral vs. NegGap, r = –0.01, p = 0.91

      Mean (LRCaud, LRput, LRacc) vs. PosGap, r = –0.24, p = 0.01

      Striatum_Bilateral vs. NegGap, r = 0.39, p = 0.0001

      Caudate_Bilateral vs. PosGap, r = –0.34, p = 0.004

      Putamen_Bilateral vs. PosGap, r = –0.37, p = 0.002

      Accumbens_Bilateral vs. PosGap, r = –0.21, p = 0.09

      Mean (LRCaud, LRput, LRacc) vs. PosGap, r = –0.38, p = 0.001

      Striatum_Bilateral vs. PosGap, r = –0.49, p = 0.0001

      We have added the following text to the Results section to highlight these regional differences in D1DR/D2DR in relation to BCG.

      “Both D1DR and D2DR availability in the striatum were associated with BCG, such that lower dopamine receptor availability was linked to a greater brain-cognition gap. However, these associations varied by region. For D1DR, significant correlations with BCG were observed in the caudate (positive gap: r = –0.37, p = 0.02; negative gap: r = 0.37, p = 0.02) and putamen (positive gap: r = –0.53, p = 0.02; negative gap: r = 0.34, p = 0.03), but not in the nucleus accumbens (positive gap: r = –0.25, p = 0.31; negative gap: r = 0.07, p = 0.69) or the DLPFC (positive gap: r = –0.30, p = 0.21; negative gap: r = 0.21, p = 0.21). For D2DR, both caudate (positive gap: r = –0.34, p = 0.004; negative gap: r = 0.36, p = 0.0003) and putamen (positive gap: r = –0.37, p = 0.002; negative gap: r = 0.22, p = 0.03) showed significant associations with BCG.”

      Author response image 3.

      It is not clear from the text that the authors met the preconditions for mediation analysis (that is, demonstrating significant correlations between D1R and entropy, in addition to the correlation with brain gap). The authors should report this as well.

      This is a fair question. We recalculated entropy in the striatum, given that D1DR is more strongly expressed in this region and, therefore, reduced striatal D1DR may have a more pronounced impact on local entropy (as the reviewer suggested, it may not be appropriate to compute entropy across all brain regions). Our analyses showed that lower D1DR/D2DR levels were associated with higher entropy, which in turn was related to higher BCG.
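
      The response does not specify which entropy measure is used. A common choice for regional BOLD time series is sample entropy (SampEn), and the sketch below assumes SampEn with embedding dimension m = 2 and tolerance r = 0.2 × SD (widely used defaults, not parameters taken from the manuscript). A striatal value could then be obtained by averaging SampEn across striatal regions or voxels; the function name and synthetic signals are hypothetical.

```python
import numpy as np


def sample_entropy(signal, m=2, r_factor=0.2):
    """Sample entropy: -log(A/B), where B counts template pairs of length m
    and A counts pairs of length m + 1 within tolerance r (self-matches excluded)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev (max-abs) distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        k = len(templates)
        # Subtract the k self-matches on the diagonal; each pair is counted twice
        return (np.sum(dist <= r) - k) / 2

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return float(-np.log(a / b))


# Synthetic example: an irregular signal has higher entropy than a regular one
rng = np.random.default_rng(1)
noise = rng.normal(size=300)
sine = np.sin(np.linspace(0, 20 * np.pi, 300))
se_noise = sample_entropy(noise)
se_sine = sample_entropy(sine)
```

      Higher SampEn indicates a less predictable signal, which is the sense in which greater entropy is read here as a lower signal-to-noise ratio.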

      DyNAMiC; negative gap:

      Partial correlation between:

      Entropy and D1DR, r = –0.33, p=0.04.

      Entropy and NegGap, r = –0.36, p=0.03.

      DyNAMiC; positive gap:

      Partial correlation between:

      Entropy and D1DR, r = –0.56, p=0.01.

      Entropy and PosGap, r = 0.47, p = 0.04.

      COBRA; negative gap:

      Correlation between:

      Entropy and D2DR, r = –0.22, p=0.03.

      Entropy and NegGap, r = –0.27, p=0.007.

      COBRA; positive gap:

      Correlation between:

      Entropy and D2DR, r = –0.26, p=0.03.

      Entropy and PosGap, r = 0.25, p=0.03.

      We have added these results to Results section 2.6 and updated Figure 4 in the revised manuscript to report these correlations.

      Was age controlled for in the mediation analysis? I would not consider this result valid unless that is the case.

      We used the mediation package in R. To control for age in the mediation analysis, we added age as a covariate in both the mediator model and the outcome model. The following information has been added to the Methods section of the revised manuscript.

      “To assess the statistical significance of this mediation effect, we employed the bootstrapping method as outlined by Preacher and Hayes (145); age was controlled for in all statistical analyses.”
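
      As a hedged illustration of the model structure described above (age entered as a covariate in both the mediator and the outcome model, with a bootstrapped indirect effect), the sketch below uses a product-of-coefficients estimate with a percentile bootstrap. It is not a reimplementation of the R mediation package; the variable roles (receptor availability → entropy → BCG) follow the text, but all names and the synthetic data are hypothetical.

```python
import numpy as np


def indirect_effect(x, m, y, age):
    """Product-of-coefficients indirect effect a*b, adjusting both models for age."""
    ones = np.ones(len(x))
    # Mediator model: m ~ 1 + x + age; a is the coefficient on x
    a = np.linalg.lstsq(np.column_stack([ones, x, age]), m, rcond=None)[0][1]
    # Outcome model: y ~ 1 + x + m + age; b is the coefficient on m
    b = np.linalg.lstsq(np.column_stack([ones, x, m, age]), y, rcond=None)[0][2]
    return a * b


def bootstrap_mediation(x, m, y, age, n_boot=1000, seed=0):
    """Point estimate and 95% percentile-bootstrap CI of the indirect effect."""
    x, m, y, age = (np.asarray(v, dtype=float) for v in (x, m, y, age))
    rng = np.random.default_rng(seed)
    n = len(x)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)  # resample participants with replacement
        boots[i] = indirect_effect(x[idx], m[idx], y[idx], age[idx])
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return indirect_effect(x, m, y, age), (float(lo), float(hi))


# Synthetic example: lower receptor availability -> higher entropy -> larger gap
rng = np.random.default_rng(42)
n = 200
age = rng.uniform(20, 80, n)
receptor = -0.02 * age + rng.normal(size=n)
entropy = -0.8 * receptor + 0.01 * age + rng.normal(0, 0.5, n)
gap = 0.6 * entropy + 0.01 * age + rng.normal(0, 0.5, n)
estimate, (ci_low, ci_high) = bootstrap_mediation(receptor, entropy, gap, age)
```

      With these simulated coefficients the true indirect effect is −0.8 × 0.6 = −0.48; a bootstrap interval excluding zero is the usual criterion for a significant indirect effect.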

      The discussion section is long, but the authors would do better to replace some less helpful sections (e.g., the paragraph on methodological tweaks to parcellations and model alignment) with a couple of other important points, including:

      (1) Discuss the 'sweet-spot' of movie watching for behavior prediction in the context of studies showing that task states 'quench' neural variability: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007983. This may not be mutually exclusive of the discussion on dopamine and signal-to-noise ratio, but it would be helpful for the authors to discuss their potential overlap vs. unique contributions to the observed findings.

      Thank you for the comment. We have now removed the section on methodological tweaks and extended the discussion of the sweet spot of the task for behavioral prediction, citing the paper that the reviewer suggested. The new paragraph reads:

      “Additionally, previous research showed that movie-watching alters the propagation of activity across cortical pathways (105), particularly within and between regions involved in audiovisual processing and attention. These alterations lead to a less segregated and more integrated network organization (106). Similarly, the n-back task has been associated with increased integration of task-positive cortico-cortical connectivity (104, 107) and striato-cortical connectivity (102). Our findings also suggest that certain task contexts strike an optimal balance between reducing neural variability and maintaining sufficient richness to capture individual differences. Prior work shows that task states quench neural variability, leading to a more reliable and predictable neural signal (108). In this context, movie watching may represent such a sweet spot, constraining neural dynamics through shared audiovisual stimulation while simultaneously engaging a broad range of cognitive processes that preserve individual differences.”

      (2) The argument that dopamine signaling increases signal-to-noise ratio is based on some preclinical data as well as correlational data using fMRI with pharmacological challenges. It is less clear how PET-derived estimates of D1R and D2R availability equate to 'dopamine signaling' as it is thought of in this context. Presumably, based on these data, higher D1R or D2R availability would be related to greater levels of tonic dopaminergic signaling. However, in the case of the COBRA dataset with D2R estimates, those are based on raclopride -- which competes with endogenous dopamine for the D2 receptor. Therefore, someone with higher levels of endogenous dopamine signaling should theoretically have lower raclopride binding and lower D2R estimates. I'm not arguing that the authors' logic is flawed or that D1R and D2R are not good measures of dopamine signaling, but I'd ask the authors to dig into the literature and describe more direct potential links for how greater receptor availability might be associated with greater dopamine signaling (and hence lower entropy). Adding this to the discussion would be very valuable for PET research.

      Thank you for raising this important point. We agree that D1R and D2R availability should not be taken as direct proxies of dopamine signaling. However, prior work has suggested meaningful associations between pre- and post-synaptic markers. For instance, a well-powered study demonstrated a significant correlation between D2R availability and dopamine synthesis capacity measured by FMT (Berry et al., 2018). This finding supports the idea that postsynaptic receptor markers may, under certain conditions, serve as an indirect proxy for dopaminergic signaling. Moreover, the number of dopamine-producing neurons innervating the striatum during development has been proposed to shape the structural maturation and arborization of dendrites (McAllister, 2000; Whitford et al., 2002), potentially providing a structural and functional basis for observed associations between pre- and post-synaptic measures.

      At the same time, smaller-scale studies have yielded mixed findings, reporting either non-significant associations (Heinz et al., 2005; Kienast et al., 2008) or negative correlations (Ito et al., 2011). Importantly, the latter studies employed [18F]FDOPA to index dopamine synthesis, which has been argued to provide a less reliable estimate of synthesis capacity compared to FMT, as used in Berry et al. (2018). These inconsistencies underscore that the relationship between pre- and post-synaptic markers is not straightforward and requires further examination in larger, well-powered samples. The following paragraph has been added to the discussion.

      “An important caveat is that D1DR and D2DR availability do not provide a direct measure of dopamine signaling. Instead, they reflect receptor availability, which interacts with endogenous dopamine in a complex manner. PET measures of D1R and D2R availability reflect the density of unoccupied dopamine receptors and the degree to which endogenous dopamine competes with radioligand binding. D2R binding potential is sensitive to competition from synaptic dopamine, such that higher ambient dopamine generally reduces tracer binding; D1R binding, however, is less affected by endogenous dopamine under physiological conditions, reflecting more directly receptor expression levels. Previous studies demonstrated a significant association between D2R availability and dopamine synthesis capacity measured by FMT (117, 118), suggesting that postsynaptic receptor markers may, under certain conditions, serve as a proxy for dopaminergic signaling. Developmental factors, such as the number of dopamine-producing neurons innervating the striatum, may further influence the structural and functional relationship between pre- and post-synaptic markers. By contrast, smaller studies have reported non-significant (119, 120) or negative (121) associations, although these studies relied on [18F]FDOPA, which is considered a less precise index of dopamine synthesis than FMT. Taken together, these reports indicate that the relationship between pre- and post-synaptic markers is complex and not necessarily linear. Accordingly, our observation that lower receptor availability is associated with greater neural variability should not be interpreted as direct evidence of weaker dopaminergic signaling, but rather as reflecting the interplay between receptor density and endogenous dopamine occupancy, particularly in the case of D2DR.”

      Reviewer #2 (Public review):

      Summary:

      The authors developed a deep learning model based on a DenseNet CNN architecture to predict two cognitive functions: working memory and episodic memory, from functional connectivity matrices. These matrices were recorded under three conditions: during rest, a working memory task, and a movie, and were treated as images for the CNN algorithm. They tested their model's performance across different conditions and a separate dataset with a different age distribution (using the same MRI scanner, scanning configurations, and cognitive tests). They also calculated the "brain cognition gap" based on the model trained on resting functional connectivity to predict working memory. Extending from the commonly used index "brain age," the brain cognition gap was defined as the difference between the working memory score predicted by their model (predicted working memory) and the working memory score based on the working memory test itself (observed working memory). This brain cognition gap was found to be associated with physical activity, education, and cardiovascular risk. The authors also conducted additional mediation tests to examine whether regional functional variability mediated the relationship between PET-derived measures of dopamine and the brain cognition gap.

      Strengths:

      The major strength of this manuscript is the extensive effort the authors have put into creating a new 'biomarker' that links deep learning with fMRI, PET, physical activity, education, and cardiovascular risk across two studies. This effort is impressive.

      Weaknesses:

      There are several weaknesses in the current methods and results, making many of the claims unconvincing. These weaknesses include:

      (1) The lack of baseline models to benchmark the predictive performance of their DenseNet models.

      (2) The inappropriate calculation of the brain cognition gap due to the lack of control for regression-toward-the-mean and the influence of the working memory itself (a common practice in brain age studies).

      (3) The lack of benchmarking of the brain cognition gap against the 'corrected' brain age gap and the direct prediction of physical activity, education, and cardiovascular risk.

      (4) Minimal justification for their PET mediation analysis.
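
      Point (2) concerns regression toward the mean, a known artefact in brain-age gaps: because model predictions shrink toward the sample mean, a raw "predicted minus observed" gap is negatively correlated with the observed score. The sketch below shows the raw BCG as defined in the summary above and one possible correction, by analogy with the common brain-age practice of residualizing the gap on chronological age: here the gap is residualized on the observed score and age, which also makes it uncorrelated with the observed score by construction. The response does not state which correction the authors adopted, so the function names and covariate choice are assumptions; in practice the correction coefficients would be estimated on training data and applied to held-out participants.

```python
import numpy as np


def raw_gap(predicted, observed):
    """Brain-cognition gap as defined in the summary: predicted minus observed."""
    return np.asarray(predicted, dtype=float) - np.asarray(observed, dtype=float)


def corrected_gap(predicted, observed, age):
    """Gap residualized on the observed score and age (assumed correction).

    Removing the linear dependence on the observed score counters the
    negative gap-score correlation that regression toward the mean produces.
    """
    gap = raw_gap(predicted, observed)
    design = np.column_stack([np.ones(len(gap)), observed, age])
    coef = np.linalg.lstsq(design, gap, rcond=None)[0]
    return gap - design @ coef


# Synthetic example: predictions shrunk toward the mean (slope 0.5)
rng = np.random.default_rng(7)
observed = rng.normal(size=300)
age = rng.uniform(20, 80, 300)
predicted = 0.5 * observed + rng.normal(0, 0.5, 300)
gap_raw = raw_gap(predicted, observed)
gap_corr = corrected_gap(predicted, observed, age)
positive_gap = gap_corr > 0  # e.g., split into positive vs. negative BCG groups
```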

      We appreciate the reviewer’s constructive comments on the strengths and weaknesses of our study. In this revised version, we have addressed the concerns regarding the calculation of the brain-cognition gap, clarified the unique variance that the brain-cognition gap contributes beyond cognition itself, and provided additional justification for the PET mediation analysis. Regarding the lack of a baseline model, it is important to highlight that our aim has never been to compare the predictive power of different deep learning or machine learning approaches. Therefore, the text in the introduction and discussion has been amended to avoid miscommunication on this topic.

      Regarding the impact of the work on the field and the utility of the methods and data to the community, I see its potential. However, addressing all the weaknesses listed above is crucial and likely to change the conclusions of the results.

      It is important to note that many statements in the manuscript are overstated, making the contribution of the manuscript seem exaggerated.

      We have run additional analyses based on the reviewer’s suggestions. Although the effect sizes and statistical values changed as a result of these corrections, the overall conclusions remain largely consistent: the relationships between the brain-cognition gap and key factors such as physical activity and cardiovascular risk persisted. We have updated the manuscript accordingly and revised the relevant sections to reflect these refinements and the resulting interpretations.

      For instance, the abstract claims "there is a lack of objective biomarkers to accurately predict cognitive function," and the discussion states, "across various studies, the correlation between predicted and actual fluid intelligence typically hovers around 0.25 (98-100)." However, a meta-analysis by Vieira and colleagues (2022 https://doi.org/10.1016/j.intell.2022.101654) found over 37 studies up to 2020 predicting cognitive abilities from fMRI with machine learning, with 24 studies published in 2019-20 alone. Since 2020, with the rise of machine learning and AI, even more studies have likely been published on this topic, all claiming to show objective biomarkers to accurately predict cognitive function. Vieira and colleagues also found an average performance of these objective biomarkers in predicting general cognition at r = .42, similar to what was found in this manuscript. Based on this alone, it is unclear how novel or superior their method is without a proper systematic benchmark.

      We appreciate the opportunity to clarify our study’s contribution relative to prior work. We have revised the introduction and discussion to highlight the contribution of other methods when it comes to biomarkers. As for the comment related to the work by Vieira and colleagues, Vieira et al. (2022) indeed present a comprehensive meta-analysis of studies predicting general and fluid intelligence using neuroimaging and machine learning. However, there are two critical differences between our work and previous work:

      Target Cognitive Domains:

      Our study does not focus on general or fluid intelligence, but rather on comprehensive measures of EM (3 tests) and WM (3 tests), two distinct cognitive domains that are critically important for aging research. In this context, these abilities (each measured by three independent tests to boost reliability) are less frequently studied as predictive targets in the existing fMRI-ML literature, particularly using deep learning methods.

      Critically, our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.

      Our primary objective is to test how brain state influences the ability of functional connectivity to predict domain-specific cognitive performance, using a deep learning framework. As now stated explicitly in the revised manuscript, this objective is operationalized through three clearly defined aims:

      (1) To compare the predictive utility of functional connectomes derived from different brain states (resting state, movie watching, and n-back task) for EM and WM;

      (2) To introduce and evaluate a brain-cognition gap as a marker of individual differences beyond chronological age; and

      (3) To examine the contribution of dopaminergic integrity to variability in connectome uniqueness and brain-cognition gaps.

      We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework for some of the prediction models, we did not conduct a comprehensive benchmark across all available machine learning methods nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological/biomarker superiority beyond the scope of our analyses.

      Modeling Approach:

      While Vieira et al. show that the majority (76%) of prior studies used linear modeling approaches, including CPM and penalized regressions, these models are often vulnerable to overfitting, especially when applied to high-dimensional fMRI data. Our use of a DenseNet-based CNN architecture is motivated by the need to leverage inductive biases suited to functional connectivity data, and we evaluate this approach across multiple cognitive tasks and independent datasets.

      Vieira and colleagues report that studies predicting general intelligence from fMRI (particularly from the HCP dataset) average around r = 0.42, while those predicting fluid intelligence average around r = 0.15. Our original claim that the correlation hovers around 0.25 is therefore not incorrect and aligns with the Vieira meta-analysis. We have, however, nuanced this statement in the manuscript, now stating that correlations are higher for general intelligence than for fluid intelligence.

      Altogether, we considered the reviewer’s comments and therefore conducted a careful revision of the manuscript text to moderate and clarify statements that may have come across as overstated. We have refined the language throughout the Introduction and Discussion sections to better align with the strength of the evidence and the scope of our contributions. A few examples are:

      “Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”

      Also on page 14:

      “Our study introduces a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30).”

      Similarly, the authors claim superior performance of deep learning and mischaracterize machine learning algorithms: "In particular, deep neural networks (DNN) methods have been successfully applied to behavioral and disease prediction (24-26), and have been found to outperform other machine learning approaches (27-29)," and "Deep learning approaches overcome the limitation of predictive techniques that solely rely on linear associations between connectivity and behavioral phenotypes (17)." However, the superiority of deep learning is debatable. Studies show comparable performance between machine learning (such as kernel regression) and deep learning (such as fully-connected neural networks, BrainNetCNN, Graph CNN (GCNN), and temporal CNN), e.g., He and colleagues (2019; https://doi.org/10.1016/j.neuroimage.2019.116276) and Vieira and colleagues (2024; https://doi.org/10.1101/2024.03.07.583858).

      We agree that the performance gap between traditional machine learning models and deep learning (which is a subcategory of machine learning) in neuroimaging is debatable and task-dependent. Indeed, both He et al. (2019) and Vieira et al. (2024) offer evidence that kernel regression can achieve performance on par with deep learning models, applied to appropriate datasets.

      We have therefore nuanced the statements in the revised version of the manuscript as follows:

      Introduction:

      “In particular, deep neural network (DNN) methods have been successfully applied to behavioral and disease prediction (24-26), and were initially expected to outperform other machine learning approaches (27-29). However, this superiority remains debatable, as recent studies have reported comparable performance between DNNs and traditional methods (He et al., 2019; Vieira et al., 2024). Accordingly, the present study does not aim to benchmark deep learning against traditional machine learning approaches, but instead uses a consistent predictive framework to examine how brain state influences the utility of FC for cognitive prediction.”

      “Deep learning approaches offer a flexible modeling framework capable of capturing complex non-linear associations in high-dimensional data with potentially less sensitivity to training on a smaller subsample (Vieira et al., 2024)”.

      Discussion:

      We agree that traditional methods, such as kernel-based models, tree ensembles, and non-linear SVRs, can also effectively capture such relationships. The relative performance of our model and other non-linear approaches depends on several factors, including data size, model architecture, and domain-specific considerations. We have included additional explanations in the discussion to address this.

      Moreover, many non-deep learning predictive techniques are non-linear, e.g., XGBoost, CatBoost, random forest, kernel ridge, and support vector regression with non-linear kernels (such as RBF and polynomial). Thus, stating that machine learning can only model linear relationships is incorrect. Moreover, for the small amount of data the authors had, some might argue that a linear algorithm might be more appropriate to balance the bias-variance trade-off in prediction. Again, without a proper systematic benchmark, it is unclear how well their DenseNet algorithm performs compared to other algorithms.

      Thank you for bringing this up. We have now removed statements implying that machine learning can only model linear relationships.

      Regarding the Brain Age literature, the authors also misinterpreted recent findings: "However, a recent study suggests that brain age predictions contribute minimally compared to chronological age for explaining cognitive decline (65), implying that cognitive predictions are more reliable." In this study, Tetereva and colleagues (2024) (https://doi.org/10.7554/eLife.87297.4) showed that non-deep-learning machine learning can make good predictions from MRI on both chronological age (with r up to .88) and fluid cognition (with r up to .627). Using the combination of functional connectivity matrices across rest and tasks to predict fluid cognition, they found performance at r = .565, comparable to what was found in the current manuscript with deep learning. Nonetheless, while brain age predicted chronological age well (and brain cognition predicted fluid cognition well), it was problematic to predict fluid cognition from brain age. They showed that, because brain age, by design, shared so much common variance with chronological age, brain age and chronological age captured the same variance of fluid cognition. When chronological age was controlled for in the prediction of fluid cognition, brain age no longer had high predictive ability. In the case of the current manuscript, the brain cognition gap is not appropriately controlled for cognition (to be more precise, a working memory score). I expect the performance in predicting physical activity, education, and cardiovascular risk will drop dramatically once cognition is controlled for. There are at least two ways to control cognition according to Tetereva and colleagues' study (see more in the recommendations).

      We thank the reviewer for breaking down the findings of Tetereva and colleagues (2024). It was not our intention to suggest that Tetereva et al. showed that brain age has little predictive value in general. Our understanding of the findings reported in that study is in line with the reviewer’s clarifications. We have now revised the Introduction to avoid any misunderstanding:

      “A recent study demonstrated that while brain age can predict chronological age with high accuracy from MRI, its utility for predicting cognition is limited. Specifically, Tetereva and colleagues (2024) showed that brain age strongly tracks chronological age and that brain cognition (using functional connectivity) can predict fluid cognition. Yet, when used to predict cognition, brain age largely overlapped with chronological age, such that controlling for chronological age eliminated the predictive contribution of brain age. This finding suggests that brain-age models may provide little unique explanatory power for cognitive decline beyond what is already captured by chronological age. Building on this observation and extending the concept of a brain-age gap to a brain-cognition gap (BCG, defined as the discrepancy between predicted and observed cognitive performance), we propose that a BCG may serve as an informative marker of individual differences.”

      In addition, in response to the first comment from Reviewer 1, we have extended the results in the manuscript. We first showed that BCG is not significantly associated with cognition itself (see Author response image 1). Moreover, we conducted additional analyses, splitting the sample into high and low EM performers and comparing their levels of physical activity and Framingham cardiovascular risk scores. We found no significant difference in physical activity (DyNAMiC: p = 0.56, 95% CI: -14.99 to 8.13; COBRA: p = 0.29, 95% CI: -3.54 to 1.05) or Framingham CVD risk score (DyNAMiC: p = 0.11, 95% CI: -1.08 to 10.72; COBRA: p = 0.41, 95% CI: -1.86 to 4.58) between high and low EM performers. Given the significant differences in physical activity and Framingham CVD risk score between the positive and negative BCG groups, our results support the view that BCG provides unique information, beyond cognitive measures, about factors that contribute to cognitive resilience. This text has been added to the Results section, and Figure 3 has been updated in the manuscript.
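
      The group comparison described above (high vs. low EM performers, and analogously positive vs. negative BCG groups) can be sketched as a median split followed by a difference in group means with a 95% confidence interval. The response does not state how the groups were formed or which test was used; the median split and the percentile-bootstrap CI below are assumptions, and the function name and data are hypothetical.

```python
import numpy as np


def group_difference(grouping_score, outcome, n_boot=2000, seed=0):
    """Mean outcome difference (high minus low group) after a median split,
    with a 95% percentile-bootstrap confidence interval."""
    score = np.asarray(grouping_score, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    is_high = score > np.median(score)
    high, low = outcome[is_high], outcome[~is_high]
    rng = np.random.default_rng(seed)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group independently with replacement
        boots[i] = (rng.choice(high, len(high)).mean()
                    - rng.choice(low, len(low)).mean())
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return float(high.mean() - low.mean()), (float(lo), float(hi))


# Synthetic example: the outcome depends strongly on the grouping score
rng = np.random.default_rng(3)
em_score = rng.normal(size=150)
activity = 2.0 * em_score + rng.normal(size=150)
diff, (ci_low, ci_high) = group_difference(em_score, activity)
```

      An interval that spans zero, as reported above for physical activity and Framingham scores between high and low EM performers, indicates no reliable group difference.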

      The authors mentioned, "The third aim of the current study is to uncover the contribution of dopamine (DA) integrity to brain-cognition gaps." However, I fail to see how mediation analysis would test this. The authors also mentioned, "Insufficient DA modulation can affect neurocognitive functions detrimentally (69, 74, 76-78)." They should test if DA levels are related to working memory scores in their study, and if so, whether the relationship is mediated by the "corrected" brain-cognition gaps. Note see more on the recommendation for the calculation of the "corrected" brain-cognition gaps.

      Our mediation analysis was not designed to test whether DA predicts episodic memory performance directly, nor whether BCG mediates such a relationship. Instead, consistent with the theoretical framework emphasizing the role of DA in neuronal gain and signal-to-noise ratio (see our recent work in Korkki et al., 2025), we specifically investigated whether the effect of DA on BCG operates through functional variability. We agree that future work could extend our approach by directly examining whether BCG mediates the link between DA and cognitive outcomes. However, in the present study, our primary focus was on testing the mechanistic pathway DA → entropy → BCG.

      In line with this aim, we found that lower DA receptor availability was associated with larger BCGs (Figure 4). We then asked whether this relationship is mediated by functional signal variability, such that lower DA is linked to reduced signal-to-noise ratio (i.e., greater entropy), which in turn contributes to less reliable prediction of cognition and, consequently, larger BCGs. Our mediation analysis supports this pathway (please see also our reply to Reviewer 1, Comment 6).

      Reviewer #3 (Public review):

      Summary:

      This paper by Esmaeili and co-authors presents a connectome prediction study to predict episodic memory and relate prediction errors to other phenotypic variables.

      Strengths:

      (1) A primary and external validation dataset.

      (2) Novel use of prediction errors (i.e., brain-cognitive gap).

      (3) A wide range of data was investigated.

      Weaknesses:

      (1) Lack of comparisons to other methods for prediction.

      (2) Several different points are being investigated that don't allow any particular one to shine through.

      (3) Some choices of analysis are not well-motivated.

      (4) How do the n-back connectomes perform for prediction if the authors do not regress task activations from the n-back task?

      We thank the reviewer for raising these important points. Regarding the lack of comparisons to other methods, it is important to highlight that our aim has never been to compare the predictive power of different deep learning or machine learning approaches. Rather, our primary objective was to test how brain state influences the ability of functional connectivity to predict domain-specific cognitive performance, using a deep learning framework. Therefore, the text in the introduction and discussion has been amended to avoid miscommunication on this topic.

      We chose to regress out task-evoked activations based on prior work demonstrating that failing to do so can produce spurious but systematic inflation of task functional connectivity estimates (Cole et al., 2019). In that study, as well as subsequent reports (e.g., Gao et al., 2020; Gonzalez-Castillo & Bandettini, 2018), connectomes derived without activation regression tended to capture task-evoked coactivations rather than background task functional interactions, which can artificially boost predictive performance but limit interpretability (whether it is co-activation or intrinsic connectivity during an entire goal-oriented task) and generalizability. For this reason, our analyses focused on the more conservative approach of regressing out task activations. Accordingly, we compared predictive performance only under this preprocessing strategy.

      We have added the following sentence to the Methods to clarify this: “To avoid spurious inflation of task functional connectivity by task-evoked activations, we regressed out task activation patterns from the n-back data prior to estimating functional connectivity, following recommendations by Cole et al. (2019) and related work.”
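
      A minimal sketch of this preprocessing step: fit a GLM with the task regressors to each region's time series, keep the residuals, and compute functional connectivity from those residuals. It assumes the task regressors are already convolved with a hemodynamic response function (an unconvolved block regressor is used below for brevity); Cole et al. (2019) discuss more flexible alternatives such as finite impulse response (FIR) regressors. Function names and data are hypothetical.

```python
import numpy as np


def regress_out_task(timeseries, task_regressors):
    """Residualize regional time series on task regressors.

    timeseries: (T, R) array, T time points by R regions.
    task_regressors: (T, K) design, assumed already HRF-convolved.
    """
    n_timepoints = timeseries.shape[0]
    design = np.column_stack([np.ones(n_timepoints), task_regressors])
    beta = np.linalg.lstsq(design, timeseries, rcond=None)[0]
    return timeseries - design @ beta


def functional_connectivity(timeseries):
    """Region-by-region Pearson correlation matrix."""
    return np.corrcoef(timeseries, rowvar=False)


# Synthetic example: two regions with independent noise but a shared task response
rng = np.random.default_rng(5)
block = np.tile(np.r_[np.ones(20), np.zeros(20)], 10)  # 400 time points
ts = np.column_stack([2 * block + rng.normal(size=400),
                      2 * block + rng.normal(size=400)])
fc_raw = functional_connectivity(ts)[0, 1]
fc_clean = functional_connectivity(regress_out_task(ts, block[:, None]))[0, 1]
```

      In this toy case the raw correlation is driven entirely by shared task-evoked coactivation and largely disappears after the regression, which is the inflation the added Methods sentence guards against.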

      (5) I am a little concerned about overfitting with the convolutional neural net. For example, the drop-off in prediction performance in the external sample is stark. How does the deep learning approach used here compare to something simpler, like a connectome-based predictive model or ridge regression?

      (6) It may be nice to try the other models in the validation dataset. This would also provide a sense of the overfitting that may be going on.

      We thank the reviewer for raising this point. The prediction performance indeed dropped for episodic memory when models trained on the DyNAMiC sample were applied to the COBRA sample, whereas performance for working memory remained nearly identical across datasets. Moreover, our prediction power is on par with previous studies reporting reliable prediction of intelligence using deep learning approaches (Vieira et al., 2021; Fan et al., 2020). While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study.

      We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. Finally, we have added the following paragraph to the Discussion:

      “Our study used a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30). Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back) to identify the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”

      (7) While predictive models increase the power over association studies, they still require large samples to prevent overfitting. Do the authors have a sense of the power their main and external validation sample sizes provide?

      We thank the reviewer for this important point. Our main sample and the external validation sample (COBRA) are both moderate in size for deep learning applications. To reduce the risk of overfitting, we employed several strategies, including external validation, early stopping, dropout, and regularization. As noted, performance for episodic memory decreased in the external sample, which we acknowledge, but key associations such as the link between BCG and resilience factors remained significant. Importantly, prediction of working memory was maintained across datasets, reducing the likelihood that the observed findings are driven by overfitting. We have added a statement in the Discussion to reflect on the limitations of sample size and the implications for generalizability.

      We added the following sentence to the discussion:

      “We acknowledge that our main and validation samples are moderate in size for deep learning, which constrains statistical power and generalizability. Although external validation, early stopping, dropout, and regularization help mitigate overfitting, larger samples will be needed in future work to fully establish the robustness of these predictive models.”

      (8) I am not sure that the Mann-Whitney is the correct test for comparing the distributions of prediction performances. The distributions are dependent on each other as they are each predicting the same outcomes. Using the typical degrees of freedom formula would overestimate the degrees of freedom.

      We appreciate the reviewer’s comment and agree that applying statistical tests directly to bootstrapped samples can lead to inflated or misleading p-values, as the degrees of freedom are determined by the number of bootstrap iterations rather than the actual number of independent observations.

      In our analysis, the Mann-Whitney U test was applied to 1000 bootstrapped correlation coefficients (r) for each model. While this number is relatively low and was chosen to limit overestimation of significance, we recognize that these bootstrapped samples are not independent, and thus the use of a Mann-Whitney U test can still be problematic. To address this concern, we have revised our statistical analysis. Rather than applying the Mann-Whitney U test to the bootstrapped r distributions, we now compute the difference in correlation coefficients (Δr = r_actual − r_rest) for each bootstrap iteration. We then calculate a 95% confidence interval for Δr. If this interval does not include zero, we consider the difference statistically significant. This approach avoids artificially inflating the sample size and adheres more closely to proper statistical inference.
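The revised comparison can be sketched as a paired percentile bootstrap. The sketch assumes NumPy and per-subject observed scores plus out-of-sample predictions from two models. For example, `pred_a` could be the n-back or movie-watching model and `pred_b` the rest model, so that Δr = r_actual − r_rest. Resampling the same subjects for both models in each iteration is what keeps the dependence between the two correlations intact. The names and the percentile-interval choice are illustrative assumptions, not the authors' code.

```python
import numpy as np


def paired_bootstrap_delta_r(y, pred_a, pred_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for delta r = r(y, pred_a) - r(y, pred_b).

    In every iteration both models are scored on the same resampled subjects,
    so the dependence between the two correlations is preserved.
    Returns (lower, upper, significant), where significant means the CI excludes 0.
    """
    y, pred_a, pred_b = (np.asarray(v, dtype=float) for v in (y, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    n = y.size
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        r_a = np.corrcoef(y[idx], pred_a[idx])[0, 1]
        r_b = np.corrcoef(y[idx], pred_b[idx])[0, 1]
        deltas[b] = r_a - r_b
    lower, upper = np.percentile(deltas, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper, not (lower <= 0.0 <= upper)
```

Only the interval is used for inference. No test statistic or degrees of freedom are computed, which matches the revised Methods text.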

      We have updated the Methods (the following text) and Results sections accordingly and clearly stated the limitations regarding the degrees of freedom for all tests.

      “For the bootstrap-based comparison of model performance (bootstrap resampling with 1000 iterations), no test statistic with an associated degree of freedom is reported. Instead, statistical inference is based on the bootstrap distribution of the difference in correlation coefficients (Δr) and its 95% confidence interval. As bootstrap confidence-interval–based inference does not rely on an analytic sampling distribution, degrees of freedom are not defined for this procedure.” This has now been explicitly stated in the Methods section to avoid ambiguity.

      In the Results section, we now report these comparisons with the corresponding CIs.

      (9) The brain cognition gap is interesting. It is very similar conceptually to the brain age gap. When associating the brain age gap with other phenotypes, typically age is regressed from the brain age gap and the other phenotype. In other words, age is typically associated with a brain age gap as individuals at the tail ages often show the largest gaps. Is the brain cognition gap correlated with episodic memory and do the group differences hold if episodic memory is controlled for?

      We thank the reviewer for the comment regarding the relationship between the brain cognition gap and episodic memory.

      Since this question was raised by all reviewers, we have conducted additional analyses. We found that BCG is independent of the cognitive measure and provides additional information, beyond cognition alone, about factors contributing to resilience. Please see our response to the first comment of Reviewer 1.

      (10) I have the same question for the dopamine results. Particularly, in the correlations that are divided by brain cognition gap sign. I could see these types of patterns arise due to a correlation with a third variable.

      For the dopamine results, we explored whether age or cognition alone might confound the dopamine–brain cognition gap relationships. However, neither was significantly correlated with the brain cognition gap groups. The associations remained significant after controlling for age, suggesting that the observed patterns are unlikely to be driven by these potential third-variable confounders. This is also in line with our observation of significant associations between DA and GAP in the age-homogeneous COBRA sample. That said, we found that entropy does mediate the link between DA and BCG, suggesting that individuals with lower DA exhibit greater regional variability and, in turn, a larger BCG.
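One common way to test such a mediation (dopamine → entropy → BCG, adjusting for age) is a product-of-coefficients estimate with a percentile bootstrap interval. The sketch below assumes NumPy and linear models for both paths. The response does not state which mediation procedure or software was used, so treat all names as hypothetical.

```python
import numpy as np


def _ols_coefs(outcome, *predictors):
    """OLS coefficients for outcome ~ intercept + predictors, in that order."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs


def indirect_effect(x, m, y, covariate):
    """Product-of-coefficients indirect effect a * b, both paths adjusted for a covariate."""
    a = _ols_coefs(m, x, covariate)[1]     # a path: mediator ~ predictor + covariate
    b = _ols_coefs(y, m, x, covariate)[1]  # b path: outcome ~ mediator + predictor + covariate
    return a * b


def bootstrap_mediation(x, m, y, covariate, n_boot=2000, seed=0):
    """Point estimate and 95% percentile bootstrap CI of the indirect effect."""
    x, m, y, covariate = (np.asarray(v, dtype=float) for v in (x, m, y, covariate))
    rng = np.random.default_rng(seed)
    n = x.size
    estimate = indirect_effect(x, m, y, covariate)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        boots[i] = indirect_effect(x[idx], m[idx], y[idx], covariate[idx])
    lower, upper = np.percentile(boots, [2.5, 97.5])
    return estimate, (lower, upper)
```

Here `x` would be striatal D1DR, `m` resting-state entropy, `y` the BCG, and `covariate` age. An interval that excludes zero indicates a nonzero indirect effect.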

      These results have now been incorporated into the manuscript. We also highlight that age has been controlled for in the reported correlation and mediation analyses.

      Recommendations for the authors:

      Reviewing Editor Comment:

      We particularly recommend that the authors: (a) compare the performance of their deep learning model with other baseline models, and (b) adjust for cognitive performance within the brain-cognition gap. These steps would strengthen the evidence base.

      We thank the editor for their comments. As for the first comment, our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach. We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological superiority beyond the scope of our analyses. Finally, we have added the following paragraph to the discussion:

      “Our study used a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30).

      Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back) to identify the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”

      As for the second comment, we followed the suggestion of Reviewer 1. In response to their request, we first examined the relationship between the brain-cognition gap (BCG) and the cognitive measure itself. Surprisingly, we did not find any significant relationship in either the DyNAMiC sample (r = 0.01, p = 0.939) or the COBRA sample (r = 0.01, p = 0.89) (see Author response image 1).

      We then conducted additional analyses, splitting the sample into high and low EM performers, and compared their levels of physical activity and Framingham cardiovascular disease (CVD) risk scores. We found no significant difference in physical activity (DyNAMiC: p = 0.56, 95% CI: −14.99 to 8.13; COBRA: p = 0.29, 95% CI: −3.54 to 1.05) or Framingham CVD risk score (DyNAMiC: p = 0.11, 95% CI: −1.08 to 10.72; COBRA: p = 0.41, 95% CI: −1.86 to 4.58) between high and low EM performers. Given the significant differences in physical activity and Framingham CVD risk score between the positive and negative BCG groups, our results support that BCG provides unique information, beyond the observed cognitive measure (episodic memory score), regarding factors that contribute to cognitive resilience. These results have been added to Section 2.4, and Figure 3 has been updated.

      Reviewer #1 (Recommendations for the authors):

      (1) The top and bottom triangles of the saliency maps, particularly in Figure 2, do not look symmetrical (this is most notable in the hotspot representing the between-network correlation of DMN and FPN). What is going on here? Was the image compressed or altered in some way, or is this a visual artifact of the interpolation method?

      We appreciate the reviewer’s insightful comment. Minor differences in the saliency maps between the upper and lower triangles of the FC matrix can arise due to several factors. For instance, Grad-CAM generates saliency maps at the resolution of the convolutional feature maps, which are then upsampled to match the input matrix dimensions. We initially used the default bilinear interpolation, which may have introduced slight asymmetries or blurring, resulting in interpolation artifacts. In response, we have reprocessed the saliency maps using spline interpolation in MATLAB. The updated saliency figures have been included in the revised version of the manuscript.
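For readers working in Python, a rough analogue of the MATLAB spline reprocessing might look like the following. It assumes SciPy and a square, coarse Grad-CAM map. `order=3` requests cubic-spline interpolation, and `order=1` would be bilinear. The optional averaging with the transpose is an extra step not described in the response. It is included only because the FC input itself is symmetric.

```python
import numpy as np
from scipy.ndimage import zoom


def upsample_saliency(cam, target_size=273, order=3, symmetrize=True):
    """Upsample a square Grad-CAM map to the FC-matrix resolution.

    order=3 selects cubic-spline interpolation (order=1 would be bilinear).
    symmetrize=True averages the result with its transpose (optional extra step).
    """
    cam = np.asarray(cam, dtype=float)
    factor = target_size / cam.shape[0]
    upsampled = zoom(cam, factor, order=order)
    if upsampled.shape != (target_size, target_size):
        raise ValueError(f"unexpected output shape {upsampled.shape}")
    if symmetrize:
        upsampled = 0.5 * (upsampled + upsampled.T)
    return upsampled
```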

      (2) Pages 11-12. Please make it explicit in the text that the brain gap-education association was not significant in the COBRA dataset.

      Thanks for pointing this out. We added the following sentence to the discussion.

      “Note that the association with education was significant only in the DyNAMiC sample and did not reach significance in the COBRA dataset.”

      (3) Please overlay individual data points onto the boxplots in Figure 3 so that we can appropriately evaluate the data distributions.

      Figure 3 has now been updated.

      (4) Section 2.6: Was entropy calculated on movie-watching data, resting data, or all fMRI data? Please specify.

      We thank the reviewer for pointing this out. We have updated the text (Section 2.6) to clarify that entropy was calculated from the resting-state data. We intended to examine the mediating role of regional variability in the relationship between dopamine and the BCG of the winning model for episodic memory. Because resting state and movie-watching were the winning conditions for EM prediction, but movie-watching was not available in COBRA, we focused on entropy during rest, which exists in both datasets.

      (5) Was entropy during the resting state correlated with entropy during the task state, across individuals?

      We agree this is an interesting question. However, investigating the correlation of entropy between rest and task states goes beyond the scope of the present study. Our aim here was to test whether regional variability mediates the effect of dopamine on the BCG. Specifically, we examined whether individuals with lower striatal D1DR show higher local variability, which in turn relates to less accurate prediction and a larger gap. We assessed both the relationship between D1DR and entropy and the association between entropy and the gap, and these results have now been added to the manuscript (see also our response to Reviewer 1’s public comment).

      Reviewer #2 (Recommendations for the authors):

      (1) The lack of baseline models to benchmark the predictive performance of their DenseNet models makes their results hard to interpret. This problem is quite common across ML literature. For instance, many DL-based algorithms were developed for tabular data without proper benchmarking against other ML algorithms. When they were properly tested, most weren't better than many tree-based ML algorithms (e.g., https://proceedings.neurips.cc/paper_files/paper/2022/file/0378c7692da36807bdec87ab043cdadc-Paper-Datasets_and_Benchmarks.pdf). I can see that a similar problem might happen here.

      For this particular manuscript, the authors made strong statements without doing a proper benchmark, e.g., from the discussion, "Indeed, the predictive power in the current study is stronger than for CPM-based predictions reported before." And "Unlike the BrainNet convolutional neural network, which focuses on staged transformations, our densely connected model promotes extensive feature reuse, possibly leading to more robust feature extraction." I hope to see the performance of the proposed algorithm against 1) other DL algorithms (e.g., fully-connected neural networks, BrainNetCNN, Graph CNN (GCNN), temporal CNN, GRU, and LSTM, see https://doi.org/10.1016/j.neuroimage.2019.116276 and https://doi.org/10.1002/hbm.26415), 2) ML algorithms (e.g., SVR with linear, RBF and polynomial kernels, Elastic Net, XGBoost, random forest, CPM), 3) data reduction algorithms (e.g., PCA regression, Partial Least Square). The results of this benchmark will substantiate the claims made by the authors.

      Our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach. We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological superiority beyond the scope of our analyses. Finally, we have added the following paragraph to the discussion:

      “Our study used a deep neural network architecture that features dense connections and incorporates an attentional mechanism. While our findings demonstrate that a deep learning framework can provide reasonable predictive accuracy, it is important to note that other machine learning approaches (e.g., tree-based models) may offer comparable predictive power, as suggested by prior benchmarking work (29, 30). Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back) to identify the states that best capture individual differences across domains. The relative performance of deep learning and other non-linear approaches depends on multiple factors, including sample size, model architecture, feature representation, and domain-specific characteristics of the prediction target. In this context, deep learning was employed as a flexible framework capable of modeling high-dimensional functional connectivity patterns across cognitive states, rather than as a claim of inherent methodological superiority. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach.”

      (2) From Figure 6b, it looks like the functional connectivity matrices were converted to different images, and each of the four images (in grey, blue, yellow, and red) was treated as a separate channel. What are these grey, blue, yellow, and red images?

      In our study, the inputs to the deep learning models were subject-specific FC matrices of size 273×273. To augment the data, we created different versions of each FC matrix by reordering specific brain networks within the matrix. To visualize that the inputs were augmented, we used different color codings (grey, blue, yellow, and red) in Figure 6b. These colors were intended solely to represent different augmented versions of the same subject’s FC matrix. They were not treated as separate channels in the model. To avoid any confusion or misinterpretation, we have revised this part of the figure and now use only grey coloring to represent the augmented FC matrices.
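The augmentation described above can be sketched as a block permutation. The sketch assumes NumPy and a per-region network label; the labels, network names, and function name below are hypothetical. Applying the same permutation to rows and columns keeps every connection value and the symmetry of the matrix. Only the spatial layout of the network blocks changes.

```python
import numpy as np


def reorder_networks(fc, network_labels, network_order):
    """Return an FC matrix with its network blocks rearranged, plus the permutation.

    fc: (n_regions, n_regions) symmetric connectivity matrix.
    network_labels: length-n_regions sequence giving each region's network.
    network_order: every network exactly once, in the desired block order.
    """
    labels = np.asarray(network_labels)
    perm = np.concatenate([np.flatnonzero(labels == net) for net in network_order])
    # Each region must appear exactly once in the permutation
    if not np.array_equal(np.sort(perm), np.arange(labels.size)):
        raise ValueError("network_order must list every network exactly once")
    # Same permutation on rows and columns preserves symmetry and edge values
    return fc[np.ix_(perm, perm)], perm
```

Calling this with several different `network_order` lists would yield several augmented copies of one subject's 273×273 matrix, all sharing the same target score.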

      (3) The differences in performance between within vs. outside studies might simply be due to the fact that the models trained from DyNAMiC captured the brain variation due to age, which is also related to cognitive abilities. I was wondering if age is controlled for, would performance be more similar across the studies? The authors should provide the performance of models that are controlled for age.

      We initially computed partial correlations between FC features and cognitive measures while controlling for age. That the models capture more than age-related variation is further supported by the fact that the model trained on the age-heterogeneous DyNAMiC sample provided a fairly reasonable prediction in the age-homogeneous COBRA dataset, particularly for working memory (see Figure 2d). Moreover, in our post hoc analyses, we additionally controlled for age when examining associations, for example, between GAP and dopamine measures.
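Controlling for age in this way can be sketched as a residual-based partial correlation. The sketch assumes NumPy, and the variable names are illustrative. Age is regressed out of both variables, and the residuals are then correlated.

```python
import numpy as np


def residualize(values, covariates):
    """Residuals of values after OLS on an intercept plus the covariates."""
    design = np.column_stack([np.ones(len(values)), covariates])
    beta, *_ = np.linalg.lstsq(design, values, rcond=None)
    return values - design @ beta


def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after regressing the covariates out of both."""
    rx = residualize(np.asarray(x, dtype=float), covariates)
    ry = residualize(np.asarray(y, dtype=float), covariates)
    return np.corrcoef(rx, ry)[0, 1]
```

For example, an FC edge and a memory score that both simply track age will be clearly correlated. Their partial correlation controlling for age will be close to zero.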

      (4) Related to point (3), from the discussion, "Validation outcomes thus affirm that the models, particularly those constructed from rest data, are robust to the particulars of the dataset." The performance dropped around half, so I am not sure if this conclusion is warranted.

      We thank the reviewer for raising this point. The prediction performance indeed dropped for episodic memory when models trained on the DyNAMiC sample were applied to the COBRA sample, whereas performance for working memory remained nearly identical across datasets. Although both EM and WM are sensitive to age, the divergence in cross-dataset performance suggests that factors beyond age alone may contribute to these differences. To address this, we have revised the discussion as follows:

      “Differences between the DyNAMiC and COBRA datasets make cross-dataset prediction a harder problem, as the age ranges of samples significantly vary, and prior studies highlight the importance of individual characteristics like age in predicting behavior from FC (33). In line with this, model performance decreased when predicting EM in the COBRA sample whereas prediction of WM remained largely unchanged. Thus, validation outcomes suggest that the models, particularly those predicting WM, show robustness across datasets, whereas the reduced EM performance highlights potential data-specific influences that limit generalizability.”

      (5) Please report the degree of freedom in all of the statistical analyses. Was the Mann-Whitney U test done on the bootstrapped r? If so, the degree of freedom was arbitrarily set by the number of bootstrapping, and hence the p-value can be higher or lower depending on the number of bootstrapping. This could lead to misleading conclusions.

      We appreciate the reviewer’s comment and agree that applying statistical tests directly to bootstrapped samples can lead to inflated or misleading p-values, as the degrees of freedom are determined by the number of bootstrap iterations rather than the actual number of independent observations.

      In our analysis, the Mann-Whitney U test was applied to 1000 bootstrapped correlation coefficients (r) for each model. While this number is relatively low and was chosen to limit overestimation of significance, we recognize that these bootstrapped samples are not independent, and thus the use of a Mann-Whitney U test can still be problematic. To address this concern, we have revised our statistical analysis. Rather than applying the Mann-Whitney U test to the bootstrapped r distributions, we now compute the difference in correlation coefficients (Δr = r_actual − r_rest) for each bootstrap iteration. We then calculate a 95% confidence interval for Δr. If this interval does not include zero, we consider the difference statistically significant. This approach avoids artificially inflating the sample size and adheres more closely to proper statistical inference.

      We have updated the Methods (the following text) and Results sections accordingly and clearly stated the limitations regarding the degrees of freedom for all tests.

      “For the bootstrap-based comparison of model performance (bootstrap resampling with 1000 iterations), no test statistic with an associated degree of freedom is reported. Instead, statistical inference is based on the bootstrap distribution of the difference in correlation coefficients (Δr) and its 95% confidence interval. As bootstrap confidence-interval–based inference does not rely on an analytic sampling distribution, degrees of freedom are not defined for this procedure.” This has now been explicitly stated in the Methods section to avoid ambiguity.

      In the Results section, we now report these comparisons with the corresponding CIs.

      (6) For predictive performance, the correlation was reported in the table, while R² is reported in the text. This is confusing. Also, could you clarify if the R² is calculated using the sum square definition, not Pearson r squared? If Pearson r squared was used, then R² of a negative Pearson r would be positive, which is misleading (see 10.1001/jamapsychiatry.2019.3671). Also, other performance indices apart from Pearson r and R² should be reported (e.g., MSE and MAE, again see 10.1001/jamapsychiatry.2019.3671). This will allow a better understanding of the models' performance.

      We thank the reviewer for this helpful comment. We acknowledge the inconsistency in reporting predictive performance metrics and have revised the manuscript for clarity. In the text, we had reported the r value, whereas in the table, we had reported R² using the sum-of-squares definition. Specifically, we now consistently report Pearson correlation (r), mean squared error (MSE), and mean absolute error (MAE) across both the text and Tables 1 and 2.

      Regarding R², we confirm that it was calculated using the sum-of-squares definition, i.e., $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, where $y_i$ is the observed score, $\hat{y}_i$ the predicted score, and $\bar{y}$ the mean observed score, rather than as the square of the Pearson correlation coefficient. This ensures that negative correlations do not result in misleading positive R² values, as pointed out by the reviewer and discussed in Poldrack et al. (2020). All performance metrics (r, R², MSE, and MAE) are now reported in Tables 1 and 2 to allow a more comprehensive and interpretable comparison of model performance.
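The distinction can be made concrete with a small sketch, assuming NumPy; the function name is hypothetical. Take observed scores [1, 2, 3, 4, 5] and reversed predictions [5, 4, 3, 2, 1]. Pearson r is −1, so r² would be 1, whereas the sum-of-squares R² is 1 − 40/10 = −3. The MSE is 8 and the MAE is 2.4.

```python
import numpy as np


def prediction_metrics(y_true, y_pred):
    """Pearson r, sum-of-squares R^2, MSE, and MAE for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r": np.corrcoef(y_true, y_pred)[0, 1],
        "R2": 1.0 - ss_res / ss_tot,  # can be negative, unlike r squared
        "MSE": np.mean(resid ** 2),
        "MAE": np.mean(np.abs(resid)),
    }


metrics = prediction_metrics([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```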

      We have included a description of the method in Section 4.9 (Statistical significance analysis).

      (7) Could you clarify how data are standardized across training, validation, and tests (including Z-standardization for the cognitive tests)? This is to prevent data leakage.

      Thank you for the comment. We standardized the cognitive test scores separately within the training and test samples.

      We have added the following paragraph to the method section:

      “A composite score of performances across the three tests was calculated and used as the measure of the cognitive domain in question (i.e., episodic memory, working memory). For each of the three tests, scores were summarized across the total number of trials. The three resulting sum scores were z-standardized and averaged to form one composite score for each domain. The standardization has been carried out independently for the training (DyNAMiC) and test (COBRA) samples.”
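The within-sample standardization can be sketched as follows. The sketch assumes NumPy and a subjects × tests array of sum scores for each sample. The function name and the use of the sample standard deviation (`ddof=1`) are assumptions. Because each call uses only its own sample's means and standard deviations, no information from the COBRA test sample enters the scaling of the DyNAMiC training targets.

```python
import numpy as np


def composite_score(sum_scores):
    """Average of z-standardized test scores, standardized within this sample only.

    sum_scores: (n_subjects, n_tests) array of per-test sum scores for one sample.
    """
    scores = np.asarray(sum_scores, dtype=float)
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)
    return z.mean(axis=1)  # one composite per subject
```

In use, the function would be called once on the training sample and once, separately, on the test sample, for example `composite_score(dynamic_em_scores)` and `composite_score(cobra_em_scores)` (hypothetical variable names).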

      (8) There is really no ground truth to confirm that Grad-CAM provides actual feature importance used by the models. Perhaps the authors should compare that with Haufe transformation, which is commonly used in the predictive model for cognition (e.g., https://doi.org/10.1016/j.neuroimage.2021.118648 and https://doi.org/10.1016/j.neuroimage.2023.120115).

      We appreciate the reviewer’s comment and the suggested references. The Haufe transformation is primarily applied in traditional machine learning models, particularly in cognitive neuroscience, to interpret linear predictive models by mapping classifier weights back to the input space. However, its direct applicability to deep learning models, especially convolutional neural networks, remains an open research area with no widely established methodologies. Furthermore, the Haufe transformation does not provide feature importance in the same manner as Grad-CAM. Grad-CAM highlights spatial regions within an image that contribute to a model’s decision, making it particularly useful for interpreting convolutional networks in vision tasks. In contrast, the Haufe method offers a weight transformation that is more suited for understanding linear models and may not be as intuitive for feature attribution in complex hierarchical representations such as those learned by deep neural networks.

      While we acknowledge that Grad-CAM, like other interpretability methods, does not provide absolute ground truth validation for feature importance, it remains one of the most widely used and validated techniques for deep learning interpretability, particularly in medical imaging applications. Given its integration with frameworks such as Keras and TensorFlow and its ability to provide spatial attributions aligned with domain knowledge, we believe it is a suitable choice for our study. Future work may explore additional interpretability techniques, including adaptations of the Haufe transformation if applicable to deep learning architectures.

      We have added more details on the Grad-CAM implementation to the Methods section.
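For reference, the core Grad-CAM combination step can be written in a few lines of NumPy. The sketch assumes that the activations of the chosen convolutional layer, and the gradients of the predicted score with respect to them, have already been extracted. In Keras/TensorFlow this is typically done with `tf.GradientTape`. This is a generic sketch of the published technique, not the authors' implementation.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Combine convolutional feature maps into a Grad-CAM map scaled to [0, 1].

    feature_maps, gradients: (height, width, channels) arrays holding the layer
    activations and the gradient of the model output with respect to them.
    """
    weights = gradients.mean(axis=(0, 1))  # global-average-pooled gradient per channel
    cam = np.tensordot(feature_maps, weights, axes=([2], [0]))  # weighted channel sum
    cam = np.maximum(cam, 0.0)  # ReLU keeps only positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The resulting map has the coarse resolution of the chosen layer. It therefore still has to be upsampled to the 273×273 input before it can be read as edge-level saliency.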

      (9) Related to Grad-CAM, "These edges, indicated by a salience intensity of {greater than or equal to}.5, exert a significant influence on the model (Figure 1f)." What does 'significant' in this context mean? And how did the authors come up with the .5 threshold? Is it based on permutation or bootstrapping tests?

      We appreciate the reviewer’s comment and the opportunity to clarify our approach. In this context, the term "significant" refers to the regions' relative contribution to the model’s decision, as shown by the Grad-CAM saliency map. However, to avoid implying statistical testing, we will revise the term to "highly contributing."

      Regarding the 0.5 threshold, this value was selected empirically based on the normalized Grad-CAM activation values, where saliency scores range between 0 and 1. A threshold of 0.5 was used as a heuristic to highlight regions with relatively strong activation. However, this was not determined through statistical methods such as permutation or bootstrapping tests. We recognize the importance of rigorous threshold selection and will clarify this in the text. Future work could incorporate statistical methods to define thresholds more objectively.

      We have included the following text in the Method section:

      “Grad-CAM saliency maps were interpreted qualitatively, with a heuristic threshold (≥ 0.5) applied to highlight regions with relatively higher contribution to the model’s predictions. These values do not reflect statistical significance and should therefore be interpreted descriptively.”
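The descriptive 0.5 threshold can be sketched as below, assuming NumPy and a square saliency map aligned with the FC matrix. The min-max normalization and the restriction to the upper triangle are assumptions about how the 0 to 1 range and the edge list were obtained.

```python
import numpy as np


def high_contribution_edges(saliency, threshold=0.5):
    """Upper-triangle edges whose min-max normalized saliency is >= threshold.

    Purely descriptive: no statistical test is involved.
    """
    s = np.asarray(saliency, dtype=float)
    span = s.max() - s.min()
    norm = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    i, j = np.triu_indices_from(norm, k=1)  # each undirected edge once
    keep = norm[i, j] >= threshold
    return list(zip(i[keep].tolist(), j[keep].tolist()))
```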

      (10) Still related to the saliency map, I believe the upper and lower triangles of the functional connectivity matrix are the same. If so, why are there some differences in saliency? While the difference is not prominent, this might affect the accuracy of Grad-CAM.

      Minor differences in the saliency maps between the upper and lower triangles of the FC matrix can arise due to several factors. For instance, Grad-CAM generates saliency maps at the resolution of the convolutional feature maps, which are then upsampled to match the input matrix dimensions. We initially used the default bilinear interpolation, which may have introduced slight asymmetries or blurring, resulting in interpolation artifacts. In response, we have reprocessed the saliency maps using spline interpolation in MATLAB. The updated saliency figures have been included in the revised version of the manuscript.

      (11) Why did the authors only report the cross-study for EM on rest, and for WM on n-back? This is a bit unexpected since COBRA has both rest and n-back. If there is no good justification, please report both.

      We focused on reporting cross-study results for EM using rest because rest was the winning condition for predicting EM in the DyNAMiC sample. Importantly, n-back did not significantly predict EM in DyNAMiC, and rest did not significantly predict WM. For this reason, we highlighted only the conditions that showed meaningful predictive power in the original analyses.

      (12) Are codes, trained models, and data available? To ensure transparency and reproducibility, I hope to see the code from preprocessing to modeling and statistical analyses.

      The analysis code is openly available on our GitHub page https://github.com/MorEsm/AI-based-Prediction-of-Cognitive-Function. Due to ethical considerations and GDPR restrictions in the European Union, we are not permitted to publicly share the raw data. However, we can provide detailed information about preprocessing steps and analysis pipelines to facilitate reproducibility.

      (13 & 14) The authors did not appropriately control for regression toward the mean and the influence of the working memory score itself when calculating the brain cognition gap. This is commonly done for brain age (see https://doi.org/10.7554/eLife.87297.4, https://doi.org/10.1002/hbm.25533, https://doi.org/10.1016/j.nicl.2020.102229, https://doi.org/10.3389/fnagi.2018.00317). Otherwise, the brain cognition gap still depends on the cognition/working memory score itself. Based on Tetereva et al., "If, for instance, Brain Age was based on prediction models with poor performance and made a prediction that everyone was 50 years old, individual differences in Brain Age Gap would then depend solely on chronological age (i.e., 50 minus chronological age)." Because of this, Tetereva and colleagues found that the 'uncorrected' brain age gap that predicted chronological age the worst became the best index to predict fluid cognitive abilities. This shows the pitfall of the 'uncorrected' brain age gap. You can apply the same logic to the brain cognition gap.
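The correction the reviewer refers to is usually implemented by regressing the prediction on the observed score across the sample and keeping the residual. A minimal sketch, assuming the gap is defined as predicted minus observed memory score (the sign convention is an assumption) and using ordinary least squares; `ols_fit` and `corrected_gap` are hypothetical names, and the toy numbers are illustrative only:

```python
def ols_fit(x, y):
    """Ordinary least-squares intercept and slope for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def corrected_gap(observed, predicted):
    """Gap with its linear dependence on the observed score removed.

    Regressing predicted on observed and taking residuals is equivalent to
    residualizing the raw gap (predicted - observed) on the observed score,
    so the result is uncorrelated with the observed score by construction.
    """
    alpha, beta = ols_fit(observed, predicted)
    return [p - (alpha + beta * o) for o, p in zip(observed, predicted)]


observed = [10, 20, 30, 40]
predicted = [17, 21, 27, 31]  # predictions shrunk toward the sample mean
raw = [p - o for o, p in zip(observed, predicted)]
print(raw)  # [7, 1, -3, -9]
print([round(g, 2) for g in corrected_gap(observed, predicted)])  # [0.2, -0.6, 0.6, -0.2]
```

In the toy data the raw gap falls steadily as the observed score rises, which is exactly the dependence the reviewer warns about; the corrected gap no longer carries it.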

      (14) Additionally, another way to show the unique contribution of brain cognition, over and above cognition per se, is to add both brain cognition and cognition together to predict physical activity, education, and cardiovascular risk.
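The reviewer's suggestion is a multiple regression with both BCG and the memory score as predictors. A closely related quantity, shown here because it needs only pairwise correlations, is the partial correlation between BCG and an outcome (e.g., physical activity) controlling for the memory score. This is a sketch with hypothetical function names and toy data, not an analysis reported in the manuscript:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy values: BCG, an outcome, and the memory score.
bcg = [1, 2, 3, 4, 5, 6]
activity = [2, 1, 4, 3, 6, 5]
memory = [3, 1, 2, 6, 4, 5]
print(round(partial_corr(bcg, activity, memory), 3))
```

A partial correlation that remains clearly non-zero would indicate that BCG carries information about the outcome beyond the memory score itself; its sign matches that of the BCG coefficient in the two-predictor regression.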

      We thank the Reviewer for raising this important point. In response to this request, and to a similar request from Reviewer 1, we first examined the relationship between the Brain–Cognition Gap (BCG) and the cognitive measure itself. Surprisingly, we did not find any significant relationship in either the DyNAMiC sample (r = 0.01, p = 0.939) or the COBRA sample (r = 0.01, p = 0.894) (see Author response image 1).

      We then conducted additional analyses, splitting the sample into high and low EM performers, and compared their levels of physical activity and Framingham cardiovascular risk scores. We found no significant difference in physical activity (DyNAMiC: p = 0.56, CI: -14.99 to 8.13; COBRA: p = 0.29, CI: -3.54 to 1.05) or Framingham CVD risk score (DyNAMiC: p = 0.11, CI: -1.08 to 10.72; COBRA: p = 0.41, CI: -1.86 to 4.58) between high and low EM performers. Given the significant difference in physical activity and Framingham CVD risk score between positive and negative BCG groups, our results support that BCG provides unique information, beyond the cognitive measure itself, regarding factors that contribute to cognitive resilience. These results have been added to Section 2.4, and Figure 3 has been updated.

      (15) Related to the brain age gap, the brain cognition gap is actually just another way to quantify how generalizable models are to another sample, similar to MAE or MSE. If the models built from DyNAMiC don't fit well with samples from COBRA, you will get a higher (i.e., wider) brain cognition gap, which means a poor fit. The authors should discuss this interpretation: could your biomarker's performance be driven by model fit?

      We appreciate this insightful comment. We agree that BCG can be interpreted not only as a marker of individual differences and resilience factors but also as a measure of model fit, analogous to error metrics, such as MAE or MSE. A higher gap may, in part, reflect poorer generalizability of models across samples. We have now revised the Discussion to explicitly acknowledge this alternative interpretation and to emphasize that BCG should be viewed both as a candidate biomarker and as a reflection of model performance.

      We added the following paragraph in the discussion:

      “An important caveat is that BCG can also be conceptualized as an error metric, similar to mean absolute error or mean squared error, reflecting the extent to which models trained in one sample generalize to another. From this perspective, a larger gap may not only indicate individual differences related to resilience factors and dopaminergic function, but also reduced model fit or generalizability across datasets. Thus, BCG likely reflects a combination of meaningful biological variability and methodological variance.”
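The error-metric reading follows directly from the definitions: averaging the absolute per-participant gaps gives the model's mean absolute error in that sample. A minimal sketch, assuming gap = predicted minus observed score (the sign convention and the function names are assumptions):

```python
def gaps(observed, predicted):
    """Per-participant gap, assumed here to be predicted minus observed."""
    return [p - o for o, p in zip(observed, predicted)]


def mean_absolute_error(observed, predicted):
    """MAE is the average magnitude of the per-participant gaps."""
    g = gaps(observed, predicted)
    return sum(abs(v) for v in g) / len(g)


observed = [10, 12, 14]
predicted = [11, 12, 12]
print(gaps(observed, predicted))                 # [1, 0, -2]
print(mean_absolute_error(observed, predicted))  # 1.0
```

The same individual gaps that are interpreted as a resilience marker are, once aggregated, the sample-level fit statistic, which is why the two readings cannot be fully separated.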

      (16) It is unclear why the authors binarized the brain cognition gap when predicting physical activity, education, and cardiovascular risk, and not doing so with the striatal D1DR. It is rarely a good idea to binarize a continuous variable (see 10.1136/bmj.332.7549.1080). In this case, people who had a bigger negative brain cognition gap were treated equally to people who had a smaller negative brain cognition gap. I also do not think it is necessary to separately analyze positive and negative gaps. Perhaps the authors should correlate the corrected brain cognition gap with physical activity, education, and cardiovascular risk and provide scatter plots and effect sizes.

      Following the reviewer's suggestion, we directly correlated BCG with physical activity and cardiovascular risk. The results confirmed our initial analysis: individuals with a negative gap exhibited lower physical activity and higher Framingham CVD risk in both the COBRA and DyNAMiC datasets. We have reported these results on page 10.

      Author response image 5.

      (17) Given that the motivation is to move away from brain age, the authors should benchmark the corrected brain cognition gap against the corrected brain age gap, as well as against the performance when directly predicting physical activity, education, and cardiovascular risk from the functional connectivity metrics.

      Author response image 6.

      We agree that benchmarking BCG against BAG in predicting lifestyle and vascular risk factors would be valuable. We have calculated adjusted BAG and related it to lifestyle and vascular risk factors. Interestingly, we did not find any significant association, suggesting that BCG might be more sensitive to cognitive resilience. However, this investigation was beyond the scope of the present study. Our aim was not to compare BCG with BAG, but rather to examine whether BCG provides information beyond cognition itself. We also note that introducing BAG would open a separate line of investigation, namely, which cognitive state (rest, movie-watching, n-back) best estimates biological age. While this is an interesting question in its own right, addressing it here would considerably broaden the scope and complexity of an already dense manuscript. To prevent misunderstanding, we have clarified this point in the Discussion and added a caveat noting that future work should explicitly benchmark these approaches. That said, if the Reviewer and/or the Editor would like these additional findings to be included in the manuscript, we are open to doing so in a revision.

      We have added the following sentence to the Discussion.

      “While our focus was to investigate whether the brain–cognition gap provides information about factors contributing to cognitive resilience, we acknowledge that benchmarking BCG against the brain-age gap in predicting lifestyle and vascular risk factors would be valuable. However, addressing this question lies beyond the scope of the present study, and future work should systematically compare these approaches.”

      (18) Why was only the working memory score used to create brain cognition, and not episodic memory as well? Including both could provide a more comprehensive measure.

      We initially attempted to predict both episodic memory (EM) and working memory (WM). However, EM prediction was only reliable within and across samples for the resting state, whereas WM prediction generalized most strongly from the movie-watching condition. Because COBRA does not include a movie-watching paradigm, we could not evaluate WM prediction across datasets. For this reason, we focused on EM when examining the brain–cognition gap.

      (19) The PET mediation analysis seemed to come out of the blue. Is there existing literature showing the relationship between striatal D1DR and cognition? If so, did the authors find a similar relationship in the current data? I also suggest rewriting this section to strengthen the justification for the PET mediation analysis.

      We have previously conducted studies in which DA was found to be associated with memory (Johansson et al., 2023; Nyberg et al., 2016).

      The third aim of our study was to examine whether DA integrity is implicated in brain–cognition gaps (BCG), which we propose as a marker of cognitive resilience. In line with this aim, we found that lower DA receptor availability was associated with larger BCGs (Figure 4). We then asked whether this relationship is mediated by functional signal variability, such that lower DA is linked to reduced signal-to-noise ratio (i.e., greater entropy in functional connectivity), which in turn contributes to less reliable prediction of cognition and, consequently, larger BCGs. Our mediation analysis supports this pathway (see also our reply to Reviewer 1, Comment 6).

      Thus, our mediation was not designed to test whether DA predicts episodic memory performance directly, nor whether BCG mediates such a relationship. Instead, we specifically investigated whether the effect of DA on BCG operates through functional variability. We agree that future work could extend our approach by directly examining whether BCG mediates the link between DA and cognitive outcomes. However, in the present study, our primary focus was on testing the mechanistic pathway of DA → entropy → BCG.
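The DA → entropy → BCG pathway corresponds to a standard single-mediator model. The sketch below estimates the paths with ordinary least squares (the product-of-coefficients approach); it is an illustration under that assumption, not the authors' code, and it omits the inferential step (e.g., bootstrapped confidence intervals for the indirect effect). Variable names, function names, and toy values are hypothetical.

```python
def _mean(v):
    return sum(v) / len(v)


def _cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = _mean(a), _mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def mediation_paths(x, m, y):
    """OLS paths for a single-mediator model X -> M -> Y.

    a:        slope of M on X
    b:        slope of Y on M, controlling for X
    c_prime:  slope of Y on X, controlling for M (direct effect)
    c:        slope of Y on X alone (total effect)
    indirect: a * b
    """
    vx, vm = _cov(x, x), _cov(m, m)
    cxm, cxy, cmy = _cov(x, m), _cov(x, y), _cov(m, y)
    a = cxm / vx
    c = cxy / vx
    det = vx * vm - cxm ** 2  # zero only if X and M are perfectly collinear
    b = (vx * cmy - cxm * cxy) / det
    c_prime = (vm * cxy - cxm * cmy) / det
    return {"a": a, "b": b, "c": c, "c_prime": c_prime, "indirect": a * b}


# Hypothetical toy values: D1DR availability, FC entropy, and BCG.
da = [1, 2, 3, 4, 5]
entropy = [2, 1, 4, 3, 6]
bcg = [1, 3, 2, 5, 4]
paths = mediation_paths(da, entropy, bcg)
print(round(paths["a"], 3))  # 1.0
```

For OLS estimates the total effect decomposes exactly as c = c' + a·b, which is a useful sanity check on any mediation output.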

      Minor recommendations:

      (1) Task-based connections are not truly task-based, as they are around 70-80% related to the resting state, capturing non-task-specific functional connectivity. Task-based connections should refer to techniques that derive task-related connectivity, such as psychophysiological interaction and beta-series correlation. Perhaps use terms like "functional connectivity during tasks."

      Thank you. This has been corrected throughout the manuscript.

      (2) Are there really two studies? The same MRI was used with the same configurations, and participants were from the same city. The only difference is the age range. It may be more appropriate to refer to this as "across age groups" rather than "cross-datasets."

      Thank you for this comment. While the two samples share some similarities, there are also several marked differences beyond age range. For example, movie-watching was administered in DyNAMiC but not in COBRA. The resting-state fMRI sequence was 12 minutes in DyNAMiC but only 6 minutes in COBRA. Moreover, DyNAMiC included dopamine D1-receptor PET, whereas COBRA assessed dopamine D2-receptor availability. Even the questionnaires used to measure physical activity differed between the two studies. Given these methodological and measurement differences, we believe that referring to them as “cross-datasets” rather than “across age groups” more accurately captures the distinction.

      (3) What kind of movie is "Cockpit"? Can you explain? Different movies may elicit different patterns of connectivity.

      We apologize for not providing information about the movie, which has been presented in our recent work (Johansson et al., 2023).

      The participants’ reactions to the content of the movie were not monitored, but the clips were selected to be as neutral in their content as possible. The content of the movie: Following his termination as a pilot and the end of his marriage, Valle embarks on a quest to secure new employment. Faced with desperation in the job market, he resorts to disguising himself as a woman with the intention of obtaining a position at a company specially seeking a female pilot.

      This information is added to the method section.

      “During the fMRI session, participants viewed a 12-minute segment from the Swedish comedy film Cockpit (2012). We did not monitor participants’ responses to the movie, and the chosen clips were selected to be relatively neutral in emotional content. The storyline follows Valle, a recently fired pilot whose marriage has ended, as he struggles to find new employment. In a desperate attempt to secure a job at an airline specifically recruiting a female pilot, he presents himself as a woman.”

      (4) There is a typo in the equation numbering (i.e., two equations are designated as #1).

      We have now corrected the typo.

      (5) From the discussion: "Importantly, this prediction generalizes across conditions." This is not surprising given the similarity between conditions, with around 70-80% variance.

      We agree with the reviewer that the high similarity of FC across states likely increases the chance of cross-condition generalizability. However, this generalization is not guaranteed for all models. For example, the model trained on FC during movie-watching successfully predicted episodic memory during rest, but it did not generalize to episodic memory during the n-back condition, although movie-watching and n-back FC patterns are themselves highly correlated. Thus, the observed generalization is meaningful in demonstrating that not all models transfer equally well across states.

      That said, we have added the following sentence to the Discussion:

      “Importantly, this prediction generalizes across conditions and datasets, suggesting that features derived from resting state FC serve as a relatively stable marker of individual differences in EM, though with reduced strength in COBRA. While such generalization is partly facilitated by the similarity of functional connectivity across states, it is not a trivial outcome. For instance, the model trained on movie-watching data generalized to EM prediction during rest but failed to do so for the n-back condition, even though movie-watching and n-back connectivity patterns are themselves highly correlated. This indicates that successful generalization depends not only on shared variance across states but also on the cognitive processes most relevant to the target behavior.”

      (6) It might be helpful to include some figures for the cognitive tasks used. The description is a bit hard to follow without visual aids.

      Thank you for the comment. A figure describing these tests was included in the initial paper on DyNAMiC (Nordin et al., 2022), and we have now added it to the manuscript as a Supplementary Figure (Fig S3).

      Fig S3. Overview of the cognitive tests included in the DyNAMiC study. Adapted from Nordin et al. (2022) with permission.

      (7) It may not be appropriate to use the term "cross-validation" here, as one dataset was used for testing and the other for training, but not vice versa (so no "cross" per se).

      We thank the reviewer for pointing this out. We agree that the term “cross-validation” is not precise in this context, since we trained the model in one dataset and tested it in another without performing the reverse. We have revised the manuscript to use the term “external validation” instead of “cross-validation” to more accurately describe our cross-dataset approach.
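The distinction matters because in external validation the training and test sets are fixed (for example, train on one cohort, test on the other) rather than rotated through folds. A minimal sketch of the procedure, using a one-feature linear model as a hypothetical stand-in for the deep network; `fit_line` and `external_validation` are illustrative names and the data are toy values:

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope, estimated on the training set only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
    return my - beta * mx, beta


def pearson(x, y):
    """Pearson correlation between predictions and observed scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def external_validation(train_x, train_y, test_x, test_y):
    """Train once on one dataset and evaluate once on another (no folds, no swap)."""
    alpha, beta = fit_line(train_x, train_y)
    predictions = [alpha + beta * v for v in test_x]
    return predictions, pearson(predictions, test_y)


preds, r = external_validation([1, 2, 3, 4], [2, 4, 6, 8], [5, 6, 7], [10, 12, 15])
print(preds)  # [10.0, 12.0, 14.0]
```

In k-fold cross-validation every participant would eventually serve in both roles; here the test cohort never influences the fitted parameters.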

      (8) I don't have access to the supplementary materials or code/data, so all of the comments here are based on the main text.

      We have added the supplementary materials and inserted the GitHub link to the code.

      Reviewer #3 (Recommendations for the authors):

      I suggest benchmarking against other simpler algorithms and controlling for memory in the brain cognition gap analyses.

      The authors might also want to simplify some aspects of the paper. There is a lot going on, which leaves less space to go into enough details for some analyses to warrant claims in the discussion. For example, the authors only compare the deep net to CPM and kernel ridge based on the literature. Direct comparisons would be needed.

      Thanks for the comment. We have made an attempt to address the concerns outlined in the public recommendation. Our study explicitly compares predictive power across different cognitive states (rest, movie watching, n-back), with the aim of identifying the states that best capture individual differences across domains. Thus, our goal was not to propose a universally superior prediction model, but rather to test how brain state influences predictive utility for WM and EM using a deep learning approach. We have revised the manuscript text to make this focus clearer and to avoid any misinterpretation of our aims. Specifically, we removed statements in the Discussion that could be read as suggesting that our deep learning approach outperforms prior machine learning methods. While we compared our model with the connectome predictive modeling (CPM) approach and observed better performance with our deep learning framework, we did not conduct a comprehensive benchmark across all available machine learning methods, nor was this the aim of the present study. Accordingly, we have adjusted the text to avoid implying methodological superiority beyond the scope of our analyses. Furthermore, we have controlled for memory as suggested by the reviewer and outlined in response to reviewer 1.

    1. Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?

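      Most people assume that AI-generated code is high quality as long as it passes the tests, but the authors argue that this view is seriously flawed, because maintainability is what really matters. FrontierCode's innovation is that it evaluates whether code is actually mergeable rather than merely whether it passes unit tests, which challenges the industry's prevailing standard for assessing code quality.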

    2. Models write sloppy code that works but isn't maintainable. Our eval is first to measure: would you actually merge this code?

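      Most people believe that evaluation of AI-written code should focus on functional correctness, but the authors argue that we should evaluate whether the code is actually mergeable, which challenges the consensus behind traditional benchmarks. FrontierCode introduces "mergeability" as a new criterion, focusing on code quality rather than merely passing tests, a counterintuitive shift.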

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The inferred relationships between neural clusters and specific drift‑diffusion parameters (e.g., bound height, scaling factor, non‑decision time) are intriguing but inherently correlational. The authors should clarify that these associations do not necessarily establish distinct computational mechanisms.

      We agree and have revised the text to avoid any mention of a causal relationship.

      (2) While the k‑means approach is well described, it remains somewhat heuristic. Including additional cross‑validation (e.g., cluster reproducibility across monkeys or sessions) would strengthen confidence in the four‑cluster interpretation.

      We took several steps to increase our confidence in the clustering results. First, we made improvements in how we used the k-means method, primarily by using activity vectors with finer time resolution and filtering out “outlier” neurons (details in Methods) that were dissimilar to other neurons to reduce spurious clustering results. Second, we performed a new set of clustering procedures based on the linkage method, in addition to the k-means method that we originally used. The two clustering methods generated very similar neuron groupings, with a Rand index of 0.93. We now present k-means results in the main figures and linkage results as supplements (e.g., compare Fig 5 and Fig 5-S2). Third, following the reviewer’s suggestion, we performed clustering based on the two monkeys’ data both combined and separately (new Fig 5-S3). Clustering of data from both monkeys combined, compared to each monkey considered separately, had Rand index values of 0.94 and 1 for monkeys C and F, respectively (i.e., neurons from one monkey tended to be assigned to the same cluster regardless of whether the clustering was based on data from that monkey alone or both monkeys together), indicating comparable cluster boundaries for the two monkeys. Lastly, we performed clustering based on pseudo-vectors derived from sampling a subset of trials for each neuron and found that the clustering results were stable and robust with as few as 40% of the trials (new Fig 5-S4).
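The agreement scores reported above can be illustrated with the Rand index, which counts the fraction of neuron pairs that two clusterings treat consistently (grouped together in both, or separated in both). The response does not state whether an adjusted variant was used, so this sketch assumes the plain, unadjusted index; `rand_index` is a hypothetical helper name.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two items in the same cluster,
    or both put them in different clusters. Label names are irrelevant.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))            # 1.0
print(round(rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 3))  # 0.833
```

Because the index ignores label names, k-means and linkage outputs can be compared directly even though the two methods number their clusters differently.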

      Because most neurons were recorded in separate sessions, we cannot perform session-based cross validation.

      (3) The functional dissociations across clusters are clearly described, but how these subgroups interact within the STN or through downstream basal‑ganglia circuits remains speculative.

      We agree and have made sure any speculative claims we make are clearly described as such.

      (4) A natural next step would be to construct a generative multi‑cluster model of STN activity, in which each cluster is treated as a computational node (e.g., evidence integrator, bound controller, urgency or evaluative signal).

      (5) Such a low‑dimensional, coupled model could reproduce the observed diversity of firing patterns and predict how interactions among clusters shape decision variables and behavior.

      (6) Population‑level modeling of this kind would move the interpretation beyond correlational mapping and serve as an intermediate framework between single‑unit analysis and in‑vivo perturbation.

      We agree that such a model would be extremely useful. However, given that designing, implementing, and testing a model like that would require a good deal of speculation about functional and anatomical interactions that we did not measure, it is also well outside the scope of the current study.

      That said, we appreciate the suggestions, which spurred us to go further in terms of providing a summary of our findings (new Figure 9) with a bit of informed speculation about how the different functionally defined subgroups of STN neurons that we characterized might relate to not only different computations but also different pathways through the basal ganglia (i.e., the hyperdirect versus indirect pathway, both of which include the STN). We hope that this summary, along with our more detailed findings, will inform new modeling studies by us and others.

      (7) Causal inference gap - Without perturbation data, it is difficult to determine whether the identified neural modulations are necessary or sufficient for the observed behavioral effects. A brief discussion of this limitation - and how future causal manipulations could test these cluster functions - would be valuable.

      As suggested, we have added the following to the Discussion (line 365): “The exact contributions of these subpopulations are challenging to elucidate, as their intermingled localization makes common perturbation techniques, such as electrical microstimulation or optogenetic manipulations, unsuitable. It would be interesting to examine whether these subpopulations differ in molecular or connectivity properties (e.g., as we speculated above) that could be leveraged to target each subpopulation precisely.”

      Reviewer #1 (Recommendations for the authors):

      (1) Develop or outline a generative multi‑cluster model:

      Consider constructing, even at a conceptual level, a generative network model in which the identified STN clusters serve as interacting computational nodes (e.g., evidence integration, bound modulation, urgency, or evaluative nodes).

      Such a framework could reproduce the simultaneous presence of ramping, transient, and context-sensitive activity patterns observed across clusters.

      Even a simulated or schematic implementation - showing how parameter coupling among these clusters gives rise to the reported firing diversity and behavioral effects - would help clarify the mechanistic implications of your findings.

      As noted above, we believe that a full modeling study is well outside the scope of the present work. However, we have provided a conceptual framework, shown in Figure 9, summarizing our findings and providing some informed speculation about how different subgroups of STN neurons could provide different functions along distinct anatomical pathways.

      (2) Strengthen the link between cluster activity and computation:

      Use cross‑validated or hierarchical regression models to verify the robustness of correlations between cluster‑specific firing measures and fitted drift‑diffusion parameters. This would make the mapping between neural activity and model components more statistically grounded.

      We appreciate the suggestion and thought hard about how we might implement it but ultimately decided our approach is most appropriate, given the strengths and limitations of our dataset. The fundamental issue is that it takes many trials to obtain reliable estimates of DDM parameters. Our approach of creating twelve “pseudo-sessions” for each neuron (half of those for trials with high firing rates, half for trials with low firing rates) balances our ability to obtain those estimates while testing for relationships with firing rate. Any further subdivision of the data for cross validation yields unreliable parameter estimates (i.e., with big error bars). We also chose not to use a hierarchical model and instead took a more unbiased approach by considering how all of the DDM parameters relate to firing rate.
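The pseudo-session construction can be sketched as a median split of each neuron's trials by firing rate. The response does not specify how the six pseudo-sessions per half are formed; this sketch assumes, hypothetically, that trials are first grouped by a condition label with six levels and then median-split within each group. Each resulting trial set would then be passed to a DDM fit, which is not shown. The function name and trial format are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def pseudo_sessions(trials):
    """Split one neuron's trials into high/low firing-rate pseudo-sessions.

    Each trial is a tuple whose first two fields are a hypothetical
    condition label and the firing rate; any further fields (e.g., choice
    and RT for the DDM fit) are carried along unchanged. Within each
    condition, trials above the median rate go to "high" and the rest
    (including ties at the median) go to "low".
    """
    by_condition = defaultdict(list)
    for trial in trials:
        by_condition[trial[0]].append(trial)
    sessions = {}
    for condition, group in by_condition.items():
        cut = median(trial[1] for trial in group)
        sessions[(condition, "high")] = [t for t in group if t[1] > cut]
        sessions[(condition, "low")] = [t for t in group if t[1] <= cut]
    return sessions


# Six hypothetical conditions with four trials each -> 12 pseudo-sessions.
trials = [(c, rate) for c in range(6) for rate in (5.0, 8.0, 12.0, 20.0)]
sessions = pseudo_sessions(trials)
print(len(sessions))  # 12
```

Splitting further (e.g., into cross-validation folds) would shrink each trial set, which is the reliability concern the authors raise about DDM parameter estimates.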

      Despite the simplicity of our approach, we believe that these results are statistically grounded. It is possible that more complex regression models may reveal additional (e.g., non-linear) relationships, but those results would also be less intuitive to interpret. We therefore decided to retain our analysis choice.

      (3) Assess cluster reproducibility:

      Report or include in the supplement the degree of correspondence of cluster identities across monkeys or across independent subsets of trials. Cluster stability metrics (e.g., bootstrap or split‑half analysis) would reassure readers that cluster structure is not dataset‑specific.

      Please see our response above to the main comment #2 regarding the robustness and stability of clustering results.

      (4) Explore population interactions directly:

      You could analyze pairwise or population‑level covariations (e.g., principal components or canonical correlation analysis) to test whether inter‑cluster interactions correspond to model‑predicted dynamics such as competition or normalization.

      Because most of the neurons were recorded in separate sessions and not simultaneously, the suggested population analyses are not feasible.

      Discuss briefly how the proposed generative or dynamical multi-cluster model could be empirically tested, e.g., using selective perturbation (microstimulation, optogenetic, or pharmacological) in future studies, to evaluate interactions inferred from the current dataset. If feasible, mention how this framework might generalize to other decision contexts beyond oculomotor tasks, such as effort-reward tradeoffs or inhibitory control, reinforcing the broad relevance of STN computations.

      As suggested, we have added the following to the Discussion (line 366): “The exact contributions of these subpopulations are challenging to elucidate, as their intermingled localization makes common perturbation techniques, such as electrical microstimulation or optogenetic manipulations, unsuitable. It would be interesting to examine whether these subpopulations differ in molecular or connectivity properties (e.g., as we speculated above) that could be leveraged to target each subpopulation precisely.”

      Reviewer #2 (Public review):

      One criticism I would make is that the authors sometimes seem to assume that readers are familiar with their previous work. Indeed, the motivation and choices behind some analyses are not clearly explained. It might be interesting to provide a little more context and insight into these methodological choices. The same is true for the description of certain results, such as the behavioral results, which I find insufficiently detailed, especially since the two animals do not perform exactly the same way in the task.

      We apologize for the lack of detail regarding the behavioral results and analysis choices. To address this issue, we substantially revised the text, particularly in Results and Methods.

      The differences in behavior for the two monkeys were the subject of an entire published study (Fan Y, Gold JI, Ding L, 2018, Ongoing, rational calibration of reward-driven perceptual biases. eLife 7: e36018). That study showed that these differences most likely arose from the monkeys’ individual sensitivity to the motion stimulus, combined with a heuristic-based strategy for obtaining satisficing rewards that both monkeys appear to use. We revised the text to acknowledge the individual differences and refer readers to our previous study (line 78): “Both monkeys showed consistent biases toward the large-reward choice (Figure 1B, C). The individual differences in their choice and RT performance reflect individual differences in sensitivity to the motion stimulus and a common heuristic-based satisficing strategy, as we demonstrated in a previous study (Fan et al., 2018).”

      Another criticism is the difficulty in following and absorbing all the presented results, given their heterogeneity. This heterogeneity stems from analytical choices that include defining multiple time windows over which activities are studied, multiple task-related or monkey behavioral factors that can influence them, multiple parameters underlying the decision-making phenomena to be captured, and all this without any a priori hypotheses. The overall impression is of an exploratory description that is sometimes difficult to digest, from which it is hard to extract precise information beyond the very general message that multiple subpopulations of neurons exist and therefore that the STN is probably involved in multiple roles during decision-making.

      In response to the three reviewers’ comments on data inclusion and the clustering analysis we presented, we have substantially improved the objectivity and robustness of our approaches, by: 1) applying a data-driven criterion for identifying neurons with robust task-relevant modulation (Figure 4C), 2) removing “outlier” neurons that appear not to share activity profiles with any other neurons in our sample (note that these outlier neurons would be at the outskirts in the cluster space instead of between clusters), 3) increasing the temporal resolution for generating firing rate vectors, and 4) comparing clustering results based on two methods (k-means and linkage). These improvements both sharpened the cluster boundaries and allowed us to observe more robust and distinctive subpopulation-specific relationships between neural activity and computational components in the DDM framework (new Figures 5–7 and their supplementary figures). We believe these updated results clearly demonstrate that: 1) there are different STN subpopulations, and 2) each of the subpopulations encodes a distinct set of functions.

      It would also have been interesting to have information regarding the location of the different identified subpopulations of neurons in the STN and their level of segregation within this nucleus. Indeed, since the STN is one of the preferred targets of electrical stimulation aimed at improving the condition of patients suffering from various neurological disorders, it would be interesting to know whether a particular stimulation location could preferentially affect a specific subpopulation of neurons, with the associated specific behavioral consequences.

      We have added a new Figure 8 to show the localization of neurons with and without task modulation and of neurons from different subpopulations. Consistent with our previous demonstration of an intermingled distribution of STN subpopulations, we did not observe any activity pattern-based segregation.

      To relate the activity patterns to previously reported stimulation effects, we added the following to the Discussion (line 307): “This functional diversity, along with a lack of clear anatomical organization, is consistent with the multiple effects of STN stimulation in patient populations on decision-making and our previous results in monkeys, including reductions in response times, a weaker dependence on evidence, and changes in the maximal value and trajectories of the decision bound (Frank et al., 2007; Cavanagh et al., 2011; Coulthard et al., 2012; Green et al., 2013; Zavala et al., 2014; Herz et al., 2016; Pote et al., 2016; Branam et al., 2024).”

      Therefore, this paper is interesting because it complements other work from the same team and other studies that demonstrate the likely important role of the STN in decision-making. This will be of interest to the decision-making neuroscience community, but it may leave a sense of incompleteness due to the difficulty in connecting the conclusions of these different studies. For example, in the discussion section, the authors attempt to relate the different neuronal populations identified in their study and describe some relatively consistent results, but others less so.

      We hope that our revised Results and Discussion clarify the conclusions that can be drawn from this and other related studies.

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction, l. 47-48: It would be interesting to provide more details on these three populations in order to better understand why we need additional experiments to more comprehensively define their roles.

      We now give more details in the Introduction about the remaining questions we aimed to address in this study (line 50): “However, the specific computational roles that these different subpopulations play in decision-making and other cognitive functions remain poorly understood. For example, two of the subpopulations had overall activity patterns that were consistent with two different models in which the STN modulated the decision bound (Ratcliff and Frank, 2012; Wei et al., 2015), but the exact nature of this modulation is not known. The other subpopulation’s general activity patterns were consistent with a model of STN mediating evidence accumulation (Bogacz and Gurney, 2007), but it is unclear if and how this activity contributes to how evidence is weighed, biased, or accumulated.”

      “Our previous attempt to distinguish these alternatives using electrical microstimulation was unsuccessful because that manipulation likely affected highly intermingled subpopulations with different functions.”

      (2) Results, l. 71-73: A slightly more detailed description of the behavioral results would be appreciated, especially since the two monkeys do not behave exactly the same way in the task, particularly in terms of reaction times (Figure 1B top-right versus bottom-right).

      We revised the text to acknowledge the individual differences and refer readers to our previous study (line 78): “Both monkeys showed consistent biases toward the large-reward choice (Figure 1B, C). The individual differences in their choice and RT performance reflect individual differences in sensitivity to motion stimulus and a common heuristic-based satisficing strategy, as we demonstrated in a previous study (Fan et al., 2018).”

      (3) Figure 2G-I: Were the multiple linear regressions performed only in the asymmetric reward condition?

      Yes. We added in Methods (line 487): “All analyses were performed on activity from the asymmetric-reward task.”

      (4) Very often in the text, the authors use terms that refer to concepts or methods that are difficult to grasp on the first reading, especially if we are not familiar with the team's previous publications. This is the case, for example, with "joint modulation," "reward context," "reward expectation," "k-means clustering," "tSNE," "Silhouette score for neurons," "Rand index," etc. All the explanations are minimal, and it would be helpful to clearly define these terms and provide some justification and insight to support the use of the analyses and the resulting variables, all of which would facilitate the reading of the manuscript.

      We now define these terms explicitly in the text (emphasis added here for clarity):

      (Results, line 129): “Using a previous definition of “joint modulation” (Doi et al., 2020), including modulation separately by motion coherence and reward context or reward size and modulation by the interaction of motion coherence and reward size, we found that ~40% of the neurons showed joint modulation during motion viewing.”

      (Results, line 71): “… for which we separately manipulated the noisy evidence (motion direction and strength) and reward context (a larger juice reward for a correct choice associated with one of the two directions).”

      (Results, line 250): “Choice accuracy describes the probability that a choice is correct given the evidence. Reward expectation describes the expected reward given a choice.”

      (Methods, line 550): “To quantify the consistency between two runs of clustering, we computed the Rand index as the number of neuron pairs with consistent grouping (i.e., they were placed in the same cluster for both runs or they were placed in different clusters for both runs), normalized by the total number of possible neuron pairs. A value of 1 indicates that the two clustering runs produce identical results, and a value of 0 indicates that the two runs do not agree on any pairs of neurons.”
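The pair-counting definition of the Rand index above can be made concrete with a short sketch. It assumes each clustering run is given as an equal-length sequence of cluster labels, one per neuron; the function name `rand_index` is hypothetical and this is not the authors' analysis code. scikit-learn's `sklearn.metrics.rand_score` computes the same unadjusted quantity.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of neuron pairs grouped consistently by two clustering runs.

    A pair counts as consistent if both runs put the two neurons in the same
    cluster, or both runs put them in different clusters. Label names do not
    need to match between runs (cluster 0 in one run may be cluster 2 in another).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same set of neurons")
    if len(labels_a) < 2:
        raise ValueError("need at least two neurons to form a pair")

    consistent = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        consistent += same_in_a == same_in_b  # True counts as 1
        total += 1
    return consistent / total


# Identical partitions with permuted label names agree on every pair.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

Because only pairwise co-membership is compared, the index does not depend on how clusters are numbered, which is what makes it usable for comparing runs on different data subsets over the neurons they share.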

      To quantify the separation of clusters, we computed silhouette scores as the difference between the mean nearest-cluster distance and the mean intra-cluster distance, normalized by the maximum of the two values. A positive score indicates that the member is closer to its same-cluster neighbors than to its different-cluster neighbors. Clustering runs with higher mean silhouette scores were considered to have better cluster separation.
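The per-neuron silhouette score is s = (b − a) / max(a, b), where a is a neuron's mean distance to the other members of its own cluster and b is its mean distance to the members of the nearest other cluster. A minimal NumPy sketch follows, assuming Euclidean distance between firing-rate vectors (the metric is not stated in the text); the function name is hypothetical. Singleton clusters receive a score of 0, the same convention scikit-learn's `silhouette_samples` uses.

```python
import numpy as np


def silhouette_scores(X, labels):
    """Per-sample silhouette scores, s = (b - a) / max(a, b).

    X: (n_neurons, n_features) firing-rate vectors.
    labels: (n_neurons,) cluster assignment for each neuron.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("silhouette needs at least two clusters")

    # Pairwise Euclidean distances between all firing-rate vectors.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the neuron itself
        if not same.any():
            continue  # singleton cluster: score stays 0
        a = dist[i, same].mean()  # mean intra-cluster distance
        b = min(dist[i, labels == c].mean()  # mean nearest-cluster distance
                for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores


X = [[0, 0], [0, 1], [10, 0], [10, 1]]
print(silhouette_scores(X, [0, 0, 1, 1]).mean())  # well separated: positive mean
```

Averaging these per-neuron scores gives the mean silhouette score used to compare clustering runs.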

      We no longer use tSNE visualization.

      (5) Figure 5A, caption: A quick description of the parameters would be useful.

      We added the description of DDM parameters in the caption of new Figure 4.

      (6) Results l. 222: Why does the analysis only concern epoch 5? I suggest justifying this choice. Also, the text indicates a "trend" but Figure 5C shows a significant result (p=0.0129).

      These statements have been removed from the updated manuscript.

      (7) Methods, l. 443: The authors should report more details about how they decided that neurons were task-related or not. "Visual inspection" sounds like a very vague and subjective criterion.

      We now apply a more objective criterion for identifying neurons with task-relevant modulation:

      (Results, line 145): “To focus on neurons with the most robust task-relevant activity, we measured firing rates during a baseline period (300 ms before motion onset) and sliding 100 ms windows from motion onset to 150 ms after saccade onset in 50 ms steps. We identified the maximal and minimal z-scores, representing the peak activation and suppression, respectively, for each neuron across all trial conditions (Figure 4C). We applied a threshold of z-score >1.5 for either activation or suppression and focused further analyses on the 87 neurons that met this selection criterion (n = 62 and 25 for monkeys C and F, respectively).”
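The selection rule above reduces to a threshold on the most extreme z-scored firing rate. The sketch below assumes that firing rates for each trial condition and sliding window have already been computed (window construction is omitted), and that z-scores are taken relative to the mean and standard deviation of the baseline-period rates across trials. The text does not spell out the normalization, so that choice, the array layout, and the function name `task_modulation` are hypothetical.

```python
import numpy as np


def task_modulation(baseline_rates, window_rates, threshold=1.5):
    """Flag a neuron as task-modulated from its peak activation/suppression.

    baseline_rates: (n_trials,) rates (spikes/s) in the 300 ms pre-motion baseline.
    window_rates: (n_conditions, n_windows) mean rates in the sliding 100 ms windows.
    Returns (is_modulated, max_z, min_z).
    """
    baseline_rates = np.asarray(baseline_rates, dtype=float)
    window_rates = np.asarray(window_rates, dtype=float)
    mu = baseline_rates.mean()
    sd = baseline_rates.std(ddof=1)  # assumed: SD across baseline trials
    if sd == 0:
        raise ValueError("baseline firing rate has zero variance")

    z = (window_rates - mu) / sd
    max_z = float(z.max())  # peak activation across conditions and windows
    min_z = float(z.min())  # peak suppression across conditions and windows
    is_modulated = max_z > threshold or min_z < -threshold
    return is_modulated, max_z, min_z


baseline = [10, 12, 8, 10, 10]   # mean 10, SD about 1.41 spikes/s
windows = [[10, 13], [9, 10]]    # one window reaches z of about 2.1
print(task_modulation(baseline, windows)[0])  # True
```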

      (8) A map of the location of the different STN neuron clusters found in this study within the structure would be very interesting.

      We have added a new Figure 8 to show the localization of neurons with/without task modulation and of neurons from different subpopulations.

      (9) Unless I am mistaken, there is no mention of data availability in this manuscript.

      The data availability statement was included on the submission form and is reproduced below.

      Data Availability: All electrophysiological data and the code for the analyses presented in the paper will be deposited in a publicly accessible domain when the paper is published.

      Previously Published Datasets: Source data for Figure 3-S2 in eLife paper:

      Fan, Doi, Gold, Ding, 2020, eLife: https://doi.org/10.7554/eLife.60535

      https://cdn.elifesciences.org/articles/60535/elife-60535-fig3-data1-v1.csv

      Reviewer #3 (Public review):

      The primary weakness of the paper lies in the claim that STN contains multiple sub-populations with distinct involvements in decision making, which is inadequately supported by the paper's methods and analyses.

      First, while it is clear that the ~150 recorded neurons across 2 monkeys (91, 59 respectively) display substantial heterogeneity in their activity profiles across time and across stimulus/reward conditions, the claim of sub-populations largely rests on clustering a *subset of less than half the population - 66 neurons (48, 15 respectively) - chosen manually by visual inspection*. The full population seems to contain far more decision-modulated neurons, whose response profiles seem to interpolate between clusters. Moreover, it is unclear if the 4 clusters hold for each of the 2 monkeys, and the choice of 4-5 clusters does not seem well supported by metrics such as silhouette score, etc, that peak at 3 (1 or 2 were not attempted). From the data, it is easier to draw the conclusion that the STN population contains neurons with heterogeneous response profiles that smoothly vary in their tuning to different decision variables, rather than distinct sub-populations.

      In response to the three reviewers’ comments on data inclusion and the clustering analysis we presented, we have substantially improved the objectivity and robustness of our approaches, by: 1) applying a data-driven criterion for identifying neurons with robust task-relevant modulation (Figure 4C), 2) removing “outlier” neurons that appear not to share activity profiles with any other neurons in our sample (note that these outlier neurons would be at the outskirts in the cluster space instead of between clusters), 3) increasing the temporal resolution for generating firing rate vectors, and 4) comparing clustering results based on two methods (k-means and linkage). These improvements both sharpened the cluster boundaries and allowed us to observe more robust and distinctive subpopulation-specific relationships between neural activity and computational components in the DDM framework (new Figures 5–7 and their supplementary figures). We believe these updated results clearly demonstrate that: 1) there are different STN subpopulations, and 2) each of the subpopulations encodes a distinct set of functions.

      We performed additional analysis to assess the robustness of the clustering results. First, following the reviewer’s suggestion, we performed clustering based on the two monkeys’ data both combined and separately (new Fig 5-S3). Clustering of data from both monkeys combined compared to each monkey considered separately had Rand index values of 0.94 and 1 for monkeys C and F, respectively (i.e., neurons from one monkey were assigned to the same cluster regardless of whether the clustering was based on data from that monkey alone or both monkeys together), indicating comparable cluster boundaries for the two monkeys. Second, we performed clustering based on pseudo-vectors derived from sampling a subset of trials for each neuron and found that the clustering results were stable and robust based on as few as 40% of the trials (new Fig 5-S4). Third, we generated a new figure (Figure 5-S1), using dendrograms to visualize how the neurons relate to each other. The dendrogram in Figure 5-S2 is more consistent with (at least) three distinct subpopulations of neurons than with the null hypothesis of a continuous distribution with smoothly varying response profiles.

      Second, assuming the existence of sub-populations, it is unclear how their time- and condition-varying relationship with DDM parameters is to be interpreted. These relationships are inferred by splitting trials based on individual neurons' firing rates in different task epochs and reward contexts, and regressing onto the parameters of separate DDMs fit to those subsets of trials. The result is that different sub-populations show heterogeneous relationships to different DDM parameters over time - a result that, while interesting, leaves the computational involvement of these sub-populations/implementation of the decision process unclear.

      The improvements we made to the clustering procedure both sharpened the cluster boundaries and allowed us to observe more robust and distinctive subpopulation-specific relationships between neural activity and computational components in the DDM framework (new Figures 5-7 and their supplementary figures). These updated results demonstrate that: 1) there are different STN subpopulations, and 2) each of the subpopulations encodes a particular set of functions.

    1. LLMs are eroding my software engineering career and I don't know what to do
      • The author is a senior software engineer with a decade of professional experience, primarily focused on web backend development, including settlement processing and accounting systems.
      • Historically, the author took pride in their deep domain expertise, debugging intuition, and ability to meticulously design complex systems before writing a single line of code.
      • The author experienced a profound realization when their manager noted that while their code delivery speed was excellent, they were taking too long to write up design documents.
      • Upon testing a high-performing LLM, the author was shocked to find that the AI could rapidly structure and organize the most complex, specialized logic that had taken them years of sweat and tears to master.
      • While human engineers are still fundamentally necessary to pilot the models, review the output, and act as the "human in the loop," the author feels their role has been reduced to that of an "off-the-shelf," replaceable commodity.
      • The decline in the perceived value of high-quality, handcrafted technical design has led to a sense of existential dread, making the author feel as though they spent ten years building a skillset that is rapidly becoming obsolete.
      • The author expresses profound uncertainty and sadness regarding the future of their software engineering career, feeling trapped in a shifting landscape where their hard-earned specialization no longer differentiates them from generalists.

      Hacker News Discussion

      • The Leveling of the Playing Field: Commenters validated the author's observation that AI acts as a massive equalizer. A single domain expert paired with an LLM can now scale their output, write documentation, and conduct reviews at a pace that replaces a larger team of specialized human experts, drastically reducing the market premium for deep technical specialization.
      • Management Misconceptions and Operational Risk: A major point of discussion centered on the behavior of non-technical leadership. Many managers mistakenly believe LLMs can completely replace engineering talent, leading them to substitute experienced seniors with cheaper juniors to prompt the AI; users warned this will result in long-term architectural debt and severe system maintenance crises.
      • The Looming Threat of White-Collar Homogenization: Several users noted that this existential crisis is not unique to software engineering. If software development is disrupted to this extent, almost all knowledge-based, white-collar professions are vulnerable to similar devaluation, shifting the ultimate value away from individual expertise and toward the owners of capital.
      • Loss of Cognitive Critical Thinking: Participants argued that outsourcing the initial design phase to AI erodes foundational critical thinking. Because LLMs automatically gloss over omissions with confident assumptions, engineers stop asking hard, preventative architectural questions, shifting the discovery of critical edge cases to much later in the lifecycle.
      • A Counter-Perspective on LLM Limitations: Conversely, some engineers disagreed with the author's anxiety, asserting that fields like finance, taxation, and complex distributed infrastructure remain heavily insulated. They noted that AI agents regularly make confidently flawed logic errors, meaning true accountability, edge-case mitigation, and systemic understanding still absolutely require human experts.
    1. I design with Claude more than Figma now
      • The author, a designer at Jane Street, now primarily uses Claude Code rather than Figma to design and prototype new features.
      • Instead of creating traditional spec documents, Figma mockups, and proposals, the new workflow involves writing a problem description, opening an editor, and using Claude to build an interactive prototype inside the actual codebase.
      • Building high-fidelity prototypes directly in the medium (e.g., using OCaml and Bonsai at Jane Street) eliminates intermediary artifacts and allows the author to quickly iterate on minute details like keyboard shortcuts, copy, and button refinement.
      • This approach makes evaluating concepts much easier for stakeholders, as they can interact with a live tool rather than static frames, which is particularly valuable when testing the feasibility of complex features like internal LLM integration.
      • A key shift in their model happened over the course of a few months as improved models, growing prompting familiarity, and proper scoping allowed for handling large-scale diffs (exceeding 2,000 lines).
      • A major workflow challenge is how engineering teammates handle code reviews for fully baked features; the current solution treats the prototypes like "code mockups" that engineers can iterate on or reference to write the official production code.
      • The author expresses concern that relying on Claude might stifle fluid, out-of-the-box creativity, locking them into an incremental, iterative mindset constrained by what they expect the LLM can easily generate.

      Hacker News Discussion

      • The Shift from Static Design to Working Prototypes: Many users echoed the author's sentiment, noting that the traditional reliance on Figma for initial product concepts is declining. Teams increasingly prefer building quick, functional wireframes in dev environments that stakeholders can actually interact with.
      • Organizational Friction and "Vibe Coding" Pressure: A prominent topic of discussion was the tension this workflow introduces with management and business teams. When non-technical stakeholders or designers build a working prototype quickly using AI ("vibe coding"), leadership often pressures engineers to push it directly to production without understanding the need for refactoring, architecture, and handling edge cases.
      • Loss of Deep Design Thinking: Some commenters argued that outsourcing early-stage creation to an LLM removes a crucial phase of critical thinking. Because the AI automatically paints over gaps or details in a prompt, team members stop asking foundational questions ("how should we communicate this idea?" or "what happens when..."), leaving critical logic gaps to be fixed much later.
      • Homogenized and "Safe" Aesthetics: Users iterating with text-to-UI tools noted that the default visual output tends to adhere strongly to contemporary web tropes, resulting in boilerplate or generic Tailwind/Bootstrap-style layouts unless heavily prompted with highly specific design rules or unconventional examples.
      • The Long Tail of Accountability: Engineers emphasized that while AI dramatically speeds up the initial prototyping loop, it does not replace the necessity for engineering discipline. The long-term ownership of operational risk, system maintenance, edge-case mitigation, and on-call accountability still relies entirely on human experts.
    1. Each parcel is linked to property-level attributes (land use code, structure characteristics, assessor value) drawn from the CoreLogic property database. Where parcel records and CoreLogic records share a common identifier we match directly; the remaining parcels are matched spatially, with address-string similarity used to break ties when more than one CoreLogic record is closest. We retain 1–4 family residential parcels and exclude commercial, industrial, agricultural, and most government-owned land uses. A small number of residential parcels are recorded in CoreLogic as currently vacant or government-owned.

      Do we need to say anything here about how stacked parcels are handled?

      Regarding the last highlighted sentence (“A small number of residential parcels are recorded...”): how were they identified as residential? Because they have a mitigation or application on them, or because NC OneMap had them labeled as residential?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This retrospective study provides new data regarding the prevalence of pain in women with PCOS and its relationship with health outcomes. Using data from electronic health records (EHR), the authors found a significantly higher prevalence of pain among women with PCOS compared to those without the condition: 19.21% of women with PCOS versus 15.8% in non-PCOS women. The highest prevalence of pain was observed among Black or African American (32.11%) and White (30.75%) populations. In addition, women with PCOS and pain have at least a 2-fold increased prevalence of obesity (34.68%) at baseline compared to women with PCOS in general (16.11%). Also, women with PCOS had the highest risk for infertility and T2D, but women with PCOS and pain had higher risks for ovarian cysts and liver disease. Based on these results, the authors suggested the critical need to address pain in the diagnosis and management of PCOS due to its significant impact on patient health outcomes.

      Strengths:

      (1) The problem of pain assessment in PCOS patients is well described, and the authors provided a clear rationale for selecting a retrospective design to investigate this problem.

      (2) The large number of analyzed patient records (76,859,666 women) and their uniformity increase the power of the study. Using Propensity Score Matching makes it possible to reduce the heterogeneity of the compared cohorts and the influence of comorbid conditions.

      (3) Analysis in different ethnic cohorts provides current and necessary data regarding the prevalence of pain and its relationship with different health conditions that will be helpful for clinicians to make a diagnosis and manage PCOS in women of different ethnicities.

      (4) Assessment of the risk of different health conditions, including PCOS-associated pathology as well as other common groups of diseases, in PCOS women with or without pain allows differentiation of the risk of comorbid conditions depending on the presence of one symptom (pelvic or abdominal pain, dysmenorrhea).

      We would like to thank the Reviewer for their positive feedback on this manuscript. Pain assessment in women with PCOS is of paramount interest, and this study is our attempt to address a gap in this research area.

      Weaknesses:

      (1) Although the paper has strengths in methodology and data analysis, it also has some weaknesses. The lack of a hypothesis doesn't allow us to evaluate the aim and significance of this study.

      We would like to thank the Reviewer for their valuable feedback regarding the hypothesis of this study. We understand that the hypothesis may not have been written clearly under the objectives and we have corrected this in the formal revision.

      The primary hypothesis of this study is that women with PCOS experience a higher prevalence of pain (including dysmenorrhea, abdominal pain and pelvic pain) compared to women without PCOS, and this prevalence varies by racial groups. Our hypothesis aims to explore the relationship between PCOS and pain, the associated health risks, and the potential racial disparities in pain prevalence and long-term health outcomes. Additionally, we seek to assess the effect of treatment on reducing pain symptoms in women with PCOS. This study not only examines the immediate burden of pain but also investigates its long-term consequences, including risks of infertility, obesity, and type 2 diabetes.

      To enhance clarity for readers, we explicitly stated this hypothesis in the revised manuscript and have ensured that its connection to the study’s objectives is clearly articulated. We appreciate the Reviewer’s insights and have incorporated these refinements to strengthen the manuscript.

      (2) The exclusion criteria don't include conditions that can lead to symptoms similar to PCOS: thyroid diseases, hyperprolactinemia, and congenital adrenal hyperplasia. Thyroid status is not taken into account in the matching criteria. All these conditions could affect both the prevalence results and the risk assessment.

      We would like to thank the Reviewer for highlighting the need to include these additional conditions that mimic PCOS. After excluding hypothyroidism, hyperprolactinemia, and adrenal hyperplasia from the PCOS and PCOS and pain cohorts, we observed that 7,690 patients (1.65%) with PCOS and 1,854 patients (1.36%) with PCOS and pain were removed. Based on this observation, we added these three conditions to our exclusion criteria and reran all our analyses for the resubmission. The manuscript, figures, and tables have been updated to reflect these exclusions. Additionally, we have added the rationale for excluding these conditions to the Discussion. With these major changes to the analysis, we aim to improve transparency and provide more accurate results and precise interpretations of our findings to the field.

      (3) The significant weakness of the study is the absence of a Latin American cohort. Probably the White cohort includes Latin Americans or others, but the results of the study cannot be extrapolated to particular White ethnicities.

      We appreciate the Reviewer’s suggestion to include Latin American cohorts in this study. The TriNetX platform has both self-reported race and ethnicity demographic information. In Table 3 - Figure Supplement 5 and Table 4 - Figure Supplement 6 we include baseline demographic information for both race (Asian, Black or African American, Native Hawaiian or Other Pacific Islander, Other, White, and Unknown Race) and ethnicity (Not Hispanic or Latino, Unknown, and Hispanic or Latino). In this paper we focused our future health outcome sub-analysis on four self-reported race groups: Asian, Black or African American, Other (Native Hawaiian or Other Pacific Islander, Other, Unknown Race), and White. We agree that including Latin American cohorts in the analysis is essential to better understand the health disparities affecting this population. Future work to better define Latin American cohorts in EHR data would significantly aid our ability to investigate this further.

      (4) The authors didn't provide sufficient rationale for future health outcomes and this list didn't include diseases of the digestive system or disorders of thyroid glands, which can also cause abdominal pain.

      We appreciate the Reviewer’s comment and concern regarding additional rationale for future health outcomes. We originally chose to investigate general future health outcomes like diseases of the digestive system, circulatory system, etc. These disease groups were selected because they are broad and have a high prevalence as future health outcomes for patients with PCOS and Pain.

      Our initial results highlight the prevalence of disorders of the digestive system (Figure 2). However, after considering the Reviewer’s comments and to further strengthen our analysis, we included the most prevalent digestive system disorder in our relative risk (RR) analysis. Gastro-esophageal reflux disease (GERD) was identified as the most prevalent future digestive condition for women with PCOS and Pain (13.5%). There was also a 10.5% prevalence in women with PCOS overall.

      We were not able to include the same analysis for thyroid dysfunction, as this condition is part of our exclusion criteria. These updates have been incorporated into the revised manuscript to ensure clarity and completeness.

      Reviewer #2 (Public review):

      Summary:

      The study offers a thorough analysis of the prevalence of pain in women with polycystic ovary syndrome (PCOS) and its associations with health outcomes across various racial groups. Furthermore, the research investigates the prevalence of PCOS and pain among different racial demographics, as well as the increased risk of developing various conditions in comparison to individuals who have PCOS alone.

      Strengths:

      The study emphasizes pain as a significant comorbidity of PCOS, an area that is critically underexplored in existing literature. The findings regarding the increased prevalence of some of the diseases in the PCOS + pain group provide valuable direction for future research and clinical care. I believe physicians should incorporate pain score assessments into their clinical practice to improve patients’ quality of life and raise awareness about pain management. If future research focuses on the mechanisms of pain, it would provide a better understanding of pain and allow for a focus on the underlying causes rather than just symptomatic management. The study also highlights the association between PCOS+pain and various comorbidities, such as obesity, hypertension, and type 2 diabetes, as well as conditions like infertility and ovarian cysts, offering a holistic view of the burden of PCOS.

      We sincerely appreciate the Reviewer’s insightful comments. We hope that our findings will encourage further research on the occurrence of pain in women with PCOS and that others will replicate our results to strengthen the evidence in this area. As noted in our introduction, there are currently no standardized abdominal pain score assessments specifically for women with PCOS. We hope that the findings from this study will contribute to efforts toward developing a standardized pain assessment for the PCOS community. In the meantime, further research across more diverse populations will be essential to build a more comprehensive understanding of this issue.

      Weaknesses:

      Due to the nature of the retrospective study, some data may not be readily available in the system. Instead of simply categorizing participants based on whether they experience pain, it would be more useful to employ a pain scale or questionnaire to better understand the severity and type of patients' pain. This approach would allow for a more thorough analysis of pain improvement following treatment with the three widely used medications for PCOS. Additionally, it would be beneficial for the authors to specify subtypes of the disease rather than generalizing conditions, such as mentioning specific digestive system disorders or mental health disorders. The lack of detailed analysis of specific disorders limits the depth of the findings and may lead the authors to draw incorrect conclusions.

      We thank the Reviewer for highlighting the importance of categorizing pain levels experienced by women with PCOS. However, there is currently no standardized pain assessment for abdominal pain, and therefore more research is required before such a classification can be made. Additionally, the electronic health record data we leveraged via the TriNetX platform does not include any pain scale data from unstructured notes. Despite these limitations, this study is an important step toward recognizing abdominal and pelvic pain in women with PCOS. Our findings indicate that women with PCOS report abdominal pain independent of digestive conditions such as irritable bowel syndrome, a condition often associated with pain in this population.

      We would like to thank the Reviewer for their thoughtful comment regarding subtyping future health outcomes. To capture the most impactful future health outcomes affecting women with PCOS and Pain, we have included the top 5 most prevalent health outcomes associated with PCOS and Pain. Specifically, we included analysis for anxiety disorder, depressive episodes, essential hypertension, gastroesophageal reflux disease (GERD), and acute pharyngitis. We observed that 17.1%, 11.5%, 10.5%, and 10.0% of patients with PCOS and 20.1%, 13.7%, 13.5%, and 13.3% of patients with PCOS and Pain were at risk of developing anxiety, depression, acute pharyngitis, and GERD, respectively. For our revision, we have included these 5 conditions in our PCOS, PCOS and Pain, and self-reported race-stratified future health outcome relative risk (RR) analyses. The revised manuscript, figures, and tables all reflect these changes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I highly recommend checking all papers and supplements for misprints. There are a lot of missing spaces in the Introduction.

      We would like to thank the Reviewer for bringing this to our attention. We have carefully reviewed the manuscript and all supplementary materials and corrected formatting issues, including missing spaces and typographical errors throughout the Introduction and the rest of the document.

      (2) Supplementary Table 3: numbers from the first line in "%No PCOS" should be in "No PCOS"?

      We thank the Reviewer for bringing this error to our attention. We have identified the source of the problem and values have been added to the appropriate column.

      (3) Why did the authors use categorical data for overweight/obesity in the matching rather than the actual values (e.g., BMI)? Different stages of obesity may predominate in different cohorts and contribute to the results.

      We would like to thank the Reviewer for their insightful question. While TriNetX does have BMI values for some patients, these data are not available for all patients. For example, only 29-30% of women in the PCOS control and case cohorts have a recorded BMI. Therefore, we used ICD codes for obesity instead, in order to include as much data as possible.

      (4) What criteria were being used to determine hyperlipidemia and obesity? Were these criteria equal for all patients, or did they depend on ethnicity?

      We would like to apologize to the Reviewer for any confusion. The criteria to determine hyperlipidemia and obesity are ICD-10-CM codes as recorded in the TriNetX platform. The ICD-10-CM codes for obesity are E65-E68 and the ICD-10-CM code for hyperlipidemia is E78.5. Please also see the Methods section of this manuscript where all the ICD-10-CM codes are described.
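      The code-based criteria above can be sketched as a small classifier. This is a minimal illustration, not the TriNetX query itself: the helper names (`normalize`, `has_obesity`, `has_hyperlipidemia`) are hypothetical, and it assumes codes arrive as strings with or without the decimal point. Obesity is flagged by the three-character categories E65-E68 and hyperlipidemia by an exact match on E78.5, as stated in the response.

```python
# Hypothetical sketch: flag obesity and hyperlipidemia from ICD-10-CM codes
# using the ranges stated in the response (E65-E68 obesity, E78.5 hyperlipidemia).
from typing import Iterable

OBESITY_CATEGORIES = {"E65", "E66", "E67", "E68"}  # three-character categories
HYPERLIPIDEMIA_CODE = "E78.5"


def normalize(code: str) -> str:
    """Upper-case, strip whitespace, and insert the dot after the category if missing."""
    code = code.strip().upper()
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code


def has_obesity(codes: Iterable[str]) -> bool:
    # Any code whose three-character category falls in E65-E68 counts.
    return any(normalize(c)[:3] in OBESITY_CATEGORIES for c in codes)


def has_hyperlipidemia(codes: Iterable[str]) -> bool:
    # Exact match on E78.5 (assumption: no other E78.x codes are included).
    return any(normalize(c) == HYPERLIPIDEMIA_CODE for c in codes)


if __name__ == "__main__":
    patient_codes = ["e6601", "E78.5", "N94.6"]
    print(has_obesity(patient_codes), has_hyperlipidemia(patient_codes))
```

      Applying the same fixed code list to every patient reflects the response's point that the criteria did not depend on ethnicity.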

      (5) The Materials and Methods section should provide information on quality assurance checks, including any steps taken to eliminate data suspected to be unreliable or invalid, to handle missing data, and to remove duplicate data or claims. If quality assurance of the data was not conducted, this should be noted in the study limitations.

      We thank the Reviewer for this suggestion. We have revised the Methods section to explicitly describe the data quality assurance procedures inherent to the TriNetX platform. Specifically, we clarified that TriNetX applies standardized data mapping to controlled clinical terminologies (ICD, CPT, RxNorm), performs automated quality checks and excludes records that do not meet platform-defined standards.

      (6) It's not clear why the authors did not include in the analysis information on patients' use of painkillers or anti-inflammatory drugs. Perhaps such data are not available in the EHR. However, a patient with a chronic inflammatory or autoimmune disease would likely have been prescribed medication. I recommend addressing this issue in the Materials and Methods section and/or the study limitations.

      We would like to thank the Reviewer for this important suggestion. We have now clarified this point in the limitations section of the Discussion. Specifically, we added text explaining that over-the-counter analgesics and anti-inflammatory medications are not reliably captured in EHRs or within the TriNetX platform and therefore could not be evaluated in our analysis.

      (7) The authors should provide the Table or complete Supplementary Tables 2 and 3 with the parameters of patients used for matching.

      We apologize to the Reviewer for any confusion. The parameters used for propensity score matching are described fully in the Methods section of the paper. Table 2 – Figure Supplement 5 and Table 3 – Figure Supplement 6 display baseline characteristics for patients before and after 1:1 propensity score matching using these parameters. We have now also added the propensity score matching parameters to the table descriptions for readability and further clarification.
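      For readers unfamiliar with 1:1 propensity score matching, the idea can be sketched as greedy nearest-neighbor matching without replacement. This is a generic illustration under stated assumptions, not the platform's exact algorithm: propensity scores are assumed to be precomputed (for example, from a logistic regression on the matching parameters), the `caliper` value is an arbitrary assumption, and the function name `greedy_match` is hypothetical.

```python
# Hypothetical sketch of greedy 1:1 nearest-neighbour propensity score matching.
# Assumes scores were already estimated; the caliper is an illustrative choice.
from typing import Dict, List, Tuple


def greedy_match(cases: Dict[str, float], controls: Dict[str, float],
                 caliper: float) -> List[Tuple[str, str]]:
    """Pair each case with the closest unused control whose score lies within `caliper`.

    cases/controls map patient IDs to propensity scores. Matching is without
    replacement, so each control is used at most once; cases with no control
    inside the caliper are left unmatched (and would be dropped from the cohort).
    """
    available = dict(controls)
    pairs: List[Tuple[str, str]] = []
    # Process cases in a fixed (sorted-ID) order so the result is reproducible.
    for case_id in sorted(cases):
        if not available:
            break
        score = cases[case_id]
        # Closest control; ties broken by control ID for determinism.
        best = min(available, key=lambda cid: (abs(available[cid] - score), cid))
        if abs(available[best] - score) <= caliper:
            pairs.append((case_id, best))
            del available[best]
    return pairs


if __name__ == "__main__":
    print(greedy_match({"a": 0.30, "b": 0.50}, {"x": 0.31, "y": 0.80, "z": 0.49}, 0.05))
```

      Comparing baseline characteristics before and after this step, as the supplementary tables do, is how one checks that matching balanced the cohorts.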

      (8) The authors found that women with PCOS and pain have a higher RR for ovarian cysts and liver diseases, whereas women with PCOS overall have a higher RR for infertility, obesity, and T2D. The Discussion considers why women with PCOS and pain might have a higher risk of ovarian cysts and liver disease, but offers no suggestion as to why women with PCOS without pain have a higher risk of infertility, obesity, and T2D. If no data explain this phenomenon, I recommend noting the need for additional research.

      We would like to thank the Reviewer for this helpful feedback. The Discussion section now includes deeper insights into the pathophysiology behind the two distinct PCOS phenotypes (PCOS overall vs. PCOS and Pain) and their differing risk profiles for future health outcomes. Specifically, we note that while women with PCOS overall may be more metabolically driven (higher risk of infertility, obesity, and T2D), women with PCOS and Pain show a higher risk of ovarian cysts and liver disease. We clarify that these findings are observational and hypothesis-generating and emphasize the need for future longitudinal and mechanistic studies.

      (9) The authors suggested that systemic contraceptives, metformin, or spironolactone reduce pain in women with PCOS. The reduction is significant, but the proportion of patients with a beneficial effect is low (2.5-7.5%). Is this enough to recommend prescribing these medications not only for PCOS treatment but also for pain?

      We thank the Reviewer for this important comment. We agree that although the reduction in pain diagnoses following treatment with COCPs, metformin, or spironolactone was statistically significant, the absolute proportion of patients experiencing benefit was modest. Our intention was not to recommend prescribing these medications solely for pain management, but rather to highlight that standard PCOS therapies may have additional benefits in reducing pain symptoms. We have clarified this point in the Discussion to emphasize that these findings are observational and hypothesis-generating, and that prospective studies are needed before these medications can be considered specifically for pain management in PCOS.

      Reviewer #2 (Recommendations for the authors):

      (1) Including a subtype analysis of specific digestive, respiratory, and mental health diseases, rather than generalizing at the level of organ systems, would enhance the content.

      We would like to thank the Reviewer for this helpful suggestion. In the revised manuscript, instead of the generalized disease systems we previously reported on, we have included analysis for the top 5 most prevalent conditions. Specifically, we included analysis for anxiety disorder, depressive episodes, essential hypertension, gastroesophageal reflux disease (GERD), and acute pharyngitis. We observed that 17.1%, 11.5%, 10.5%, and 10.0% of patients with PCOS and 20.1%, 13.7%, 13.5%, and 13.3% of patients with PCOS and Pain were at risk of developing anxiety, depression, acute pharyngitis, and GERD, respectively.

      (2) Including the prevalence of dysmenorrhea among healthy populations would allow readers to better compare its impact on the lives of individuals with PCOS.

      We would like to apologize to the Reviewer for any confusion. The prevalence of dysmenorrhea for cases and control cohorts can be found in Table 2 – Figure Supplement 5 and Table 3 – Figure Supplement 6 before and after propensity score matching.

      (3) Introducing an analysis of age subgroups will provide readers with a clearer understanding of the prevalence of pain and specific diseases across different age groups.

      We would like to thank the Reviewer for this helpful suggestion. For this revision, we performed a sub-analysis of the prevalence of PCOS and PCOS and Pain stratified by 10-year age groups. A bar plot of these results can be found in Figure 4 - Figure Supplement 7.

      We thank the Reviewers again for their positive and constructive feedback on this manuscript. We have made the appropriate edits and changes in the final revision of the manuscript.

    1. The result is "almost-right code … the most expensive kind of wrong" because reviewers must reconstruct the missing system context for every change

      Almost-correct code is the most expensive kind of error: the context has been lost.

    1. CodeRabbit's Dec 2025 analysis found AI co-authored code carried ~1.7× more "major" issues. Treat technical debt and security regressions as default risks, not edge cases.

      Technical debt and security are default risks.

    1. Failing grades soar as professors see greater AI usage, dwindling math skills in UC Berkeley computer science classes
      • Skyrocketing Failure Rates: UC Berkeley is seeing an unprecedented spike in failing grades within introductory computer science (CS) courses. According to data from Berkeleytime, 35.3% of students in CS 10 and 10.6% of students in CS 61A received an "F" in spring 2026. This marks an abrupt jump from spring 2025 and spring 2024, when the failure rate did not exceed 10% for either class.
      • Overreliance on AI for Homework: Faculty members (including professors Dan Garcia, Anant Sahai, and Gireeja Ranade) report that widespread, unchecked use of LLMs and AI tools on out-of-class assignments creates an "illusion of competence." Students use AI to trivially generate solutions or debug code without building actual problem-solving skills, leading to catastrophic failure on heavily weighted, proctored, in-person exams.
      • Severe Gaps in Math Prerequisites: In addition to AI issues, professors note a drastic decline in foundational mathematical skills. Professor Ranade shared that while students are expected to enter advanced courses with a strong grasp of linear algebra, vector calculus, and mathematical proofs, many struggle heavily with basic concepts.
      • The "Open-Internet" Loophole: Prerequisite courses are failing to filter or prepare students properly. Ranade discovered during office hours that some foundational linear algebra classes at UC Berkeley had adopted "open-internet, open-AI" policies for homework and exams, completely subverting the rigorous testing of foundational skills.
      • Implications for the Curriculum: Faculty warn that when students rely on a frictionless tool to bypass the hard parts of learning, they fail to build the cognitive stamina required for high-level computer science and original engineering work.

      Hacker News Discussion

      • The Illusion of Learning: Commenters note that the barrier to getting a solution with AI is now zero. This mimics the feeling of understanding (like watching a step-by-step tutorial), but leaves students entirely incapable when forced to solve problems independently during a real, proctored exam.
      • Widespread Cognitive Decline: A highly upvoted comment pointed out that this isn't just an issue with undergraduates. Even highly qualified professionals and PhDs are exhibiting a noticeable decline in their ability to brainstorm, code, or sit quietly to think deeply for 30 minutes without relying on an LLM to do 90% of the cognitive lifting.
      • Deficiencies in Academic Instruction: Some users argue that AI isn't the sole culprit, shifting blame toward professors who rely on stale, verbatim lecture slides rather than engaging, practical teaching methods. They mention that students naturally turn to tools like NotebookLM, Claude, or ChatGPT because they often provide clearer explanations than condescending or disengaged faculty.
      • The Advantage of Going "No-AI": Some shared anecdotes that students who deliberately avoid AI tools are finding it easier to stand out. In tracks involving heavy writing or class participation, "AI-reliant" students struggle to think dynamically, while independent thinkers produce much less generic, higher-quality work.
      • Grading and Curriculum Debates: There is an active debate on the role of curving grades and weed-out classes. Users emphasize that if prerequisite classes allow open-AI policies on exams, the entire sequential structure of a rigorous engineering degree collapses.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility, and clarity

      Summary: Edvalson and colleagues use transcriptomics, cell biology and genetics to study variation between segregation distorter (meiotic drive) strains and find several important results. These include apparent suppression of small RNAs mapping to responder (the drive target) in one of the lines, a general pattern of differential expression consistent with the drive mechanism being upstream of sperm individualization (where defects have been seen previously), and genetic confirmation that perturbing Rsp expression can influence the strength of drive.

      Major comments: I found the total RNA sequencing experiment a bit oddly presented. This is partly because it was in the middle of the results (it might fit better first), partly because few specific genes were discussed (this might be appropriate given the question, but maybe the question should be more clearly stated), and partly because of the complexity of the approach (WGCNA + PANGEA) and how it all fits together. I suggest working to clarify the main points of this section (which are a bit different from the main focus of the Rsp work).

      We thank the reviewer for these important points. We liked the suggestion to swap the order of our results. We attempted the change, but we found that we weren't able to make the flow of the results much better. Instead, we primed the transition from smRNA to totRNA in the last paragraph of the smRNA results (lines 190-196). This paragraph now reads:

      The dearth of Rsp smRNAs in SD-Mad heterozygotes could be due to a disruption in transcription of the locus or in subsequent processing steps. Many factors can influence piRNA production. For example, the piRNA pathway can amplify piRNAs independently of transcription through the ping-pong cycle (Czech and Hannon 2016). Notably, Rsp piRNAs do not have a strong ping-pong signature in testes (Wei et al. 2021; Chen et al. 2021a). To distinguish between a disruption in transcription and some downstream process, we examined total RNA.

      The main reason we elected to describe patterns rather than specific genes is that the 2nd chromosomes we tested (R-16, SD-Mad, SD-5) have all diverged from each other and any single differentially expressed gene could be due to differences in genetic background. Therefore, we elected to point out more broad systematic changes in pathways and correlated gene networks rather than specific genes. We have made it more obvious throughout the total RNA section in the text what our question is regarding the transcriptome and the reasoning for using WGCNA and gene set analysis.

      We also appreciate the reviewer's point that the complex approach we used to extract changes in pathways and networks is difficult to follow. We have modified our wording to better describe the flow of analyses.

      We also note that we have extended our analysis for the comparison of SD-Mad and SD-MadRev, which only differ by the Sd-RanGAP locus. Here we do discuss individual genes that are differentially expressed. See below for details about this new analysis.

      Minor comments:

      Abstract - Probably worth mentioning Sd-RanGAP here, even if you are using it as a straw man. I agree that the specific mechanism is not known, but some of the genetics are established.

      This is a good point. While our study doesn't address RanGAP, it is important to point out that, although its role in drive is unclear, Sd-RanGAP is a necessary component of the system. We added the following language to the abstract:

      SD is a multigene complex, frequently associated with chromosomal inversions, where the main driver locus, a truncated duplication of the gene RanGAP, kills wild-type sperm containing a satellite DNA called Responder (Rsp).

      Line 80 and elsewhere - it would be helpful to be specific here - you are looking at both small and total RNA

      We've modified our wording throughout the manuscript to specify when we are referring to total RNA and small RNA.

      Fig 1B - is there a reason not to show the values of the replicates here? It would be more transparent.

      We thank the reviewer for this comment. We replaced Fig 1B with a chart that is computed from the DESeq2 normalized counts for each comparison and added replicates to all related graphs.

      Line 139 - does the experimental design control for 1.688 genomic copy number? Where is it located?

      We indeed control for the 1.688 copy number here. Most 1.688 repeats are found on the X chromosome, and all flies in our experiments have identical X chromosomes. We changed the text to specify that the 1.688 copy number is the same between conditions.

      144-146 - this could be written more clearly, and I think it should only refer to 1C, not 1B. Part of the issue is that there are several repeats not discussed, and it isn't clear what is happening with them. I suggest expanding this description so it is clearer.

      Thank you for this feedback. We have expanded the description to make this section clearer.

      Line 161 - what do you mean (specifically) by "repetitive loci"?

      Repetitive loci in this case refers to transposons, satellite DNAs (except simple satellites), and piRNA clusters. We have added text explaining what is included in the grouping of "repetitive loci", specifically the following sentence:

      Our results demonstrate that SD-Mad and SD-5 haplotypes, despite sharing the same main drive locus, have different effects on smRNAs derived from repetitive loci such as complex satellites (including Rsp), transposable elements, and piRNA clusters.

      193-203 - This is an important finding that is somewhat lost in trying to keep track of WGCNA and PANGEA and the different modules. I suggest clarifying to drive home the point that differential expression appears to start prior to individualization, which suggests an earlier mechanism of drive.

      We thank the reviewer for this feedback. We have added wording to our discussion that points out this finding in lines 501-505, which reads:

      We suspect that the timing of the proximal cause of SD-mediated drive may align with early spermatogenetic processes; perhaps where cell cycle-related genes are active and appear to be broadly differentially expressed (Figure 2B, Module H). This earlier timing is consistent with temperature shift experiments that place the critical period for SD at or before meiosis (Mange 1968).

      Fig 3B & 3C, Fig 4 - same as 1B, is there a reason not to show the actual data points?

      A similar issue was raised earlier; in response, we modified all our figures to show replicate points where applicable.

      Line ~245 - was the same experiment done with SD-5? (as you do below for Rsp overexpression)

      We originally did not include SD5 in this experiment, but we have since measured drive strength of SD5 in a kipfKO background. We found a small but statistically significant difference in drive strength. We added the new SD5 results to the figure and moved the kipfKD data to the supplement along with some added data on a Rsp deletion line generated from Iso1 that bolsters our confidence in the SDMad results.

      Significance

      This is a strong paper that moves the field forward, even if it leaves questions still to be answered (why the difference between drivers? what is the mechanism? how is Rsp interacting with drive?).

      Several findings move the field forward: the Rsp small RNA results and the differential expression hinting at a molecular mechanism that is upstream of sperm individualization.

      The audience is moderately broad. Genetic conflict is gaining in general interest, but aspects of this will be mostly interesting to the hardcore drive crowd.

      Reviewer #2

      Evidence, reproducibility and clarity

      I have only one request: I found it unclear whether the authors were referring to small RNAs or their precursor (long RNA). By reading the text carefully, I could deduce that Fig1A/Table S2 represent the small RNA sequencing, while FigS3A represents total RNA seq (detecting precursor). However, the labeling in the Fig1A and Table S2 only says 'piRNA cluster' or 'Rsp' (without clarifying 'piRNA from piRNA cluster' or 'piRNA from Rsp'), and it took quite some time for me to understand which Fig/data is smallRNA vs. longRNA.

      This is helpful feedback. We have added more clarity to which type of RNA is being represented in our figures throughout.

      Significance

      This manuscript by Edvalson et al. describes their study on SD (segregation distorter) meiotic drive system, examining the role of piRNA derived from Rsp satellite. Although the exact mechanism of drive is still unknown, this study represents a significant step forward in understanding SD-mediated drive.

      By using two SD alleles (SD-5 and SD-Mad), they show that Rsp-derived piRNA is depleted in SD-Mad. The authors used total RNA sequencing, small RNA sequencing, mutants, and carefully designed controls (such as deletion of Sd-RanGAP) to reach the model that Rsp-derived piRNA is involved in SD-Mad-mediated drive. The result that kipferl depletion (which leads to satellite DNA expression) rescues SD-Mad's drive phenotype is very interesting. It supports the idea that the decreased Rsp piRNA indeed corresponds to SD-Mad-mediated drive. They further back up this idea by overexpressing Rsp.

      Interestingly, SD-5 was not impacted by changes in Rsp expression. Based on this result, the authors state that there are mechanistic variations within the same (SD) drive system. This statement is certainly justified by the data, but I cannot help wondering whether there might be a unifying mechanism that explains both SD-5 and SD-Mad. I am not suggesting that the authors edit the manuscript or add to the discussion, but do they have any speculation on this? For example, is SD-5 simply epistatic to Rsp piRNA production, e.g., SD-RanGAP > SD-Mad (some gene on the SD-Mad inversion) > Rsp piRNA production > SD-5 > sperm killing?

      We thank the reviewer for this insight. We indeed think that the proximal cause of sperm dysfunction could be the same, but there are components of SD5 that act downstream of Rsp piRNAs. The small difference in drive strength in the SD5 KipfKO experiments might support this hypothesis, although it is also possible instead that drive is influenced by changes in some other piRNAs (from the piRNA clusters or satellites).

      We modified our wording in the first paragraph of the discussion to point out this possibility. Lines 367-370 now read:

      These results suggest that, while SD chromosomes share a target and main drive locus (Sd-RanGAP), the modifiers accumulated on each haplotype may influence the drive mechanisms, either by creating new pathways to drive or acting as tuning knobs on drive strength.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary

      In the presented manuscript, Edvalson and Wei et al. use Drosophila genetics and NGS experiments to investigate the mechanism of meiotic drive through the Segregation Distorter (SD) system. They reveal that two driving haplotypes seem to function via different mechanisms, with drive through SD-Mad but not SD-5 involving small RNAs produced from the Responder (Rsp) satellite, the target of SD drive. SD-Mad testes displaying drive are characterized by lower levels of Rsp sRNAs compared to non-drive controls as well as SD-5, and the ectopic overexpression of Rsp sRNAs through two distinct mechanisms decreases drive specifically in the SD-Mad genetic background. With this work, the authors add an important piece of information to the highly complex SD system, indicating that sperm killing is likely achieved by different mechanisms in different SD haplotypes, despite their sharing a common driver.

      Major comments

      Fig1C: It might be interesting to show the fold change between SD-Mad and SD-MadRev in addition to what is displayed. Moreover, can the authors comment on what might be causing the increased smRNA counts for 38C2? Is this because R16 has particularly low 38C2 values?

      We appreciate the reviewer's comment concerning the fold change between SD-Mad and SD-MadRev. We have made a figure showing the difference between them and added it to Figure S1.

      We suspect that the expression difference in 38C2 between the R16 heterozygotes and SD heterozygotes may be due to genetic divergence, since these are different 2nd chromosomes. We have added language pointing this out to the manuscript in line 182. The paper now reads:

      *There is no evidence that either 38C2 or Flamenco is involved in SD-mediated drive.*

      Fig1/S1: Could the authors also display the Rsp smRNA counts for all Gla crosses similar to panel 1B? What is the interpretation for the increase in Rsp smRNAs in SD-5/Gla relative to R16/Gla but the lack of such an increase in the SD-5/iso1 vs R16/iso1 comparison? Do SD-Mad and SD-5 induce the same strength of drive against each of the two wildtype chromosomes? Experiments: smRNAseq for SD-MadRev/Gla.

      We have added a plot to Fig S1 to show the abundance of Rsp small RNAs in the Gla background, similar to Figure 1.

      It is difficult to interpret the apparent overabundance of Rsp small RNAs in the SD-5/Gla background. Because differences in Rsp smRNA abundance for SD-5 are inconsistent between the Iso1 and Gla backgrounds, our interpretation is that SD-5 is not manipulating Rsp levels. The apparent overabundance of Rsp in the Gla background could be due to an epistatic interaction between Rsp and other components of that particular background. Consistent with this interpretation, the SD-Mad-induced reduction of Rsp smRNAs in the Gla background is less dramatic than in the Iso1 background, suggesting that something about the Gla background slightly increases Rsp expression when paired with an SD chromosome.

      Fig1: The authors note changes in smRNA levels for other satellites as well as piRNA clusters but do not give any interpretation to this observation. Are they meaningful? Should they be attributed to genetic background?

      Our interpretation of the observation that some satellites or piRNA clusters are differentially expressed is that these differences are likely due to epistatic effects from the different 2nd chromosomes used in the study, or are incidental to the mechanism of SD.

      FigS2: Same question also for the deregulated TEs: do they share sequence features with Rsp or are they overrepresented in the clusters that change? Are these explained by differences in insertions between genotypes? Do their total RNAseq values change in any way? What do the percentages in line 162 correspond to? Number of TEs that are deregulated? At which cutoff? It might be informative to compare the data to a cross between driver and R16, or even better the SD-MadRev control. Experiments: totRNAseq for SD-MadRev crosses and optionally crosses to R16.

      The Rsp repeat unit does not share significant homology to portions of the genome outside of the pericentromere of 2R with the exception of ~6-12 copies in the intron of Ago3.

      As far as TEs are concerned, we surprisingly don't see a strong correlation between piRNA cluster content, dysregulation, and TE transcript abundance. For example, in the SD/Gla backgrounds the total RNA for R1, R2, IGS, and Tc1-Mariner family TEs is downregulated. However, the only major piRNA cluster that is upregulated in both SD/Gla backgrounds (80F) is not enriched for TE fragments matching any of those 4 families. We note that the major piRNA clusters are defined relative to the Iso1 genome, which may differ from our experimental backgrounds. Without long-read resolved genomes for our specific experimental lines, generated at the same time as the RNA samples, it is difficult to determine how expression at the major piRNA clusters and at the corresponding TEs is related. We have described this lack of correlation in lines 210-217 of the text, along with our interpretation of why this could be. The paper now reads:

      On the other hand, we did find some differences in repetitive elements related to rDNA (R1, R2, and IGS) and Tc1-Mariner family TEs (all backgrounds; Figure S6). Interestingly, there was no correlation between the expression of TEs and the expression of piRNA clusters that contain fragments of these TEs in the total RNA, nor was there any correlation between the small RNAs from piRNA clusters and the total RNAs for those TEs. piRNA clusters are usually defined in a single isolate, Iso1; rapid turnover of TEs and piRNA sources could explain why we do not see a correlation between piRNA cluster expression and TE expression in our backgrounds.

      We investigated differences in TE and piRNA cluster expression in our SD-Mad/Iso1 vs SD-MadRev/Iso1 comparison, but a lack of power due to inter-sample variation prevents us from confidently making any assessments on any TEs or piRNA clusters in that comparison. We did however generate additional gene level transcriptomic data using 3' Digital Gene Expression to bolster our confidence in the totRNA data and found some interesting genes that were in the top most differentially expressed. We have noted those genes in lines 276-287 which read:

      To identify genes that might interact to cause drive, we compared the gene expression of SD-Mad/Iso1 to that of SD-MadRev/Iso1. These genotypes differ only by the presence of the main drive locus, Sd-RanGAP. We performed both totRNA and 3' Digital Gene Expression (DGE) RNA sequencing and examined the overlap in differential expression between the totRNA and DGE sequencing. There are 69 differentially expressed genes for which the DGE comparison is significant (P_DGE ≤ 0.01) and the sign of the totRNA Log2FC matches that of the DGE. Among this set of differentially expressed genes, 57 show at least a 50% difference in gene expression (absolute Log2FC of at least 0.58 in DGE). These genes are not enriched in any Reactome gene sets. The top 20 most differentially expressed genes consist of 9 lncRNAs (3 antisense RNAs) and 11 protein-coding genes, 8 of which are uncharacterized. The 3 characterized genes are Artemis (Arts), Gr61a, and Tono (Figure S98, Supplemental File 1).
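      The two-step filter described above (significant in DGE with a concordant totRNA direction, then a minimum effect size) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the function name `overlap_de_genes`, the input layout (gene to `(log2FC, p)` for DGE, gene to log2FC for totRNA), and the toy gene labels are hypothetical. It also shows why 0.58 corresponds to a 50% difference, since log2(1.5) ≈ 0.585.

```python
# Hypothetical sketch of the DGE/totRNA overlap filter; thresholds taken from the text.
import math
from typing import Dict, List, Tuple

# A 1.5-fold change (50% difference) on the log2 scale: log2(1.5) ≈ 0.585.
LOG2FC_50PCT = math.log2(1.5)


def overlap_de_genes(dge: Dict[str, Tuple[float, float]],
                     totrna_log2fc: Dict[str, float],
                     p_max: float = 0.01,
                     min_abs_log2fc: float = 0.58) -> Tuple[List[str], List[str]]:
    """Return (concordant, concordant_and_large) gene lists.

    dge maps gene -> (log2FC, p-value) from 3' DGE; totrna_log2fc maps
    gene -> log2FC from total RNA-seq. A gene is concordant when its DGE
    p-value is <= p_max and both log2FC values share the same (nonzero) sign.
    """
    concordant = [
        g for g, (lfc, p) in dge.items()
        if p <= p_max and g in totrna_log2fc and lfc * totrna_log2fc[g] > 0
    ]
    # Effect-size filter is applied to the DGE log2FC, as described in the text.
    large = [g for g in concordant if abs(dge[g][0]) >= min_abs_log2fc]
    return concordant, large


if __name__ == "__main__":
    toy_dge = {"geneA": (1.7, 0.001), "geneB": (-0.6, 0.005), "geneC": (0.3, 0.002),
               "geneD": (1.0, 0.2), "geneE": (0.9, 0.001)}
    toy_tot = {"geneA": 1.2, "geneB": -0.4, "geneC": 0.1, "geneD": 1.0, "geneE": -0.5}
    print(overlap_de_genes(toy_dge, toy_tot))
```

      With the toy inputs, geneD fails the p-value cut, geneE fails the sign check, and geneC passes concordance but falls below the 0.58 effect-size threshold.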

      We discuss two of these genes in further detail in the discussion in lines 476-486 which read:

      First, Tono, a BTB zinc finger-containing transcription factor, is upregulated (Log2FC(DGE) = 1.7) in all SD-Mad comparisons. Tono plays a role in regulating transcription in muscle cells in response to mechanical pressure (Zhang et al. 2024) but also shows enrichment in male germ cells (Li et al. 2022). Its putative DNA-binding capacity and ability to form nuclear condensates (Zhang et al. 2024) make it an interesting candidate gene for interacting with the Rsp satellite. Second, the importin-4 ortholog Artemis (Arts), which facilitates Ran-mediated import of H3 and H4, is overexpressed in SD-Mad (Log2FC(DGE) = 2.5). Interestingly, Arts expression is antagonistic to male fertility (VanKuren and Long 2018). Also of note, Apollo, a duplicate of Arts that supports male fertility (VanKuren and Long 2018), is downregulated (Log2FC(DGE) = -0.6), though it is not among the top-most differentially expressed genes.

      Figure S3: Am I reading the PCA plots right in that there are very few gene expression changes when the drivers are in iso1 background but much more in the Gla background? Comment on possible explanations for that. Please indicate the number of significantly changed genes in each comparison. Again, are these changes correlated between the two drivers or can they be attributed to genetic background of Gla vs R16? Would it be interesting to see how SD-Mad/Gla and SD-5/Gla gene expression profiles compare? Experiment: totRNAseq for SD-MadRev crosses.

      There did tend to be more differences in the Gla background than in Iso1. This is best explained by inter-sample variation in the SD-Mad/Iso1 background, which is visible in the PCA plot in Fig S4A. Another reason could be that the Gla and Iso1 chromosomes are very different from each other, which prevents any 1-to-1 comparison between the SD/Iso1 and SD/Gla backgrounds. For this reason, we generally avoid comparing across genetic backgrounds except to identify shared differences, as these are more likely to be related to drive.

      In Figure S5A it seems that totalRNA levels of Rsp are strongly increased in SD-Mad/Gla but not in SD-Mad/iso1. The iso comparison (fewer piRNAs but the same amount of transcript) could indicate that it is actually transcription of Rsp that is affected here. This is even pointed out in line 205, without discussion of the fact that the Gla comparison (fewer piRNAs but more transcript) would rather indicate that transcription is intact but processing into piRNAs is defective. Could this be clarified using FISH as in Figure S8? If true, SD-Mad/Gla should have much more FISH signal than SD-Mad/iso1. Either way, this discrepancy should be discussed further. Experiments: comprehensive smFISH panel for all crosses (including SD-MadRev).

      The reviewer makes an excellent point. Why would Rsp long RNAs be overexpressed in the SD-Mad/Gla background? Earlier we noted that in the Gla background specifically the genotypes that contain an SD chromosome seem to have a higher level of Rsp small RNAs than we might expect given our Iso1 results. We conclude that this is likely due to an epistatic interaction between the 2nd chromosomes used in the study and the rest of the chromosomes. This interpretation could extend to the long noncoding precursors as well.

      Further, although the difference is significant for SD-Mad/Gla but not for SD-5/Gla, both change in the same direction. The same holds in the Iso1 backgrounds, but in the opposite direction. If Rsp expression is higher than expected in the SD/Gla background because of epistatic effects, it becomes clearer that changes in long RNA abundance are related to changes in small RNA abundance, although they do not perfectly predict them. However, because Rsp counts are low in the totRNA data, we do not have the power to draw that conclusion confidently.

      In general, the totRNA profiles of repeats don't seem to correlate well between the genotypes (iso vs Gla crosses, for either SD-5 or SD-Mad). Is this because values are generally small and/or replicates don't correlate? Should these data even be considered? Also, panels 2A and S5C are very different from each other. The additional comparison with the SD-MadRev allele crossed into both Iso1 and Gla should give additional insight. Experiment: totRNAseq for SD-MadRev crosses.

      The reviewer brings up a good point. While some repetitive elements (like Rsp) had relatively small counts in the totRNA data, most had adequately high counts. Some differences are nonetheless expected: although the other chromosomes are controlled for, the second chromosomes differ by design, including the two SD haplotypes. In this context, similarities between the two haplotypes may help identify unifying aspects of the SD mechanism, whereas differences could be incidental to the genotype and not necessarily related to SD.

      It may be generally informative to put the sRNA and RNA comparisons into perspective, for example by including the comparison of SD-Mad crosses versus SD-MadRev crosses to exclude unrelated genetic background components as much as possible.

      The reviewer is correct here. Differences between the transcriptomes of SD-Mad and our revertant are much more likely to be due to the drive phenotype. Because of variation among the SD-Mad total RNA-seq replicates, we had substantially less power when comparing SD-Mad/Iso1 to SD-MadRev/Iso1. We therefore generated new data to address this point: 3' DGE sequencing of three biological replicates of SD-Mad/Iso1 and SD-MadRev/Iso1. The results of this new analysis are described above.

      FigS6: I assume this is given, but as it is not specified: is the directionality of differential expression taken into account here? Or could it be significantly up in one and down in the other? Please specify / adjust color scale to allow this distinction.

      This is a good point. We have modified the figure to not only indicate significance but also direction and magnitude.
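
      To show direction as well as significance, each feature that is tested in two comparisons has to be classified by the signs of its Log2FC values, and not only by whether it passes the cutoff in both. The sketch below illustrates one such scheme. The function name and category labels are hypothetical and are not taken from the revised figure.

```python
def direction_class(lfc_a: float, lfc_b: float, sig_a: bool, sig_b: bool) -> str:
    """Classify a feature tested in two comparisons (hypothetical helper).

    Significance alone cannot distinguish 'up in both' from 'up in one,
    down in the other', so the sign of each Log2FC is checked as well.
    """
    if not (sig_a and sig_b):
        return "not shared"
    if lfc_a * lfc_b > 0:  # same sign in both comparisons
        return "shared up" if lfc_a > 0 else "shared down"
    return "discordant"    # opposite (or zero) sign


print(direction_class(1.2, 0.8, True, True))    # shared up
print(direction_class(-0.9, -1.5, True, True))  # shared down
print(direction_class(1.1, -0.7, True, True))   # discordant
print(direction_class(1.1, 0.9, True, False))   # not shared
```

      Magnitude can then be encoded separately, for example as color intensity within each category.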

      FigS8: Please add a scale bar for all images. 1.688 is labeled as 359 in the legend; please unify and/or explain the nomenclature. Consider adding a nuclear outline based on DAPI. It looks like 1.688 actually differs more than Rsp between control and SD-Mad/Iso. Could the authors comment on this? In the text the authors mention that these experiments were done for both SD-Mad and SD-5 heterozygotes, but only the SD-Mad data are shown.

      The most abundant component of the 1.688 repeats is the 359-bp repeat, which is used as a proxy for 1.688, and our 359-bp probe cross-hybridizes with other abundant 1.688 variants on chromosome 3. We agree that there do seem to be some differences in the 1.688 RNA FISH; however, we do not yet have evidence that 1.688 is related to the drive phenotype. We have expanded that figure (now Supplemental Figure 7) with multiple images for each genotype to demonstrate the lack of change in Rsp and 1.688 localization, and we have added an explanation of the nomenclature.

      The reference to SD-5 in the text was made in error. We do not have RNA FISH images of SD-5/Iso1 heterozygotes. We've modified the text to reflect this.

      FigS9B: What does the y-axis label mean? Fold change relative to what? Is this not displaying counts?

      This is a good catch by the reviewer. The y-axis is mislabeled and should read "TPM". We have made this change.
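
      TPM (transcripts per million) normalizes each feature's read count by its length and then rescales the values so that they sum to one million within a sample, which makes expression levels comparable across features. Below is a minimal sketch of the standard calculation. The counts and lengths are made up, and real pipelines typically use effective rather than annotated lengths.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from read counts and feature lengths in kb."""
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths must have the same length")
    # Step 1: reads per kilobase (length normalization)
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    # Step 2: rescale so that the values sum to one million in this sample
    scale = sum(rpk) / 1e6
    return [r / scale for r in rpk]


# Hypothetical sample: three features of different lengths
values = tpm([100, 400, 500], [0.5, 2.0, 1.0])
print([round(v) for v in values])  # [222222, 222222, 555556]
```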

      To put the KipfKD/KO data in context, please also give the k value for SD-MadRev and compare the smRNA values in this context to the data displayed in F1B. Experiment: drive analysis for SD-MadRev.

      Our conclusion that Rsp smRNA overexpression may reduce drive strength rests on showing that kipfKO is sufficient to rescue wild-type sperm in driving backgrounds. We did not introduce KipfKD (or KO) into the SD-MadRev background because this chromosome does not drive.
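
      In SD studies, drive strength is conventionally reported as k, the proportion of progeny from a heterozygous SD male that carry the SD chromosome. A value of k = 0.5 indicates Mendelian transmission, and values approaching 1 indicate strong drive. The sketch below shows the calculation. The progeny counts are hypothetical, and it assumes that progeny have already been scored as SD or non-SD (for example, with a linked marker).

```python
def k_value(sd_progeny: int, non_sd_progeny: int) -> float:
    """Proportion of scored progeny that inherited the SD chromosome."""
    total = sd_progeny + non_sd_progeny
    if total == 0:
        raise ValueError("no progeny scored")
    return sd_progeny / total


# Hypothetical counts, for illustration only
print(k_value(250, 250))  # 0.5  (Mendelian transmission)
print(k_value(480, 20))   # 0.96 (strong drive)
```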

      The note that the 3XP3-dsRed cassette needs to be flipped out for Rsp overexpression to influence drive is interesting. It would be great if the authors could show a more detailed scheme of the structure of this insertion including the directionality of the promoter relative to the Rsp fragment and the rest of cluster 38C (including dm6 coordinates perhaps). Small RNA sequencing compared to totRNA sequencing should reveal if the transcription or the processing into piRNAs of the inserted piece is affected, and if more of the 38C piRNAs are affected. Genic transcription has been previously observed to limit Rhino-dependent piRNA production from piRNA clusters (Andersen et al 2017). It might be of interest to the general piRNA community to see how cluster output is influenced through the integration of an internal genic promoter.

      We agree that this is an interesting result. We have added more detail to Fig 4A to indicate directionality and genomic location of the insert in terms of dm6.

      Figure panel 4A should be adjusted to include annotations of the black boxes and to give genomic locations. It is unclear what the blue brackets mean, and where exactly the insertion took place. Are the attP sites relevant for the experiments? It might be nice to see a piRNA profile over the locus, to put the levels of additional Rsp piRNAs into perspective.

      We have removed the black boxes from the schematic as they were only there as an aesthetic choice. We have indicated where exactly the insertion was made. The attP sites are there for future experimental flexibility.

      Minor comments

      Figure 3B: fold change of satellite RNA is shown. It might be obvious that the fold change relates to KipfKO / WT but this should be stated explicitly. What is the genetic background here?

      Thank you for the comment. We added information on the genetic background in the figure.

      Figure legends should be extended for clarity throughout the manuscript in main and supplementary figures. All color codes and abbreviations, as well as the samples/genotypes and assay used, should be clearly explained. A few examples include: F1B: smRNA or totalRNA? F3B: fold change relative to what? F4B: what are these data relative to? F4C: smRNA or totalRNA? S2: Is this smRNAseq? Further description of the color code in the volcano panels would be desirable. FS3: typo, A-B should be A-D. Fold changes relative to what? Etc.

      Thank you for these helpful suggestions. We have edited the figure legends as suggested to improve the clarity. We appreciate the feedback.

      The abbreviation for Kipferl is kipf, not kip.

      Thank you for pointing this out, we have made the corrections.

      I don't understand the sentence on lines 310-312.

      We agree that sentence was confusing. We replaced it with:

      "Identifying potential proteins that interact with Rsp may therefore provide important clues about why satellites like Rsp are targets of drive."

      Referee cross-commenting

      I agree with the other reviewers' assessments.

      Reviewer #3 (Significance (Required)):

      General assessment

      This study of a highly complex and poorly understood drive system adds a very interesting piece to the puzzle of the interplay between a RanGAP duplication and a large satellite array. Its strengths lie in the use of genetic tricks to modify drive (SD-MadRev allele, KipfKO, Rsp cluster insertion). The main weakness of the study is the relatively low correlation of several observations between drive crosses to the Iso1 and Gla lines, and the lack of explanations for it. Neither gene nor repeat expression seems to give a convincing overlap in any direction.

      Furthermore, it is interesting that SD-Mad and SD-5 have such different dependencies on Rsp sRNA. While outside the scope of this work, it would be very interesting to see how other drive haplotypes behave: is SD-5 the exception, or is it SD-Mad (as the authors have also wondered in the discussion)? Such additional comparisons may also clarify the discrepancies in the RNAseq data.

      Advance

      While the same group has previously shown that Rsp satellites give rise to smRNAs through the piRNA pathway, to my knowledge it has been unclear whether and how these smRNAs influence drive. This study therefore presents a conceptual advance in demonstrating that the role of Rsp smRNAs is not shared among driving haplotypes.

      Audience

      This study is relevant for a highly specialized audience interested in meiotic drive. It contributes to the understanding of the SD system and may serve as a basis for future research in this area. In addition, the results reported in Figure 4 may be of peripheral technical interest to the Drosophila piRNA community.

      This reviewer's expertise: Drosophila, piRNA pathway, heterochromatin, sRNA

      This reviewer's limitations: nuclear-cytoplasmic trafficking, cytoskeleton

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary

      In the presented manuscript, Edvalson and Wei et al. use Drosophila genetics and NGS experiments to investigate the mechanism of meiotic drive in the Segregation Distorter (SD) system. They reveal that two driving haplotypes seem to function via different mechanisms, with drive through SD-Mad, but not SD-5, involving small RNAs produced from the Responder (Rsp) satellite, the target of SD drive. SD-Mad testes displaying drive are characterized by lower levels of Rsp sRNAs than both non-drive controls and SD-5, and ectopic overexpression of Rsp sRNAs through two distinct mechanisms decreases drive specifically in the SD-Mad genetic background. With this work, the authors add an important piece of information to the highly complex SD system, indicating that sperm killing is likely achieved by different mechanisms in different SD haplotypes despite a shared driver.

      Major comments

      Fig1C: It might be interesting to show the fold change between SD-Mad and SD-MadRev in addition to what is displayed. Moreover, can the authors comment on what might be causing the increased smRNA counts for 38C2? Is this because R16 has particularly low 38C2 values?

      Fig1/S1: Could the authors also display the Rsp smRNA counts for all Gla crosses similar to panel 1B? What is the interpretation for the increase in Rsp smRNAs in SD-5/Gla relative to R16/Gla but the lack of such an increase in the SD-5/iso1 vs R16/iso1 comparison? Do SD-Mad and SD-5 induce the same strength of drive against each of the two wildtype chromosomes? Experiments: smRNAseq for SD-MadRev/Gla.

      Fig1: The authors note changes in smRNA levels for other satellites as well as piRNA clusters but do not offer any interpretation of this observation. Are they meaningful? Should they be attributed to genetic background?

      FigS2: Same question also for the deregulated TEs: do they share sequence features with Rsp or are they overrepresented in the clusters that change? Are these explained by differences in insertions between genotypes? Do their total RNAseq values change in any way? What do the percentages in line 162 correspond to? Number of TEs that are deregulated? At which cutoff? It might be informative to compare the data to a cross between driver and R16, or even better the SD-MadRev control. Experiments: totRNAseq for SD-MadRev crosses and optionally crosses to R16.

      Figure S3: Am I reading the PCA plots right in that there are very few gene expression changes when the drivers are in iso1 background but much more in the Gla background? Comment on possible explanations for that. Please indicate the number of significantly changed genes in each comparison. Again, are these changes correlated between the two drivers or can they be attributed to genetic background of Gla vs R16? Would it be interesting to see how SD-Mad/Gla and SD-5/Gla gene expression profiles compare? Experiment: totRNAseq for SD-MadRev crosses.

      In Figure S5A it seems that totalRNA levels of Rsp are strongly increased in SD-Mad/Gla but not in SD-Mad/iso1. The iso comparison (fewer piRNAs but the same amount of transcript) could indicate that it is actually transcription of Rsp that is affected here. This is even pointed out in line 205, without discussion of the fact that the Gla comparison (fewer piRNAs but more transcript) would rather indicate that transcription is intact but processing into piRNAs is defective. Could this be clarified using FISH as in Figure S8? If true, SD-Mad/Gla should have much more FISH signal than SD-Mad/iso1. Either way, this discrepancy should be discussed further. Experiments: comprehensive smFISH panel for all crosses (including SD-MadRev).

      In general, the totRNA profiles of repeats don't seem to correlate well between the genotypes (iso vs Gla crosses, for either SD-5 or SD-Mad). Is this because values are generally small and/or replicates don't correlate? Should these data even be considered? Also, panels 2A and S5C are very different from each other. The additional comparison with the SD-MadRev allele crossed into both Iso1 and Gla should give additional insight. Experiment: totRNAseq for SD-MadRev crosses.

      It may be generally informative to put the sRNA and RNA comparisons into perspective, for example by including the comparison of SD-Mad crosses versus SD-MadRev crosses to exclude unrelated genetic background components as much as possible.

      FigS6: I assume this is given, but as it is not specified: is the directionality of differential expression taken into account here? Or could it be significantly up in one and down in the other? Please specify / adjust color scale to allow this distinction.

      FigS8: Please add a scale bar for all images. 1.688 is labeled as 359 in the legend; please unify and/or explain the nomenclature. Consider adding a nuclear outline based on DAPI. It looks like 1.688 actually differs more than Rsp between control and SD-Mad/Iso. Could the authors comment on this? In the text the authors mention that these experiments were done for both SD-Mad and SD-5 heterozygotes, but only the SD-Mad data are shown.

      FigS9B: What does the y-axis label mean? Fold change relative to what? Is this not displaying counts?

      To put the KipfKD/KO data in context, please also give the k value for SD-MadRev and compare the smRNA values in this context to the data displayed in F1B. Experiment: drive analysis for SD-MadRev.

      The note that the 3XP3-dsRed cassette needs to be flipped out for Rsp overexpression to influence drive is interesting. It would be great if the authors could show a more detailed scheme of the structure of this insertion including the directionality of the promoter relative to the Rsp fragment and the rest of cluster 38C (including dm6 coordinates perhaps). Small RNA sequencing compared to totRNA sequencing should reveal if the transcription or the processing into piRNAs of the inserted piece is affected, and if more of the 38C piRNAs are affected. Genic transcription has been previously observed to limit Rhino-dependent piRNA production from piRNA clusters (Andersen et al 2017). It might be of interest to the general piRNA community to see how cluster output is influenced through the integration of an internal genic promoter.

      Figure panel 4A should be adjusted to include annotations of the black boxes and to give genomic locations. It is unclear what the blue brackets mean, and where exactly the insertion took place. Are the attP sites relevant for the experiments? It might be nice to see a piRNA profile over the locus, to put the levels of additional Rsp piRNAs into perspective.

      Minor comments

      Figure 3B: fold change of satellite RNA is shown. It might be obvious that the fold change relates to KipfKO / WT but this should be stated explicitly. What is the genetic background here?

      Figure legends should be extended for clarity throughout the manuscript in main and supplementary figures. All color codes and abbreviations, as well as the samples/genotypes and assay used, should be clearly explained. A few examples include: F1B: smRNA or totalRNA? F3B: fold change relative to what? F4B: what are these data relative to? F4C: smRNA or totalRNA? S2: Is this smRNAseq? Further description of the color code in the volcano panels would be desirable. FS3: typo, A-B should be A-D. Fold changes relative to what? Etc.

      The abbreviation for Kipferl is kipf, not kip.

      I don't understand the sentence on lines 310-312.

      Referee cross-commenting

      I agree with the other reviewers' assessments.

      Significance

      General assessment

      This study of a highly complex and poorly understood drive system adds a very interesting piece to the puzzle of the interplay between a RanGAP duplication and a large satellite array. Its strengths lie in the use of genetic tricks to modify drive (SD-MadRev allele, KipfKO, Rsp cluster insertion). The main weakness of the study is the relatively low correlation of several observations between drive crosses to the Iso1 and Gla lines, and the lack of explanations for it. Neither gene nor repeat expression seems to give a convincing overlap in any direction.

      Furthermore, it is interesting that SD-Mad and SD-5 have such different dependencies on Rsp sRNA. While outside the scope of this work, it would be very interesting to see how other drive haplotypes behave: is SD-5 the exception, or is it SD-Mad (as the authors have also wondered in the discussion)? Such additional comparisons may also clarify the discrepancies in the RNAseq data.

      Advance

      While the same group has previously shown that Rsp satellites give rise to smRNAs through the piRNA pathway, to my knowledge it has been unclear whether and how these smRNAs influence drive. This study therefore presents a conceptual advance in demonstrating that the role of Rsp smRNAs is not shared among driving haplotypes.

      Audience

      This study is relevant for a highly specialized audience interested in meiotic drive. It contributes to the understanding of the SD system and may serve as a basis for future research in this area. In addition, the results reported in Figure 4 may be of peripheral technical interest to the Drosophila piRNA community.

      This reviewer's expertise: Drosophila, piRNA pathway, heterochromatin, sRNA

      This reviewer's limitations: nuclear-cytoplasmic trafficking, cytoskeleton

    1. Whether extreme spend pays off comes down to the ultimate business value of shipped code (e.g. revenue), which most companies still can't measure.

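      Most people assume that increasing AI spending translates directly into business value and revenue, but the author points out that most companies cannot actually measure the link between AI spending and business value. This runs counter to the prevailing logic behind AI investment decisions and calls into question whether current AI spending patterns are justified.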

    1. Hello, really nice work! The channel-adaptive variant is a neat way to get cross-dataset generalization. One thing I was curious about is how many (and which) landmark channels actually matter for prediction accuracy?

      Figure S11 shows that it works qualitatively with a single channel, and OpenCell is quantitative but changes both channel count and imaging domain at once, so I couldn't tell how much is lost by dropping channels alone. Seems like it should cost something real given that each channel carries independent info, and the model only sees cell identity/state through landmark morphology, so fewer channels means less to condition on. The Vermeer-XL CA vs. fixed gap in Tables S2–S4 hints at this too. A quick within-HPA ablation (nucleus only vs. + microtubule vs. + ER, same metrics) would isolate it and tell people how much fidelity they give up when they've only got, say, a Hoechst stain. Thanks btw for sharing the code and weights!

    1. Beyond just being able to write code yourself, you can think more like a programmer in many other domains of life. What things do you repeat often in your work which could be automated? What ambiguous process could you convert into a foolproof set of steps?

      Runbook writing!

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important study deepens our understanding of how populations of a given species may diverge in their molecular and physiological patterns as a result of adaptation to different thermal regimes. By approaching this question from multiple directions, the authors provide solid evidence for adaptive changes in three strains of the diamondback moth after only three years of experimental evolution, and support the causal involvement of the PxSODC gene in thermal adaptation to both cold and hot temperatures. This work would benefit from more sophisticated phylogenetic analyses, better statistical support, and a more detailed discussion of the differences in the three strains at the pathway level.

      We sincerely thank the editors for this positive and constructive assessment. In the revised manuscript, we have addressed the highlighted points by: (1) re-inferring the phylogenetic tree of the PxSODC gene using a model-based Maximum Likelihood method (IQ-TREE) to ensure a robust evolutionary analysis; (2) substantially expanding the description of our statistical methods across all data types to ensure reproducibility and clarify multiple-testing corrections; and (3) adding a more detailed discussion of the pathway-level differences between the hot and cold strains, particularly integrating how their distinct transcriptomic responses align with their shared metabolic adjustments and phenotypic traits.

      Reviewer #1 (Public review):

      (1) The authors identify pathways that are enriched in different strain comparisons (Figure 3E), but do not provide a detailed interpretation of these results. It would be great if the authors could explain in more detail how the physiological processes of a cold-adapted strain of this species may differ from those of a warmer-adapted strain.

      We agree. We have addressed this by directly integrating our pathway enrichment results (Figure 3E) with the observed life-history phenotypes (concurrently addressing Reviewer 2's Comment 36a). We expanded the Discussion to explain that while both strains share convergent adjustments in core pathways (e.g., lipid metabolism for energy reallocation), their specific physiological strategies differ. The cold-adapted strain relies on broader transcriptional reprogramming to maintain homeostasis and support extended longevity/cold hardiness, whereas the hot-adapted strain utilizes broader metabolic rewiring to actively fuel its accelerated development and higher fecundity.

      (2) The authors reconstruct a phylogenetic tree of the PxSODC gene using the neighbor-joining algorithm. The limitations of this algorithm have been known for many years now, especially for sequences separated by long evolutionary distances. According to Wang et al. (2016), the last common ancestor of the species shown in Figure S4C occurred 392-350 million years ago. Given this, I would strongly recommend that the authors infer a phylogenetic tree using model-based methods, such as those implemented in RAxML-NG or IQ-TREE. Also, in the absence of a valid outgroup sequence, I would show the gene tree as unrooted or rooted based on the corresponding species tree.

      Agree. We have re-inferred the phylogenetic tree of the PxSODC gene using the model-based Maximum Likelihood (ML) method implemented in IQ-TREE. As recommended, in the absence of a valid outgroup sequence, the revised tree is now presented as unrooted. Supplemental Figure S4C (Figure 5-figure supplement 1C) and the corresponding text in the manuscript have been updated.

      (3) There is a key piece of the puzzle that is currently missing: the structural mechanism behind the mutational effects described in this study (e.g., Figure 5). The authors could leverage AlphaFold to generate structural models of different mutants and conduct molecular dynamics simulations to examine their conformational dynamics.

      We thank the reviewer for this excellent suggestion. We generated AlphaFold structural models of the wild-type (WT) and mutant (MU) PxSODC proteins and conducted 100 ns molecular dynamics (MD) simulations using GROMACS 2022.3 at three physiologically relevant temperatures: 15°C (cold stress), 26°C (favorable baseline), and 32°C (heat stress). Using 26°C as the physiological baseline, three key structural parameters support enhanced thermostability of the mutant protein (Figure 5–figure supplement 3). First, RMSD analysis revealed that under heat stress (32°C), the WT underwent severe conformational drift (RMSD increased from the 26°C baseline of 1.62 to 2.49, an increase of 0.87), while MU remained remarkably stable (from 1.59 to 1.66, an increase of only 0.07). Second, MU possessed a significantly more compact structure, with lower SASA values at 15°C (118.39 vs. 127.29 nm²) and 26°C (113.82 vs. 125.61 nm²), indicating optimized hydrophobic core packing. Third, the intramolecular hydrogen bond network of MU demonstrated dual stress resistance: under cold stress, MU actively increased hydrogen bonds from its baseline (113→119), whereas WT lost bonds (117→112); under heat stress, MU fully maintained its bond count (113→113). These results provide a direct structural mechanism for the enhanced catalytic efficiency of the mutant SOD at lower expression levels.
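
      The baseline-relative comparisons in this paragraph are simple differences from the 26°C values. The sketch below recomputes them from the numbers quoted above as an arithmetic check only. It does not reproduce the GROMACS trajectory analysis, the data layout and helper name are illustrative, and RMSD units are left as reported because the response does not state them.

```python
# Values quoted in the response; RMSD units as in the original analysis.
rmsd = {"WT": {26: 1.62, 32: 2.49}, "MU": {26: 1.59, 32: 1.66}}
sasa = {"WT": {15: 127.29, 26: 125.61}, "MU": {15: 118.39, 26: 113.82}}  # nm^2
hbonds = {"WT": {15: 112, 26: 117}, "MU": {15: 119, 26: 113, 32: 113}}


def delta_from_baseline(series: dict, temp: int, baseline: int = 26):
    """Change of a structural parameter relative to the 26 °C baseline."""
    return series[temp] - series[baseline]


for protein in ("WT", "MU"):
    d = delta_from_baseline(rmsd[protein], 32)
    print(f"{protein} RMSD change at 32 °C: {d:+.2f}")
# WT RMSD change at 32 °C: +0.87
# MU RMSD change at 32 °C: +0.07

print("H-bond change at 15 °C:",
      {p: delta_from_baseline(hbonds[p], 15) for p in ("WT", "MU")})
# H-bond change at 15 °C: {'WT': -5, 'MU': 6}

print("SASA gap WT - MU (nm^2):",
      {t: round(sasa["WT"][t] - sasa["MU"][t], 2) for t in (15, 26)})
```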

      Reviewer #1 (Recommendations for the authors):

      (4) The experimental evolution component of this study is described in the text as lasting for three years. It would help if the number of generations per strain were also reported.

      We have added the number of generations per strain. Over the three-year period, the hot strain completed ~75 generations and the cold strain ~15 generations. The ancestral strain was continuously maintained at 26°C throughout this period. The revised text has been updated in both the Introduction and Materials and Methods.

      (5) In Figure 3B: There is a typo in the word “Statistics”.

      Corrected. The typo in “Statistics” in Figure 3B has been fixed.

      (6) In Figure 3D: “CS” appears twice.

      Corrected. The duplicated “CS” label in Figure 3D has been replaced with the correct label.

      (7) Figure 4: This is not accessible to colorblind readers, who will clearly not be able to tell each color apart. As a non-colorblind person, I, too, have trouble figuring out which color label in panel B corresponds to which color in panel A. For example, I do not know off the top of my head how 'blue' differs from 'midnightblue', 'royalblue', or 'skyblue'. I recommend that the authors replace colors with identifiers, such as 'g1' for group 1 and so on.

      We appreciate this suggestion. We have replaced all color-based module labels with alphanumeric identifiers (M1, M2, M3, etc.) and added a corresponding legend. The main text and supplementary materials have been updated accordingly.

      (8) Lines 246-247: "Its secondary structure mainly consisted of strands, helices and coils." This sentence is redundant. These three are the only possible secondary structural elements, according to most bioinformatics tools such as PSIPRED, which the authors used. This sentence would be more useful if the authors could report the percentage breakdown of each secondary structural element.

      We have removed the redundant sentence and updated the text to report the specific percentage breakdown of the secondary structural elements based on our PSIPRED predictions (approximately 55.24% random coils, 16.19% alpha helices, and 28.57% extended strands). The revised text has been updated in the Results section.
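
      A percentage breakdown like this can be obtained by counting the per-residue states in a PSIPRED prediction, which labels each residue H (helix), E (strand), or C (coil). This is a minimal sketch. The helper name is illustrative, and the 10-residue state string is hypothetical rather than the PxSODC prediction.

```python
from collections import Counter


def ss_breakdown(states: str) -> dict:
    """Percentage of residues in each secondary-structure state (H, E, C)."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    n = len(states)
    return {s: round(100 * counts.get(s, 0) / n, 2) for s in "HEC"}


# Hypothetical per-residue states, not the PxSODC prediction
print(ss_breakdown("CCHHHEEECC"))  # {'H': 30.0, 'E': 30.0, 'C': 40.0}
```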

      (9) Lines 260-261: "This suggests that the PxSODC gene can alter its expression pattern and function in response to environmental change...". I find this sentence a bit imprecise. Would it not be more precise to mention that the expression of this gene is regulated by temperature triggers?

      We agree that the original phrasing was imprecise. We have revised the sentence in the manuscript to state: “This suggests that the expression of the PxSODC gene is regulated by temperature triggers, and its altered function contributes to temperature-adaptive evolution in P. xylostella.”

      (10) The data points in Figures S1 and S7 are very small and hard to tell apart without zooming in a lot. Perhaps the authors could change the orientation of those pages to landscape and increase the size of the figures.

      Done. We have changed the orientation of Supplemental Figures S1 (Figure 1-figure supplement 1) and S7 (Figure 5-figure supplement 4) to landscape and increased the size of the figures and individual data points to improve visibility.

      (11) In Figure S2, the panel labeled as 'C' should be 'B' (based on the caption) and vice versa.

      Corrected. The panel labels ‘B’ and ‘C’ in Supplemental Figure S2 (Figure 2-figure supplement 1) have been swapped. The Supplementary Materials have been updated accordingly.

      Reviewer #2 (Public review):

      (1) The paper in its current form is hard to digest and would benefit from improved clarification of the storyline, as well as a tighter integration between the phenotypic, omics, and functional validation data. Currently, it is not always clear what the relevance is of all the reported results, nor why certain decisions were made, or how all the different methods the authors used fit together. For example, the authors functionally validated a second gene, PxDnmt1, but it is unclear why this particular gene was chosen, nor how it relates to their selection regimes when looking at the results obtained with the phenotyping and omics data collection. Seeing how much work the authors did, this makes the paper overwhelming and difficult to read.

      We sincerely appreciate this constructive feedback. In the revised manuscript, we have made significant structural revisions to improve the storyline and logical flow. We have streamlined the Results section (moving extensive descriptive data, such as life table curves and the detailed metabolomics of the mutant strains, to Appendices 1-3) to focus on the key findings. Furthermore, we have clarified the logical transitions between experiments. For instance, regarding the choice to validate PxDnmt1, we now explicitly explain in the Results that our untargeted metabolomic analysis of the PxSODC mutant strains revealed consistent alterations in 5-hydroxymethyluracil (involved in DNA demethylation) and 5'-deoxyadenosine (a precursor to the primary methyl donor S-adenosylmethionine) across all developmental stages. This specific metabolic signature provided a strong, data-driven hypothesis linking PxSODC function to epigenetic regulation via DNA methylation, prompting us to functionally validate PxDnmt1. By stating these rationales explicitly, we have made the narrative much clearer and more cohesive.

      (2) The authors at times stretch their results too far, as the ecological relevance of their study design and results is not clear, limiting the generalizability and value of the results for understanding species' adaptive potential under climate change. For example, the selection regimes used present the minimum and maximum known temperatures at which the species can survive and develop, but it is unclear how the temperatures relate to the natural environment of the source population, to what extent wild populations might experience these temperatures, and whether they would experience them at the extended duration used (12h at max/min temperature). Moreover, I wonder whether the comparisons made would identify the genes that matter under natural conditions, as unevolved populations were kept under constant conditions compared to 12h:12h temperature regimes for the evolved populations, and the metabolic and transcriptomic profiling was done under a constant favorable 26°C rather than under thermal stress in a, as far as I can tell, randomly chosen life stage (larval stage).

      We appreciate the reviewer raising these important points regarding ecological relevance and experimental design. In the revised manuscript, we have added context and acknowledged these limitations in the Methods and Discussion sections. First, regarding ecological relevance: The source population is from Fuzhou, a subtropical region where summer high temperatures frequently exceed 32°C and winter lows can drop below 10°C, making our selection temperatures ecologically relevant extremes for this population. The 12h:12h cycling temperatures were designed to simulate severe but natural diurnal fluctuations.

      Second, regarding constant control vs. cycling regimes: The constant 26°C represents the established optimal developmental temperature and standard laboratory condition for P. xylostella. We acknowledge that comparing cycling selection regimes against a constant control might conflate adaptation to absolute temperature extremes with adaptation to thermal fluctuation itself. We have added this as a caveat in the Discussion. Third, regarding omics profiling conditions: The transcriptomic and metabolomic profiling was conducted under common garden conditions (26°C) specifically to identify constitutive, genetically fixed adaptations resulting from evolutionary selection, rather than immediate physiological plasticity under stress. We have clarified these rationales in the text.

      (3) The paper in its current form does not adequately describe the statistical analyses underlying the results, nor do the authors share their code, making it very hard to judge whether the analyses used are appropriate and the results trustworthy. I have concerns about the inappropriate use of t-tests, the lack of correcting for confounding variables, and the need for multiple testing corrections.

      We sincerely appreciate this concern. In the revised manuscript, we have made substantial improvements to the description of statistical analyses throughout the Methods section:

      (1) Statistical methods for each data type are now described separately and in detail, specifying the tests used, the number and type of comparisons, and sample sizes.

      (2) For metabolomic data, we have clarified that FDR correction was applied alongside multi-criteria thresholds (|log₂ Fold Change| ≥ 1, VIP ≥ 1, FDR < 0.05). For transcriptomic data, FDR correction (Benjamini and Hochberg, 1995) was applied via DESeq2.

      (3) For WGCNA, we have specified the total number of correlation tests (29 modules × 30 metabolites = 870) and the stringent dual threshold (|r| > 0.8, P < 0.05) used to control for false positives, following standard practice.

      (4) For life table parameters, the paired bootstrap method with 100,000 replications was used for all pairwise comparisons among strains.

      (5) For all other experimental data (qRT-PCR, SOD activity, O₂⁻ levels, survival rates, supercooling/freezing points, etc.), we have specified that t-tests were used only for two-group comparisons, while one-way ANOVA with Tukey's or Tamhane's T2 test was used for three or more groups, with non-parametric alternatives applied when normality assumptions were not met.

      (6) The raw data have been deposited in public repositories (see Data availability), and all statistical procedures are now described in sufficient detail to enable independent reproduction of the results.
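      The test-selection rule in point (5) can be written down as a small decision function; this is a sketch of the logic only and does not run any test. Two details are assumptions not stated in the response: Tamhane's T2 is taken to be the post hoc test used when equal variances cannot be assumed (its usual role), and Mann-Whitney U and Kruskal-Wallis are hypothetical stand-ins for the unnamed non-parametric alternatives.

```python
def choose_test(n_groups: int, normal: bool, equal_variances: bool = True) -> str:
    """Name the test family for a comparison, following the rule in point (5)."""
    if n_groups < 2:
        raise ValueError("at least two groups are needed for a comparison")
    if not normal:
        # Assumed non-parametric alternatives; the response does not name them.
        return "Mann-Whitney U" if n_groups == 2 else "Kruskal-Wallis"
    if n_groups == 2:
        return "t-test"
    # Three or more groups: one-way ANOVA, then a post hoc test chosen by
    # whether equal variances can be assumed (Tukey) or not (Tamhane's T2).
    if equal_variances:
        return "one-way ANOVA + Tukey"
    return "one-way ANOVA + Tamhane's T2"


print(choose_test(2, normal=True))  # t-test
print(choose_test(3, normal=True, equal_variances=False))  # one-way ANOVA + Tamhane's T2
```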

      Reviewer #2 (Recommendations for the authors):

      Title

      (4) I don't feel the title adequately captures the work, I would instead of 'adaptive evolution' use 'experimental evolution' and I would not use the word 'underpins' but instead 'indicates', as it is not clear from your work whether the adaptations to the lab conditions you used would be ecologically relevant nor whether they are involved in thermal adaptation in wild populations.

      Accepted. The title has been revised to: “Experimental evolution to thermal stress indicates climate resilience in a cosmopolitan arthropod.”

      Abstract

      (5a) Please add the phenotype results to the abstract.

      We have added key phenotype results to the abstract. The revised text now reads: “The hot strain showed accelerated development, higher fecundity, and increased survival under extreme heat, while the cold strain exhibited lower supercooling and freezing points, indicating enhanced cold hardiness.”

      (6b) The Abstract doesn't really detail the answer to your research question yet: so what insights into the genetic mechanisms underlying thermal adaptation did you gain that are novel?

      We agree. We have revised the Abstract to explicitly highlight the novel genetic and molecular mechanisms we discovered. Specifically, we now detail that thermal adaptation is driven by coordinated mutational, metabolic, and epigenetic changes: (1) an energy-efficient genetic mechanism in which non-synonymous mutations in PxSODC enhance superoxide scavenging efficiency, enabling effective oxidative stress management at lower gene expression levels; (2) convergent metabolic adjustments, notably a reduction in lipid metabolism to conserve energy; and (3) epigenetic regulation of thermal tolerance via DNA methylation. The Abstract has been updated accordingly.

      (7c) Line 3: replace 'ectotherms' with 'arthropods' to match the title?

      Done. “Terrestrial ectotherms” has been replaced with “terrestrial arthropods” in the abstract.

      (8d) Line 9: replace 'demographic' with 'life history'?

      Done. “Demographic” has been replaced with “life history” in the abstract.

      Introduction

      (9a) The storyline is a bit unclear. Do you want to focus on the increased threat from insect pests under climate change or on the threat of climate change on insect persistence? Please pick one and adapt your storyline accordingly. I would suggest focusing on the first and talking more about the range extension of pest species under climate change (which would also require adaptation to cold extremes).

      We agree and have refocused the Introduction on the increased threat from insect pests under climate change, emphasizing that range expansion into new regions requires adaptation to both heat and cold extremes. Both the first and second paragraphs have been revised accordingly.

      (10b) Line 31-33: What do you mean by 'shows a positive relationship between the thermal tolerance range and the level of climatic variability'? Are they able to tolerate a larger range of temperatures?

      This sentence has been revised as part of the restructured Introduction, which now focuses on the range expansion of pest species under climate change. The revised text reads: “Such range expansion requires adaptation not only to warmer conditions in existing habitats but also to cold extremes encountered during colonization of higher latitudes or elevations (Harvey et al., 2020).”

      (11c) Line 33-35: Is this information relevant here?

      Agreed. This sentence has been removed as part of the restructured Introduction, which now focuses on the threat of pest range expansion under climate change.

      (12d) Line 55-56: What exactly do we not know yet about the mechanisms that enable thermal adaptation that you aim to fill in this paper? Please rephrase your knowledge gap to be more concrete (e.g., "but we do not yet know how...").

      We have rephrased the knowledge gap to be more concrete and aligned with the revised storyline. The revised text now reads: “...we do not yet know how long-term thermal selection drives coordinated changes across gene function, metabolic networks, and life history traits to enable thermal adaptation and range expansion in pest species.”

      (13e) Line 57: Also, here, the storyline is unclear. Why did you use the diamondback moth as your model species? You provide many different reasons, but it would help if you emphasized one reason that is in line with whichever storyline you want to focus on: is it because it is an insect pest that can tolerate a wide range of temperatures?

      We have streamlined this paragraph to focus on the primary rationale: P. xylostella is a globally distributed pest that thrives across a wide range of thermal environments, making it an ideal model for studying the genetic mechanisms of thermal adaptation. Supporting details on genomic resources are retained briefly as they enable the multi-omics approach used in this study.

      (14f) Line 65: Demonstrated how? Please give a short summary of the evidence for their genetic capacity to tolerate future climates.

      We have added a brief summary of the evidence. Specifically, genome-wide SNP analysis of field populations from 114 locations across diverse biogeographical zones revealed climate-adaptive genetic variability, indicating that P. xylostella can tolerate projected future climates in most regions (Chen et al., 2021).

      (15g) Line 72: What does 'Age-stage' mean? Should it read 'Aged-staged'?

      “Age-stage, two-sex life table” is an established demographic method developed by Chi (1988) that simultaneously accounts for both age and developmental stage in both sexes. This is a standard term in the field (Chi et al., 2020), so we have retained the original wording but added a brief clarification upon first use.

      (16h) Line 78-80: This needs a bit more explanation. Why does an increased ability to scavenge superoxide anions affect adaptability under extreme temperature environments?

      We have added a brief explanation. Extreme temperatures induce oxidative stress by elevating intracellular reactive oxygen species (ROS), including superoxide anions, which can damage cellular structures. Enhanced scavenging capacity thus helps maintain cellular homeostasis under thermal stress.

      (17i) Line 82-86: Please be more precise. What novel insights did you gain about the genetic mechanisms underlying thermal adaptation?

      We have revised this sentence to more precisely summarize the novel insights, encompassing both the multi-omics findings and the functional validation of PxSODC.

      Results

      (18a) The results section is very long and presents an overload of information at the moment, overwhelming the reader. Consider moving some sections to the Supplements (for example, a large part of the phenotypic data that cannot be linked to the omics data and the metabolic profiling of the mutant strains) or leave them out of the paper altogether.

      We agree that the Results section was too dense. We have streamlined it by moving the following content to the Supplementary Materials:

      (1) Detailed age-stage survival and fecundity curve data for the ancestral, hot and cold strains (Supplementary Text S1).

      (2) Detailed life table analysis of the PxSODC mutant strains (Supplementary Text S2).

      (3) Detailed untargeted metabolomic profiling of the SODC-MU mutant strains across developmental stages (Supplementary Text S3).

      The main text now retains only the key life history comparisons, extreme temperature tolerance results, omics-based evidence linking transcriptomics and metabolomics, functional validation of PxSODC, and the DNA methylation findings, with brief summaries and cross-references to the Supplements for supporting details.

      (19b) Please also provide the effect sizes for the different effects you report, for example, how many degrees difference was there between ancestral and cold strains in the supercooling/freezing points, and what was the variation?

      We have added specific effect sizes (mean ± SEM and between-group differences) for all key comparisons throughout the Results section, including preadult duration, stage-specific survival rates under extreme heat, supercooling/freezing points, and SODC-MU mutant strain comparisons. For example, the supercooling points of CS pupae (-23.99 ± 0.18°C) were 0.90°C lower than AS (-23.09 ± 0.26°C), and the freezing points were 2.66°C lower (-14.24 ± 0.61°C vs. -11.58 ± 0.52°C). Please refer to the revised manuscript for all updated values.
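      The between-group differences quoted above are plain differences of means. The short sketch below reproduces them from the reported pupal values and shows how a mean ± SEM is computed from replicate measurements; any replicate values passed to `mean_sem` are hypothetical.

```python
import math
import statistics

# Pupal means (°C) reported above for the ancestral (AS) and cold (CS) strains.
REPORTED_MEANS = {
    "supercooling point": {"AS": -23.09, "CS": -23.99},
    "freezing point": {"AS": -11.58, "CS": -14.24},
}


def cs_minus_as(trait: str) -> float:
    """Difference CS - AS in °C; a negative value means the cold strain is lower."""
    means = REPORTED_MEANS[trait]
    return round(means["CS"] - means["AS"], 2)


def mean_sem(values: list) -> tuple:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("SEM needs at least two values")
    return statistics.fmean(values), statistics.stdev(values) / math.sqrt(len(values))


for trait in REPORTED_MEANS:
    print(trait, cs_minus_as(trait))  # -0.9, then -2.66
```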

      (20c) Line 93-94: "Intrinsic and finite rate of increase" of what?

      Clarified. These are population growth parameters. The revised text now specifies “intrinsic rate of increase (r) and finite rate of increase (λ) of the population.”

      (21d) Line 98-99: Please start the paragraph with this summary of the results and then further detail them.

      We have restructured this paragraph by moving the summary sentence to the beginning, followed by the supporting details.

      (22e) Line 100-109: Why did you look at daily survival and fecundity rates? Please add why this is relevant.

      As part of the overall streamlining of the Results section, this paragraph on detailed age-stage survival and fecundity curves has been moved to Supplementary Text S1. A brief justification for their relevance has been added there, noting that these curves capture stage-specific variation in survival and fecundity that summary life table parameters alone may obscure.

      (23f) Line 106: What do HS, AS, and CS stand for? And please provide the statistics for comparison of daily survival rates between the strains.

      We have defined the abbreviations (HS = hot strain, AS = ancestral strain, CS = cold strain) at their first appearance in the Results section. This paragraph on daily survival and fecundity has been moved to Supplementary Text S1, where the abbreviations are also defined. The survival rates reported are the maximum daily survival rates derived from the age-stage specific survival rate curves (sₓⱼ), and the statistical comparisons among strains are presented in Supplemental Table S1.

      (24g) Line 144-146: Why are these differential metabolites likely to play a crucial role?

      We agree this statement was speculative. It has been removed from the revised manuscript.

      (25h) Line 159-161: Why is a reduction of lipid metabolites evidence for adaptive evolution?

      We have revised this sentence to clarify the reasoning. The reduction in lipid metabolites in both independently evolved hot and cold strains suggests a convergent metabolic response, indicating that lipid metabolism adjustment is a shared adaptive strategy rather than a random change.

      (26i) Line 184-185: It is difficult to judge from Figure 3E the extent of overlap in KEGG pathways between the hot and cold strains. Can you adjust the figure to emphasize that overlap more?

      Agreed. To emphasize the extent of overlap in KEGG pathways between the hot and cold strains more intuitively, we have completely redesigned Figure 3E. Instead of presenting two separate panels with unaligned vertical axes, we have consolidated the data into a single back-to-back (mirrored) bar chart with a shared central y-axis.

      (27j) Line 211: Not only the red module, but also the blue and green module correlates with many of the shared differential metabolites.

      We agree. We have revised the text to acknowledge that the blue and green modules also showed strong correlations with shared differential metabolites, while noting that the red module had the highest number of significantly correlated metabolites and was therefore selected for further analysis.

      (28k) Line 215: I would rephrase this as genes being interesting candidates for being involved in thermal adaptation or 'seem to be important for the adaptation of...', as you don't know from these results whether these genes play a critical regulatory role.

      Agreed. We have toned down the language to reflect the correlative nature of these results.

      (29l) Line 233: Do you mean that you further analyzed 15 genes of the 79 identified candidate genes in the previous paragraph?

      Yes, exactly. From the 79 candidate genes, we selected 15 that were both annotated in the genome and had high expression levels (FPKM > 10) for further analysis. We have clarified this in the revised manuscript.

      (30m) Line 238: What does SOD stand for?

      We have spelled out the abbreviation upon first use in this section.

      (31n) Line 254-255: Please provide the stats for this result.

      We have added the specific allele frequencies for each strain. The Leu194-Met194 mutation frequency was determined by direct sequencing of 10 individuals per strain, and the frequencies are now reported in the revised text.
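      As a reminder of how such a frequency is computed, an allele frequency from individually sequenced animals counts alleles rather than individuals. The sketch below assumes a diploid autosomal locus with genotypes called from the sequencing traces; the 10 genotypes in the example are hypothetical and are not the strains' data.

```python
from collections import Counter


def allele_frequency(genotypes: list, allele: str) -> float:
    """Frequency of `allele` among diploid genotypes written as two-letter codes."""
    if any(len(g) != 2 for g in genotypes):
        raise ValueError("each genotype must list exactly two alleles")
    # Count every allele copy: two per diploid individual.
    counts = Counter(a for g in genotypes for a in g)
    return counts[allele] / (2 * len(genotypes))


# Hypothetical calls for 10 individuals: L = Leu194 allele, M = Met194 allele.
genotypes = ["MM"] * 6 + ["LM"] * 3 + ["LL"]
print(allele_frequency(genotypes, "M"))  # 0.75
```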

      (32o) Line 303-304: How did you test for enhanced stability to temperature fluctuations? And enhanced compared to what?

      This observation was based on the survival rate data in Figure 5C, where mutant pupae at 43°C showed no significant difference from the ancestral strain, whereas other life stages (eggs, larvae, adults) at 42°C showed significantly reduced survival in the mutant strains. We have revised the text to clarify the comparison.

      (33p) Line 324-326: Why do decreased expression levels demonstrate increased O₂⁻ scavenging capacity? And why is that beneficial for adaptation to thermal stress? Please explain.

      We have revised this sentence to clarify the logic. The non-synonymous mutations in the hot and cold strains likely alter the protein conformation of SOD enzymes, increasing their catalytic efficiency per molecule. This allows effective O₂⁻ scavenging at lower expression levels, which is energetically favorable under thermal stress where energy conservation is critical for survival.

      (34q) Line 404-406: I'm confused. Is there a direct link between the gene you knocked out here and the results you presented up until now? How do the reduced levels of 5-methylcytosine relate to the metabolite results you present at the beginning of the paragraph, other than that both could be involved in DNA methylation?

      We have revised this paragraph to clarify the logical chain. Among the three metabolites consistently altered across all developmental stages in the SODC-MU strains, 5-hydroxymethyluracil is involved in dynamic DNA demethylation and 5'-deoxyadenosine is a precursor to S-adenosylmethionine (the methyl donor for DNA methylation). This suggested a link between PxSODC deletion and DNA methylation. To test this, we examined PxDnmt1 expression and activity in the thermally adapted strains and found both were significantly reduced. We then used RNAi to silence PxDnmt1 and confirmed that reduced DNA methylation (lower 5-mC levels) directly impaired thermal tolerance. Thus the connection is: PxSODC deletion → altered methylation-related metabolites → reduced DNA methyltransferase activity → decreased thermal tolerance.

      (35r) Line 410: Saying that your knockdown of a gene that did not directly pop up in any of your other analyses confirms that DNA methyltransferase is associated with the response to thermal selection is a stretch. Please rephrase.

      We agree this was overstated. We have toned down the language to reflect that the RNAi results provide preliminary evidence for a potential role of DNA methylation in thermal tolerance, rather than confirmation.

      Discussion

      (36a) The phenotype data are currently not discussed at all. Please add it to the discussion and try to integrate it more with the omics data you collected.

      We agree. To provide a cohesive narrative and avoid redundancy, we have addressed this comment in conjunction with our pathway interpretation (please see our response to Reviewer 1, Comment 1). In the revised Discussion, we explicitly integrated our specific phenotypic findings (e.g., accelerated development, increased fecundity, and heat survival in the hot strain; prolonged lifespan and lowered supercooling points in the cold strain) with the distinct transcriptomic and metabolomic profiles. This integration demonstrates how molecular and metabolic rewiring directly underpins the divergent life-history traits without engaging in unwarranted speculation.

      (37b) Line 433-434: I don't think this adequately represents the relevance of your particular study. I would suggest changing it to be more in line with the storyline of understanding the capacity for global dispersal in insect pests under climate change.

      We agree. We have revised this sentence to align with the storyline of pest range expansion under climate change.

      (38c) Line 476: This is a very odd statement; don't all species' genomes have genes encoding proteins involved in thermal adaptation? The reference also doesn't seem to be appropriate. I would suggest deleting this sentence.

      Agreed. This sentence has been removed.

      (39d) Line 483: Please write out SOD the first time you use it in a new section.

      Done. SOD has been spelled out at its first use in the Discussion.

      (40e) Line 544-548: This is a bit too specific to be the last sentence of the discussion. Try to formulate it more broadly in terms of what future research should focus on in general, not just your specific research.

      We agree. We have broadened the final sentence to address future research directions more generally.

      Figures

      (41a) Figure 1A: I don't think t-tests are appropriate here since you are not simply comparing two treatments, but testing for the effects of 5-6 different temperatures. And how did you correct for replicate populations in your analysis?

      Clarified. In Figure 1A, our comparisons are independent pairwise tests between exactly two strains (HS vs. AS) at each specific temperature and time point, making t-tests statistically appropriate. We were not testing for a continuous effect across temperatures. Regarding replicate populations, the individuals used in these assays were drawn from across the six replicate populations per treatment, with each biological replicate (n = 6, with 20 individuals per replicate) comprising individuals pooled from across the replicate populations to account for inter-population variation. We have clarified this in the revised figure legend.

      (42b) Figure 1B, Figure 5D, Figure 7: bar graphs are used for count data, so do the data represent the number of individuals with a certain trait value? If they are instead showing the mean of the population/treatment group, please use mean points ± standard errors instead.

      Accepted. The data in these figures represent continuous physiological traits (e.g., supercooling/freezing points) showing the mean of the populations, rather than count data. To align with current data visualization standards for continuous variables and to provide full transparency of the underlying data distribution, we have replaced the bar graphs in Figures 1B, 5D, and 7 with scatter plots. These revised figures now display the mean ± SEM overlaid with all individual biological replicate data points.

      (43c) Figure 3B: There is a typo in the graph, it reads 'Stattistics' instead of 'Statistics'.

      Corrected. The typo ‘Stattistics’ in Figure 3B has been fixed.

      (44d) Figure 3C: I don't understand what the colors of the graph mean here. Is it the average differential expression of each replicate compared to the ancestral?

      Clarified. We have updated the figure legend to explain that the colors represent the Pearson correlation coefficient (r) between pairs of biological replicates, indicating the degree of transcriptomic similarity among samples.

      Methods

      (45a) Please start each new methods paragraph with the purpose of the method/analysis, for example, "To investigate XX, we used method X to measure X". It is at the moment hard to understand why certain things were done.

      We agree. We have revised each Methods paragraph to begin with a clear statement of purpose, so that the rationale for each analysis is immediately apparent. All changes are shown in the revised manuscript.

      (46b) Line 575-578: Why were the selection regimes with cycling temperatures and the control with constant?

      The cycling temperatures in the hot (32°C/27°C) and cold (15°C/10°C) regimes were designed to simulate diurnal temperature fluctuations (12h light/12h dark) that more closely reflect natural thermal environments. The control was maintained at a constant 26°C, which is the established optimal developmental temperature for P. xylostella (Liu et al., 2002) and represents the standard laboratory rearing condition. We acknowledge this asymmetry and have added a justification in the revised manuscript.

      (47c) Line 581: How many generations was the ancestral population kept in the lab before the start of the selection experiment? And for how many generations were the populations selected?

      The ancestral population was maintained in the laboratory for approximately 170 generations (from July 2012 to the start of the selection experiment) before the thermal selection began. The hot strain was selected for ~75 generations and the cold strain for ~15 generations over the three-year experiment. We have added this information to the revised manuscript.

      (48d) Line 585-586: I don't understand what you mean by randomly selecting six replicate populations per treatment for downstream experiments when you only had six replicate populations per treatment to begin with (as detailed in Line 574)?

      We apologize for the confusion. All six replicate populations per treatment were used for downstream experiments. We have corrected this sentence to remove the misleading “randomly selected” wording.

      (49e) Line 590: Were these 90 eggs also randomly selected, like for the individual life tables? And were these kept at the baseline temperature conditions?

      Yes, the 90 eggs were randomly selected and maintained under the baseline favorable temperature (26°C). We have clarified this in the revised manuscript.

      (50f) Line 606: Which life history and population fitness parameters were calculated?

      We have specified all parameters calculated in the revised manuscript.

      (51g) Line 609: Link to software doesn't work.

      We have updated the software link to the current working URL.

      (52h) Line 611: Please spell out what 'BT' stands for.

      Done. “BT” has been spelled out as “bootstrap” upon first use.

      (53i) Line 612-613: How many tests did you do? Did you correct for multiple testing? Using what method?

      The paired bootstrap method implemented in TWOSEX-MSChart inherently accounts for multiple pairwise comparisons through 100,000 bootstrap replications. We have clarified the scope of comparisons in the revised manuscript.
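      The bootstrap comparison itself can be illustrated with a minimal percentile-bootstrap sketch: each group is resampled with replacement, the i-th replicate of one group is paired with the i-th replicate of the other, and the groups are judged different when the 95% interval of the paired differences excludes zero. This is a simplified stand-in, not the TWOSEX-MSChart implementation, which resamples the individuals of each life table cohort and recomputes parameters such as r and λ; the sketch compares plain means, uses hypothetical data, and runs 10,000 rather than 100,000 replications to stay fast.

```python
import random
import statistics


def paired_bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile CI for mean(a) - mean(b) from paired bootstrap replicates."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Replicate i of group a is paired with replicate i of group b.
        boot_a = rng.choices(a, k=len(a))
        boot_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(boot_a) - statistics.fmean(boot_b))
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical preadult durations (days) for two strains.
strain_x = [14.1, 13.8, 14.5, 14.0, 13.9, 14.2]
strain_y = [16.0, 16.3, 15.8, 16.1, 16.4, 15.9]
low, high = paired_bootstrap_ci(strain_x, strain_y)
print(f"95% CI of difference: [{low:.2f}, {high:.2f}]")
print("different" if low > 0 or high < 0 else "not different")
```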

      (54j) Line 620-621: What does biological replicate mean here? Individual eggs / larvae / pupae / adults, or were all or some life stages pooled? Also, you now only detailed which samples were collected for metabolomic profiling, were the same samples used for transcriptomic profiling, or a subset?

      Each biological replicate consisted of pooled individuals at the same developmental stage. The same sample collection strategy was used for both metabolomic and transcriptomic profiling, but from independent biological replicates (six for metabolomics, three for transcriptomics). We have clarified this in the revised manuscript.

      (55k) Line 637: Also here, how many tests did you do? Were p-values corrected for multiple testing? Using what method?

      Differential metabolites were identified through pairwise comparisons using Student's t-test with FDR correction for multiple testing. A multi-criteria threshold of |log₂ Fold Change| ≥ 1, VIP ≥ 1, and FDR < 0.05 was applied. This approach was used for all metabolomic comparisons, including HS vs. AS, CS vs. AS, and SODC-MU vs. AS. We have clarified this in the revised manuscript.
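      A minimal sketch of this screening step, assuming one raw p-value per metabolite from the pairwise t-test: the p-values are first adjusted with the Benjamini-Hochberg procedure (the FDR method cited earlier in this response), and a metabolite is kept only if it passes all three criteria. The metabolite names and values are hypothetical, and the VIP scores are assumed to come from a separate multivariate model that is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Metabolite:
    name: str
    fold_change: float  # ratio of mean abundances, e.g. HS / AS
    vip: float          # variable importance in projection (assumed precomputed)
    p_value: float      # raw p-value from the pairwise t-test


def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential(metabolites, fc_cut=1.0, vip_cut=1.0, fdr_cut=0.05):
    """Names passing |log2 FC| >= 1, VIP >= 1 and FDR < 0.05."""
    fdrs = benjamini_hochberg([m.p_value for m in metabolites])
    return [
        m.name
        for m, fdr in zip(metabolites, fdrs)
        if abs(math.log2(m.fold_change)) >= fc_cut and m.vip >= vip_cut and fdr < fdr_cut
    ]


example = [
    Metabolite("metabolite_A", 2.5, 1.4, 0.001),  # passes all three criteria
    Metabolite("metabolite_B", 0.4, 0.8, 0.002),  # fails VIP
    Metabolite("metabolite_C", 1.5, 1.6, 0.003),  # fails fold change
    Metabolite("metabolite_D", 0.3, 1.2, 0.045),  # raw p < 0.05, but FDR = 0.05625
    Metabolite("metabolite_E", 3.0, 1.5, 0.9),    # fails FDR
]
print(differential(example))  # ['metabolite_A']
```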

      (56l) Line 662: And here: how many tests did you do? Did you correct for multiple testing? Using what method?

      In the WGCNA analysis, Pearson correlations were calculated between each module eigengene and each of the 30 common differential metabolites, resulting in a total of 29 × 30 = 870 correlation tests. Following standard WGCNA practice, rather than applying FDR correction, we used a stringent dual threshold of |correlation coefficient| > 0.8 and P < 0.05 to identify significant module-metabolite associations, which effectively controls for false positives (Langfelder and Horvath, 2008). We have clarified this in the revised manuscript.
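      The bookkeeping described here can be sketched as follows, assuming the module eigengene-metabolite correlations and their p-values have already been computed (in the WGCNA R package this is typically done with `cor` and `corPvalueStudent`). The sketch only counts the tests and applies the dual threshold; the module labels, metabolite names, and values are hypothetical.

```python
def dual_threshold(results, r_cut=0.8, p_cut=0.05):
    """Keep (module, metabolite) pairs with |r| > r_cut and p < p_cut."""
    return sorted(key for key, (r, p) in results.items() if abs(r) > r_cut and p < p_cut)


n_modules, n_metabolites = 29, 30
print("correlation tests:", n_modules * n_metabolites)  # 870

# Hypothetical eigengene-metabolite results: (module, metabolite) -> (r, p).
results = {
    ("M1", "metabolite_A"): (0.86, 0.003),
    ("M1", "metabolite_B"): (-0.91, 0.001),
    ("M2", "metabolite_A"): (0.55, 0.04),  # fails |r| > 0.8
    ("M3", "metabolite_C"): (0.82, 0.07),  # fails P < 0.05
}
print(dual_threshold(results))
```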

      (57m) Line 663: How did you select these modules? The ones that significantly correlated with differential metabolites? Why did you not use the phenotype data here?

      Modules were selected based on significant correlations (|correlation coefficient| > 0.8, P < 0.05) with differential metabolites shared between the hot and cold strains. We chose metabolites rather than phenotype data as the trait input for WGCNA because metabolites serve as intermediate molecular phenotypes that bridge gene expression and organismal phenotypes, providing a more direct link to the underlying regulatory mechanisms. This approach allowed us to identify gene modules most closely associated with the metabolic changes driven by thermal adaptation, which could then be connected to the observed life history and fitness divergence.

      (58n) Line 666: move RNA extraction details to before RNAseq methods description.

      Done. The “RNA extraction and cDNA synthesis” section has been relocated to before the “Transcriptomic profiling” section for better logical flow.

      (59o) Line 836: This paragraph describing the statistics is very short, and it is unclear to which data the described analyses apply. As the different types of data are very different, I expect the analyses to differ as well. Please describe the statistical analyses for each data type in more detail, specifying which tests you used, on which data, and how many comparisons were performed.

      We agree. The statistical methods for life table analysis, metabolomics, and transcriptomics have been detailed in their respective method sections. We have expanded the Data analysis section to specify the statistical tests for the remaining experimental data.

      (60p) Line 837: Please include your SPSS scripts to ensure the reproducibility of your results.

      The statistical analyses in SPSS were performed using the graphical user interface. As all statistical tests, parameters, and comparison groups have been described in detail in the revised Methods section, and the raw data have been deposited in public repositories (see Data availability), we believe the analyses are fully reproducible. We are happy to provide additional details if needed.

    1. In the short term, this could be attackers, if frontier labs aren't careful about how they release these models. In the long term, we expect it will be defenders who will more efficiently direct resources and use these models to fix bugs before new code ever ships. But the transitional period may be tumultuous regardless.

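      "The transitional period may be tumultuous regardless" is the most honest sentence in the whole report. Historically, every major new class of security tool (fuzzing, vulnerability scanners, automated penetration testing) has gone through a phase in which attackers adopted it at scale before defenders did. Anthropic is trying to compress that window through the restricted release under Project Glasswing. The wording "may be" rather than "will be" concedes the limits of that strategy.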

    2. Engineers at Anthropic with no formal security training have asked Mythos Preview to find remote code execution vulnerabilities overnight, and woken up the following morning to a complete, working exploit.

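      "Engineers with no formal security training woke up to a complete, working exploit": this sentence recasts Mythos from a tool for top researchers into a tool that democratizes skills. Exploit development has historically been one of the hardest security skills to democratize, because it requires years of specialized experience. If that barrier has been cleared, nation-state actors, criminal groups, and even individuals with moderate technical backgrounds would gain offensive capabilities previously available only to elite security teams.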

    3. We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.

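      "Emergent" rather than "deliberately trained" is the report's most far-reaching policy implication. Vulnerability discovery and exploitation are by-products of general reasoning ability and cannot be suppressed in isolation. Any approach that tries to "train only defensive capabilities while blocking offensive ones" is therefore infeasible at a fundamental level, because the same capabilities that make a model better at fixing vulnerabilities also make it better at exploiting them. For AI safety governance, this means capability restrictions must be enforced at the deployment layer rather than the training layer.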

    1. the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.

      This is the sharpest critique of naive AI coding adoption in the article. Without proper agent oversight, code review loops, and quality gates, AI doesn't raise the floor — it lowers it by enabling low-quality code to ship at machine speed. The 'worst engineer' framing implies that unconstrained agents optimize for task completion, not codebase health.

    2. what changed after the December 2025 model inflection, and why "spec to pull request" is now becoming a real production workflow.

      'Spec to pull request' as a production workflow means the human's job becomes writing requirements, not code — a complete inversion of the current engineering process. The December 2025 inflection point is significant: it marks when models became capable enough to close the gap between high-level intent and production-ready implementation without constant human steering.

    3. From coining "context engineering" to building the infrastructure behind Devin's 7x PR growth and jump from 16% to 80% of commits across Cognition repos

      16% to 80% of commits is the most striking internal metric here — it means AI has gone from a minority contributor to the dominant author of code at Cognition's own repos. This is a company eating its own cooking in a very public way, and the 7x PR growth rate suggests the compounding effect of agents handling more complete units of work.

    4. Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.

      The 'factory that creates software' metaphor signals a fundamental identity shift for developer tools — from text editors with AI to production management systems. If developers become factory managers rather than craftspeople, the skills that matter most shift dramatically toward task decomposition, agent supervision, and quality gate design.

    5. The first wave of AI coding tools made the developer faster, but the developer remained heavily in the loop. Copilot and Cursor's tab autocomplete are prime examples. However, the workflow was still heavily centered around and bottlenecked by the developer's local workflow: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time.

      Framing Copilot and Cursor's autocomplete as 'wave 1' that merely accelerated the existing bottleneck reframes the narrative: these tools didn't change the fundamental unit of work (developer attention), they just made it faster. The real disruption is removing developer attention as the rate-limiting step entirely.

    1. Switch on a new Claude Code-specific setting called ultracode. This is accessible through the effort menu and it sets the effort level to xhigh, while letting Claude decide automatically when to use a workflow to handle your task.

      Naming a mode 'ultracode' with an 'xhigh' effort level is a deliberate psychological signal about token consumption — it primes users to expect significant resource use. More interestingly, letting Claude autonomously decide when to spawn a full workflow (versus a simple reply) means the model itself is making meta-level resource allocation decisions.

    2. Agents address the problem from independent angles, other agents try to refute what they found, and the run keeps iterating until the answers converge—which is how a workflow reaches results a single pass can't.

      Convergence through adversarial iteration is borrowed from ensemble methods and scientific peer review — but applied to code. The non-obvious implication: this architecture is more robust to the hallucination problem than single-pass generation, because refuting agents are specifically incentivized to find failures. It's a form of AI quality control built into the workflow itself.
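      The propose-refute-iterate loop described above can be sketched in Python. This is a hypothetical illustration of the idea, not Claude Code's actual workflow implementation. `propose` and `refute` stand in for agent calls, and "convergence" is assumed to mean that all independent proposers agree and the refuter finds no counterexample.

```python
from collections import Counter
from typing import Callable, Optional


def converge(propose: Callable[[int, list[str]], str],
             refute: Callable[[str], Optional[str]],
             n_angles: int = 3,
             max_rounds: int = 5) -> Optional[str]:
    """Iterate until independent proposals agree and survive refutation.

    propose(angle, feedback) -> candidate answer from one independent angle
    refute(answer) -> an objection string, or None if the answer holds up
    """
    feedback: list[str] = []
    for round_no in range(max_rounds):
        answers = [propose(angle, feedback) for angle in range(n_angles)]
        answer, votes = Counter(answers).most_common(1)[0]
        objection = refute(answer)
        if votes == n_angles and objection is None:
            return answer  # unanimous and unrefuted: converged
        # Feed the objection (or the disagreement itself) into the next round.
        feedback.append(objection or f"round {round_no}: proposers disagreed")
    return None  # no convergence within the round budget
```

In this sketch refutation gates acceptance: agreement among proposers is necessary but not sufficient.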

    3. One workflow mapped the right Rust lifetime for every struct field in the Zig codebase. The next wrote every .rs file as a behavior-identical port of its .zig counterpart, hundreds of agents working in parallel with two reviewers on each file.

      Rust lifetime inference across a 750k-line codebase is one of the hardest mechanical tasks in systems programming — it requires deep semantic understanding of ownership patterns. That Claude could map lifetimes wholesale across a large Zig codebase, then have agents review each file in parallel, suggests a qualitative jump in code comprehension capability.
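      A rough Python sketch of the "one port per file, two reviewers per file" pattern quoted above. `port` and the reviewer callables are hypothetical placeholders for agent calls. Threads are used only to show the fan-out, and a file counts as accepted only if every reviewer approves it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Reviewer = Callable[[str, str], bool]  # (source path, ported text) -> approve?


def port_codebase(files: list[str],
                  port: Callable[[str], str],
                  reviewers: list[Reviewer],
                  max_workers: int = 8) -> dict[str, bool]:
    """Port each file independently and record whether all reviewers approved."""

    def handle(path: str) -> tuple[str, bool]:
        ported = port(path)
        # all() stops at the first rejection, so later reviewers may be skipped.
        approved = all(review(path, ported) for review in reviewers)
        return path, approved

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(handle, files))
```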

    4. Jarred Sumner used dynamic workflows to port Bun from Zig to Rust with 99.8% of the existing test suite passing, roughly 750,000 lines of Rust, and eleven days from first commit to merge.

      750,000 lines of Rust in 11 days is a genuinely remarkable benchmark — a large-scale language port that would typically occupy an experienced team for 6-12 months. The 99.8% test pass rate is the critical credibility signal: it suggests the agents were doing semantic translation, not just syntactic conversion.

    5. When the cost of a wrong answer is high, a workflow gives Claude independent attempts at the problem and adversarial agents working to break the result before you see it.

      Adversarial self-verification is a significant architectural step beyond standard code review. Having agents actively attempt to falsify results before surfacing them mirrors formal verification approaches — but applied dynamically to any engineering problem. This could shift AI coding from 'trust then verify' to 'verify then deliver.'

    6. Work you'd normally plan in quarters now finishes in days. Claude dynamically writes orchestration scripts that run tens to hundreds of parallel subagents in a single session, checking its work before anything reaches you.

      The 'quarters to days' compression is a bold claim that reframes AI coding tools from assistants to project managers. The key novelty here isn't just parallelism — it's that Claude writes the orchestration scripts itself, meaning the planning layer is also automated rather than pre-specified by engineers.

    1. eLife Assessment

      This valuable work addresses a longstanding question of how the extant genetic code came to be selected and conserved almost universally across life. Using a mutational approach and a small set of reporters, the authors demonstrate that the mutational impact was similar for non-standard genetic codes. The data provide solid support for the claim of having provided experimental verification of the error minimization theory.

    2. Reviewer #1 (Public review):

      [Editors' note: This version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review satisfactorily and toned down the comments as advised.]

      In this manuscript, the authors investigate the relationship between genetic codes and their robustness to single-point mutations. They construct ten alternative genetic codes by reassigning nine codons to Leu, Ser, or Ala, and assess mutational robustness using three reporter proteins subjected to error-prone PCR. This represents an interesting experimental approach to addressing the hypothesis that the standard genetic code is optimized for mutational robustness.

    3. Reviewer #2 (Public review):

      The study addresses the long-standing question in molecular biology and genetics: why has nature selected the current genetic code (SGC, or standard genetic code)? The authors have tested 'error minimization theory', one of the prevailing hypotheses to explain this. Their approach is to create a minimum genetic code (MGC) and its variants (3^9 theoretically possible codes). Using three parameters to quantify the effect of mutations (polarity, volume, and hydropathy), they computationally test the cost of these genetic codes (3^9) by simulations. Finally, they test this cost experimentally using an in vitro translation system with 10 selected genetic code variants with a range of costs (low to high). They use three randomly mutated reporter genes for this purpose: beta-galactosidase, luciferase, and mSG. They find no correlation between the cost of the genetic code and the reporters' output. Based on these observations, they suggest that error-minimization theory may not explain the current genetic code.
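      The size of that search space follows from assigning each of nine reassignable codons one of three amino acids (Leu, Ser, or Ala): 3^9 = 19,683 codes. A minimal Python enumeration is sketched below. The codon labels are placeholders, because the nine codons themselves are not listed here.

```python
from itertools import product

AMINO_ACIDS = ("Leu", "Ser", "Ala")


def enumerate_codes(codons: list[str]):
    """Yield every assignment of the given codons to Leu, Ser, or Ala."""
    for assignment in product(AMINO_ACIDS, repeat=len(codons)):
        yield dict(zip(codons, assignment))


# Placeholder labels for the nine reassignable codons (hypothetical).
codons = [f"codon{i}" for i in range(1, 10)]
print(sum(1 for _ in enumerate_codes(codons)))  # 19683
```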

      The question they are asking is very exciting, and their approach is solid. The authors are very careful in their analyses and conclusions.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Miyachi and Ichihashi investigate whether the arrangement of the genetic code affects mutational robustness. Using an in vitro minimal genetic code with vacant codons, they constructed 10 non-standard genetic codes by reassigning Ala, Ser, and Leu, generating codes with replacement costs that were generally higher than those of the standard genetic code across several amino acid property measures. They then tested how random mutations affected the activity of reporter proteins translated under these altered codes. Although error minimization theory predicts that higher-cost codes should make mutations more harmful, the authors report that protein function declined to a similar extent across all codes examined, suggesting that mutational robustness remains largely unchanged within the range of genetic code alterations tested here.

      Strengths:

      This is an interesting study that investigates one of the most fundamental and intriguing questions in molecular evolution: the emergence of the genetic code, which is nearly universal across nature. The in vitro approach is a powerful aspect of the work and provides an opportunity to examine this phenomenon experimentally at a depth that has previously been inaccessible.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors investigate the relationship between genetic codes and their robustness to single-point mutations. They construct ten alternative genetic codes by reassigning nine codons to Leu, Ser, or Ala, and assess mutational robustness using three reporter proteins subjected to error-prone PCR. This represents an interesting experimental approach to addressing the hypothesis that the standard genetic code is optimized for mutational robustness.

      We sincerely thank the reviewer for the positive evaluation of our experimental approach. We are encouraged that the reviewer recognizes the value of constructing multiple non-standard genetic codes in vitro and using them to experimentally examine the relationship between genetic code arrangement and mutational robustness. In the revised manuscript, we have further clarified the scope of our experimental system and the interpretation of the results, particularly emphasizing that our conclusions concern the mutational robustness of individual reporter protein activity measured in an in vitro translation system.

      Major comment:

      While I find the experimental design valuable, I am not fully convinced by the authors' conclusion that "alterations of the genetic code within the ranges explored in this study have no significant effect on mutational robustness". The current analysis is based on the functional output of three individual reporter proteins. Given that cellular systems involve far more complex interactions, it would be more appropriate to limit this conclusion to mutational robustness at the level of individual protein activity, rather than making broader generalizations.

      We thank the reviewer for this important comment. We agree that our original wording was broader than what can be directly supported by the present experiments. Because our analysis is based on the functional outputs of three individual reporter proteins translated in a reconstituted in vitro system, the results do not directly address mutational robustness at the level of the cellular system, protein interaction networks, or organismal fitness.

      Accordingly, we have revised the manuscript to limit our conclusion to the mutational robustness of individual reporter protein activity. In the revised Abstract, Results, and Discussion, we now state that within the experimentally tested range of non-standard genetic codes, we did not detect a dependence of the mutation-induced decrease in reporter protein activity on mutational cost. We have also added a statement in the Discussion noting that cellular systems involve many additional layers, including protein–protein interactions, metabolic networks, quality-control systems, and growth selection, and that whether genetic code arrangement affects robustness at these higher biological levels remains an important question for future work.

      Specifically, we have added this explanation to the revised manuscript as follows.

      Abstract

      “This result provides direct experimental evidence that mutational robustness does not significantly change in individual reporter protein activity when the genetic code is altered within the range of mutational cost tested in this study…”

      Introduction

      “Random mutations decreased reporter protein function at similar levels across all genetic codes examined, implying that alterations of the genetic code within the ranges explored in this study have no significant effect on mutational robustness of individual protein activity.”

      Results

      “Taken together, these results indicate that mutational robustness of individual reporter protein function did not substantially differ among the genetic codes…”

      Discussion

      “…suggesting that mutational robustness of protein activity remained largely unchanged within at least the ranges of mutational cost tested in this study. It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question.”

      Specific comments

      (1) tRNA modification and expression efficiency (Page 5, line 131)

      The authors attribute the observed inefficiency to the lack of chemical modifications in the tRNAs used. However, gene expression efficiency can also be strongly influenced by DNA sequence design. To better support this claim, it would be helpful to compare luciferase activity when expressed using native E. coli tRNAs. This comparison could clarify whether the observed effects are due to tRNA modification status or other sequence-dependent factors.

      We thank the reviewer for this important suggestion. We agree that the translation efficiency of the 21-, 32-, and 46-codon NanoLuc templates may be affected not only by the chemical modification of tRNAs but also by sequence-dependent factors, such as codon context and mRNA structure.

      To examine this possibility, we performed an additional comparison using native E. coli tRNAs in the tfPURE system. When the NanoLuc templates encoded with 21, 32, or 46 codons were translated using native E. coli tRNAs, the observed luminescence values were 1.2 × 10<sup>10</sup>, 0.78 × 10<sup>10</sup>, and 0.60 × 10<sup>10</sup>, respectively. Thus, the 46-codon NanoLuc template showed lower activity than the 21- and 32-codon templates even with native tRNAs, indicating that sequence-dependent effects indeed contribute to translation efficiency.

      However, the difference among these templates with native E. coli tRNAs was within approximately two-fold. This effect was much smaller than the marked decrease observed when the 46-codon template was translated using the in vitro-prepared 46-tRNA SGC system. Therefore, while sequence-dependent effects cannot be excluded, the inefficient translation in the reconstructed 46-tRNA SGC is likely to be mainly attributable to the limited functionality of unmodified tRNAs decoding NNA codons.
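      The "approximately two-fold" figure can be checked directly from the luminescence values reported above for native E. coli tRNAs. The roughly 100-fold drop is the value given for the reconstructed 46-tRNA system in the response to Reviewer #2.

```python
# Luminescence with native E. coli tRNAs, keyed by codon count of the NanoLuc template.
luminescence = {21: 1.2e10, 32: 0.78e10, 46: 0.60e10}

# Fold difference between the brightest and dimmest template.
spread = max(luminescence.values()) / min(luminescence.values())

# Approximate fold decrease reported for the in vitro-prepared 46-tRNA SGC system.
reconstituted_drop = 100.0

for n_codons, value in luminescence.items():
    print(f"{n_codons}-codon template: {value / luminescence[21]:.2f} of 21-codon activity")
print(f"spread = {spread:.2f}-fold vs ~{reconstituted_drop:.0f}-fold drop")
```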

      We have revised the manuscript to clarify this interpretation and have added the new comparison using native E. coli tRNAs.

      “We also examined whether the lower translation efficiency of the 46-codon NanoLuc template could be explained by sequence-dependent effects, such as codon context or mRNA structure. When the 21-, 32-, and 46-codon NanoLuc templates were translated using native E. coli tRNAs in the tfPURE system (Figure 1–figure supplement 2), the 46-codon template showed lower activity than the 21- and 32-codon templates. However, this difference was within approximately two-fold, much smaller than the decrease observed with the in vitro-prepared 46-tRNA SGC. This suggests that the low efficiency of that system mainly reflects the limited functionality of unmodified tRNAs decoding NNA codons. Accordingly, we decided to use only the 32 codons used in near-SGC (i.e., excluding NNA codons) in the subsequent construction of non-standard genetic codes.”

      (2) Discrepancy between expression level and activity (Figure S7 vs Figure S8).

      Although GAL expression levels appear similar across different genetic codes (Figure S7), their activities differ substantially (Figure S8), even in the low-mutation library. This discrepancy warrants further investigation. Possible explanations include differences in protein folding efficiency or translational error rates, as mentioned by the authors in the main text.

      To address this, the authors could analyze the protein products using mass spectrometry. If this is not feasible due to low expression levels, alternative approaches such as SDS-PAGE (e.g., with radiolabeling or Western blotting) would still provide valuable information. Additionally, comparing activity after in vitro refolding could help distinguish between folding defects and sequence-level errors. While I understand that the primary aim of this study is to compare mutational robustness across genetic codes, discussing these observations would significantly enhance the mechanistic insight of the work.

      We agree that the discrepancy between similar GAL expression levels and different GAL activities across genetic codes is important for interpreting the results.

      In our experiment, GAL protein amounts were quantified using a C-terminal HiBiT tag. Because the HiBiT tag was fused to the C-terminus of GAL, this assay indicates that the amount of C-terminally completed GAL products did not differ substantially among genetic codes. However, we agree that this assay does not evaluate the sequence fidelity, amino acid misincorporation patterns, or folding state of the translated products. Therefore, the observed differences in GAL activity despite similar HiBiT signals may reflect genetic code-dependent differences in translational error rates, amino acid misincorporation, protein folding efficiency, or other effects on the fraction of catalytically active protein.

      We have revised the Discussion to explicitly describe this interpretation and to clarify that detailed mechanistic dissection of these baseline activity differences, for example by mass spectrometry, SDS-PAGE/Western blotting, or refolding analysis, is an important future direction but beyond the scope of the present study. We also clarified that the main analysis in this study uses the ratio of activity from the high-mutation library to that from the corresponding low-mutation library within each genetic code.

      We have added this explanation to the revised manuscript as follows.

      “Although protein amounts quantified by the HiBiT tag were comparable among genetic codes, GAL activities differed substantially. This indicates that the activity differences among genetic codes were not primarily attributable to differences in the amount of C-terminally completed translation products. The HiBiT assay does not provide information on the fraction of catalytically active protein, including sequence fidelity or folding state, and therefore cannot distinguish among these possibilities. Detailed characterization of translated products by mass spectrometry would provide further mechanistic insight into how individual non-SGCs affect protein quality. However, the primary objective of the present study was to compare mutation-dependent activity loss across genetic codes. Therefore, we evaluated this effect by normalizing the activity of the high-mutation library to that of the corresponding low-mutation library within each genetic code.”

      (3) Protein expression analysis for additional reporters.

      Since protein expression levels are critical for interpreting reporter activity, similar analyses should also be performed for luciferase (Luc) and mSG in both high- and low-mutation libraries. This would ensure that differences in activity are not confounded by variations in protein abundance.

      We agree that protein abundance is an important factor for interpreting reporter activity. In this study, we performed HiBiT-based protein quantification for GAL because GAL showed the largest variation in absolute activity among genetic codes, even in the low-mutation library. This analysis showed that the amount of C-terminally completed GAL products was broadly comparable among genetic codes and between low- and high-mutation libraries, indicating that the observed GAL activity differences were not primarily attributable to differences in total protein abundance.

      For all three reporters, our main analysis was based on the ratio of activity from the high-mutation library to that from the corresponding low-mutation library within each genetic code. This normalization was intended to evaluate mutation-dependent activity loss while reducing the influence of code-specific baseline differences in expression level or protein quality. We believe that the data are sufficient to evaluate the effect of mutations on protein activities. Nevertheless, we agree that protein quantification for Luc and mSG would provide useful information regarding variation in the baseline levels of reporter activity, and this is an important direction for future work.
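      The within-code normalization can be sketched in Python as below. The Pearson correlation used to check for a dependence on mutational cost is an assumption for illustration, because the response does not say which test the authors used. The function names are hypothetical.

```python
from math import sqrt


def relative_activity(high: dict[str, float], low: dict[str, float]) -> dict[str, float]:
    """Activity of the high-mutation library divided by the low-mutation
    library of the same genetic code; cancels code-specific baseline activity."""
    return {code: high[code] / low[code] for code in low}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient (x and y must not be constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cost_dependence(cost: dict[str, float], ratio: dict[str, float]) -> float:
    """Correlation between each code's mutational cost and its relative activity."""
    codes = sorted(cost)
    return pearson([cost[c] for c in codes], [ratio[c] for c in codes])
```

Because the ratio is taken within each code, a code that yields uniformly lower activity (as observed for GAL) does not by itself register as lower mutational robustness.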

      Reviewer #2 (Public review):

      Summary:

      The study addresses the long-standing question in molecular biology and genetics: why has nature selected the current genetic code (SGC, or standard genetic code)? The authors have tested 'error minimization theory', one of the prevailing hypotheses to explain this. Their approach is to create a minimum genetic code (MGC) and its variants (3^9 theoretically possible codes). Using three parameters to quantify the effect of mutations (polarity, volume, and hydropathy), they computationally test the cost of these genetic codes (3^9) by simulations. Finally, they test this cost experimentally using an in vitro translation system with 10 selected genetic code variants with a range of costs (low to high). They use three randomly mutated reporter genes for this purpose: beta-galactosidase, luciferase, and mSG. They find no correlation between the cost of the genetic code and the reporters' output. Based on these observations, they suggest that error-minimization theory may not explain the current genetic code.

      The question they are asking is very exciting, and their approach is solid. The authors are very careful in their analyses and conclusions.

      We sincerely thank the reviewer for the positive assessment of our study and for the helpful suggestions. We are encouraged that the reviewer found the question exciting and the approach solid. In the revised manuscript, we have clarified the rationale for using the MGC/near-SGC framework, added further analyses and explanations of the mutational cost calculations, and revised the wording of our conclusions to more explicitly define the scope and limitations of the present experimental system.

      (1) The rationale for using MGC instead of SGC: It is unclear why the authors rely on the MGC for this analysis when the central question concerns the SGC. If the goal is to evaluate whether the SGC minimizes mutational cost, a more direct approach would be to generate alternative variants of the SGC itself and compare their mutational cost distributions. At present, it is difficult to assess whether conclusions drawn from this comparison are fully relevant to the stated biological question.

      We thank the reviewer for this important comment. We agree that directly constructing alternative variants of the SGC by changing amino acid assignment from SGC would be the most straightforward approach to testing whether the SGC minimizes mutational cost. However, this approach is currently not feasible in our reconstituted translation system for two reasons.

      First, our attempt to construct a 46-tRNA SGC-like system revealed that translation using the 46-codon NanoLuc template was approximately 100-fold less efficient than translation using the MGC or near-SGC (Fig. 1). This low activity likely reflects inefficient decoding of NNA codons by in vitro-prepared tRNAs, which lack native post-transcriptional modifications. Because this system did not provide sufficient translational activity for systematic reporter assays, we restricted subsequent experiments to the 32-codon near-SGC framework, excluding NNA codons. We now describe this technical limitation more explicitly in the revised manuscript.

      Second, the MGC framework provides vacant codons that can be reassigned by adding anticodon-variant tRNAs. This feature is essential for constructing multiple genetic code variants in parallel under controlled in vitro conditions. We therefore constructed near-SGC-based non-SGCs by adding each tRNA variant to the MGC. This served as an experimentally tractable model system for testing whether differences in genetic code arrangement affect mutation-induced decreases in reporter protein activity.

      We have added this explanation to the revised manuscript as follows.

      “We first established a minimal genetic code, composed of 21 tRNAs with vacant codons, which allows multiple alternative codon assignments to be introduced under otherwise comparable translation conditions.”

      Despite this technical limitation, we believe that the central conclusion of this study—that mutational robustness in individual reporter protein activity does not change significantly when the genetic code is altered within the range of mutational costs tested here—remains well-supported by the present results.

      (2) The mutational cost analysis appears biologically oversimplified because all amino acid substitutions are treated equivalently. The analysis assumes that all mutations contribute equally to fitness consequences, which does not reflect biological reality. In natural proteins, the impact of an amino acid substitution depends strongly on its structural and functional context. For example, substitutions affecting catalytic residues, ligand-binding interfaces, phosphorylation sites, or other regulatory motifs can severely impair protein function even when associated changes in polarity, hydropathy, or volume are minimal. Conversely, substitutions in structurally permissive or functionally dispensable regions may have little or no measurable effect despite larger physicochemical differences. Therefore, changes in polarity, hydropathy, and volume alone do not necessarily predict functional consequences.

      We agree that the mutational cost used in this study is a simplified measure and does not capture the full biological complexity of amino acid substitutions. As the reviewer pointed out, the functional consequence of a substitution depends strongly on its structural and functional context, including whether the affected residue is involved in catalysis, ligand binding, protein–protein interactions, regulatory motifs, folding, or structurally permissive regions.

      In this study, we used physicochemical-property-based mutational costs because this type of definition has been widely used in classical formulations of the error minimization theory. Our aim was therefore not to construct a comprehensive predictor of protein fitness effects, but to experimentally test whether the conventional theoretical cost metrics used to discuss genetic code optimality are reflected in the average mutation-induced decrease in reporter protein activity. We have now clarified this rationale in the revised manuscript.

      “It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question.”
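      One common formulation of the physicochemical mutational cost in the error minimization literature is the mean squared change in an amino acid property over all single-point substitutions between assigned codons. The Python sketch below assumes that formulation, with no weighting by codon position or by transition/transversion bias. The property table is supplied by the caller; any values used with it are placeholders rather than published polarity, volume, or hydropathy scales.

```python
BASES = "UCAG"


def single_point_neighbors(codon: str):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def mutational_cost(code: dict[str, str], prop: dict[str, float]) -> float:
    """Mean squared property change over single-point substitutions.

    code: codon -> amino acid for assigned (sense) codons only; vacant and
          stop codons are omitted, so substitutions into them are skipped.
    prop: amino acid -> property value (e.g. a polarity or volume scale).
    """
    total, n = 0.0, 0
    for codon, aa in code.items():
        for neighbor in single_point_neighbors(codon):
            if neighbor in code:
                total += (prop[aa] - prop[code[neighbor]]) ** 2
                n += 1
    return total / n if n else 0.0
```

This metric treats every substitution equally, which is exactly the simplification Reviewer #2 raises: it captures physicochemical distance but not structural or functional context.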

      (3) It is not clear why they increased the concentration of the two tRNAs in near-SGC. Have they maintained the same tRNA concentrations in experiments explained in Fig 5 for all 10 genetic codes tested?

      We apologize that the rationale for increasing the concentrations of tRNA<sup>Val</sup><sub>CAC</sub> and tRNA<sup>Arg</sup><sub>CCU</sub> was not sufficiently clear in the original manuscript. The previous version stated: “To improve translation efficiency with near-SGC, we focused on two tRNA concentrations (tRNA<sup>Val</sup><sub>CAC</sub> and tRNA<sup>Arg</sup><sub>CCU</sub>), which were suggested to have low activities in a previous study (Iwane et al., 2016).” Accordingly, we tested whether increasing their concentrations would improve translation efficiency. As shown in Figure 1–figure supplement 1, NanoLuc activity increased as the concentrations of these two tRNAs were raised. We therefore used tRNA<sup>Val</sup><sub>CAC</sub> and tRNA<sup>Arg</sup><sub>CCU</sub> at 100 ng/µL in the optimized near-SGC, referred to as near-SGC (RV), and in all subsequent experiments. Additional anticodon-variant tRNAs required for each non-SGC were used at optimized concentrations determined from Figure 2–figure supplement 1. For each genetic code, the same tRNA composition and concentrations were used for the low- and high-mutation libraries (see Supplementary Table S7). To clarify this point, we added the sentence “The increased concentrations of these two tRNAs were used in all the subsequent experiments” to the corresponding section.

      Reviewer #3 (Public review):

      In this manuscript, Miyachi and Ichihashi investigate whether the arrangement of the genetic code affects mutational robustness. Using an in vitro minimal genetic code with vacant codons, they constructed 10 non-standard genetic codes by reassigning Ala, Ser, and Leu, generating codes with replacement costs that were generally higher than those of the standard genetic code across several amino acid property measures. They then tested how random mutations affected the activity of reporter proteins translated under these altered codes. Although error minimization theory predicts that higher-cost codes should make mutations more harmful, the authors report that protein function declined to a similar extent across all codes examined, suggesting that mutational robustness remains largely unchanged within the range of genetic code alterations tested here.

      Strengths:

      This is an interesting study that investigates one of the most fundamental and intriguing questions in molecular evolution: the emergence of the genetic code, which is nearly universal across nature. The in vitro approach is a powerful aspect of the work and provides an opportunity to examine this phenomenon experimentally at a depth that has previously been inaccessible.

      Weaknesses:

      However, the authors' use of random mutation libraries has certain limitations that prevent the study from realizing its full potential to uncover the mechanisms governing the molecular evolution of the genetic code.

      We sincerely thank the reviewer for the positive evaluation of our study and for recognizing the strength of the in vitro approach. We are encouraged that the reviewer considers this system a powerful way to experimentally address the emergence of the genetic code.

      We also appreciate the reviewer’s constructive comments regarding the limitations of random mutation libraries. We agree that pooled random libraries do not allow us to assign functional effects to individual mutations or to fully uncover the molecular mechanisms underlying mutational robustness. In the revised manuscript, we therefore clarify that our conclusions concern the library-averaged effects of random mutations on individual reporter protein activity, rather than the effects of specific mutations or cellular-level fitness. To address this limitation, we have added explanations of the scope and limitations of the present approach.

      (1) Statistical analyses are missing for several of the manuscript's main claims. This issue applies throughout the paper, including, but not limited to, Figures 1D, 2B, 4B-D, and 5B.

      We thank the reviewer for this important comment. We agree that statistical analyses are necessary to support the major claims of the manuscript. We have therefore added statistical analyses appropriate for the purpose and experimental design of each figure.

      For Fig. 1D, we performed one-way ANOVA followed by Tukey’s post hoc test on NanoLuc activity to compare translation efficiencies among the MGC, near-SGC, near-SGC (RV), and SGC conditions. This analysis showed a significant overall difference among conditions (one-way ANOVA, p < 0.0001). Tukey’s post hoc test showed that near-SGC yielded significantly lower activity than MGC, that near-SGC (RV) significantly improved translation relative to near-SGC, and that near-SGC (RV) was not significantly different from MGC. In contrast, the 46-tRNA SGC remained significantly less efficient than near-SGC (RV). The major comparisons are summarized in Supplementary Table S8.

      For Fig. 2B, we compared NanoLuc activity between the 21-code control and the corresponding 21+1-code condition for each codon reassignment using Welch’s t-test on luminescence. This analysis was added to test statistically whether each anticodon-variant tRNA increased NanoLuc translation from the corresponding reassigned template. The statistical results are summarized in Supplementary Table S9.
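      As an illustration of the test named above, the Python sketch below computes Welch’s t statistic and its Welch–Satterthwaite degrees of freedom for two groups of luminescence readings. It is a minimal sketch, not the analysis code behind Table S9: the triplicate values are hypothetical, and converting t to a p-value requires the Student t distribution (available in standard statistics packages), which is omitted here to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of a vs. b."""
    # Sample variances (n - 1 denominator) divided by the group sizes.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical luminescence triplicates (arbitrary units), not data from the study.
control_21 = [1.0, 2.0, 3.0]
reassigned_21_plus_1 = [4.0, 5.0, 6.0]
t, df = welch_t(reassigned_21_plus_1, control_21)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = 3.674, df = 4.0
```

Unlike Student’s t-test, this form does not assume equal variances in the two groups, which is why the degrees of freedom are estimated rather than fixed at n<sub>a</sub> + n<sub>b</sub> − 2.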

      For Fig. 4B–D, we converted mutation rates per base to estimated numbers of mutations per gene and performed Spearman’s rank correlation analysis to evaluate whether reporter activity decreased monotonically with increasing mutational load. This analysis showed strong negative monotonic trends between mutation rate (estimated mutation number) and reporter activity for all three reporters (ρ = −0.90 to −1.00), supporting the conclusion that the random mutation libraries reduced protein activity in a mutation-load-dependent manner.

      For Fig. 5B, replicate-level data were available for GAL, and we therefore performed two-way ANOVA using genetic code and mutation level as factors. This analysis detected significant main effects of genetic code and mutation level, indicating that GAL activity differed among genetic codes and decreased in the high-mutation library. However, no significant interaction between genetic code and mutation level was detected, suggesting that the magnitude of mutation-induced activity reduction did not differ detectably among genetic codes under the conditions examined.
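      For readers who want to see how the main-effect and interaction terms of such a two-way ANOVA arise, a minimal Python sketch for a balanced design with replication follows. The activity values are hypothetical and were chosen so that genetic code and mutation level act additively; in that case the interaction sum of squares, and hence its F statistic, is zero. The function name and data layout are illustrative assumptions, and p-values (which require the F distribution) are not computed.

```python
from statistics import mean


def two_way_anova_f(cells):
    """F statistics for a balanced two-factor design with replication.

    `cells` maps (level_a, level_b) -> list of replicates, e.g.
    (genetic_code, mutation_level) -> activities. Every cell must hold the
    same number of replicates (n >= 2).
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("need a balanced design with at least 2 replicates")
    n_a, n_b = len(levels_a), len(levels_b)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    grand = mean(cell_mean.values())  # equals the overall mean when balanced
    mean_a = [mean(cell_mean[(a, b)] for b in levels_b) for a in levels_a]
    mean_b = [mean(cell_mean[(a, b)] for a in levels_a) for b in levels_b]

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = n_b * n * sum((m - grand) ** 2 for m in mean_a)
    ss_b = n_a * n * sum((m - grand) ** 2 for m in mean_b)
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v)

    df_a, df_b = n_a - 1, n_b - 1
    df_err = n_a * n_b * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a) / ms_err,
        "B": (ss_b / df_b) / ms_err,
        "AxB": (ss_ab / (df_a * df_b)) / ms_err,
    }


# Hypothetical activities: two codes x two mutation levels, in duplicate.
activities = {
    ("code1", "low"): [10.0, 12.0],
    ("code1", "high"): [5.0, 7.0],
    ("code2", "low"): [20.0, 22.0],
    ("code2", "high"): [15.0, 17.0],
}
f_stats = two_way_anova_f(activities)
print(f_stats)  # {'A': 100.0, 'B': 25.0, 'AxB': 0.0}
```

A non-significant interaction term corresponds to the pattern in which the drop from low to high mutation level is similar for every code, which is the comparison relevant to code-dependent robustness.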

      Finally, because the central claim of Fig. 5C, 5E, and 5G is that mutational cost does not systematically predict mutation-induced activity loss, we performed Spearman’s rank correlation analysis between each mutational cost metric and the high-/low-mutation activity ratio. No significant correlations were detected for any reporter or cost metric (Spearman’s ρ = −0.23 to 0.25), supporting the conclusion that mutational cost did not show a detectable monotonic relationship with mutation-induced activity loss within the tested range.
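      Spearman’s ρ is the Pearson correlation computed on ranks, with tied values receiving the average of their ranks. The Python sketch below, which uses hypothetical rates and activities, also shows why converting mutation rate per base into estimated mutations per gene (multiplication by a fixed coding sequence length) cannot change ρ: the ranks of the x values are unchanged. Significance testing is omitted.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-base mutation rates and reporter activities.
rates = [0.0005, 0.001, 0.002, 0.003]
activity = [100.0, 80.0, 45.0, 20.0]
rho_rate = spearman_rho(rates, activity)
# Same data expressed per gene, assuming a hypothetical 3,000-nt CDS.
rho_gene = spearman_rho([r * 3000 for r in rates], activity)
print(rho_rate, rho_gene)  # -1.0 -1.0
```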

      We have added these statistical analyses to the revised manuscript. The following sentences were added to the figure legends:

      Fig. 1

      “Statistical comparisons in (D) were performed using one-way ANOVA followed by Tukey’s post hoc test on NanoLuc activity; major comparisons are summarized in Table S8.”

      Fig. 2

      “For each template, NanoLuc activity in the 21-code and corresponding 21+1-code conditions was compared using Welch’s t-test on luminescence. Statistical results are summarized in Table S9.”

      Fig. 4

      “Spearman’s rank correlation coefficients were ρ = −0.90 for GAL, ρ = −1.00 for Luc, and ρ = −1.00 for mSG.”

      Fig. 5

      “For GAL activity in (B), two-way ANOVA was performed using genetic code and mutation level as factors. Significant main effects of genetic code and mutation level were detected (both p < 0.0001), whereas their interaction was not significant. For (C), (E), and (G), Spearman’s rank correlation analysis was performed between each mutational cost metric and the high-/low-mutation activity ratio. Statistical details are summarized in Table S10.”

      (2) In Figure 2A, the authors modify the NanoLuc gene by reassigning Ala, Leu, or Ser to new codons and elegantly show that the in vitro availability of the corresponding tRNAs is important for protein function. However, the functional importance of the specific modified positions within NanoLuc is not clear. As a result, it is difficult to determine what the expected consequences of these codon changes should be, which in turn limits the interpretation of the observed changes in protein activity. To improve the interpretability of this experiment, the authors should report exactly how many codons were modified in each variant and, ideally, examine the effect of progressively increasing the number of reassigned codons.

      We agree that the exact positions and numbers of codon replacements should be clearly reported. In the revised manuscript, we have added a list of the modified amino acid positions. In brief, two Ala codons, three Ser codons, or four Leu codons were replaced with the target vacant codon; the modified positions were Ala16 and Ala120, Ser31, Ser49, and Ser150, and Leu32, Leu67, Leu144, and Leu170, respectively.

      We also agree that progressively increasing the number of reassigned codons would provide additional mechanistic insight. However, the purpose of Fig. 2 was to test whether each vacant codon could be decoded by the corresponding anticodon-variant tRNA to produce functional NanoLuc, rather than to analyze the positional contribution of each replacement. We previously performed such progressive codon replacement analysis for one reassigned codon, ACG, in a related study (Miyachi et al., 2025), and the results supported the same qualitative interpretation. Although we did not repeat this progressive analysis for all codons in the present study, we expect that the qualitative interpretation of Fig. 2 would not be substantially changed.

      We have revised the figure text to clarify the scope of the experiment and added the detailed codon replacement information.

      “(A) Schematic illustration of reassignment experiments. Translation with the original MGC and NanoLuc template is shown at the top for comparison. An example of Ala reassignment to the UUG codon is shown at the bottom. In this example, two Ala codons in the NanoLuc sequence were replaced with one type of vacant codon (e.g., UUG), generating a 21 + 1 (UUG-Ala) codon set. Similar reassignment experiments were performed for three amino acids (Ala, Ser, and Leu) and nine vacant codons. Specifically, two Ala codons (Ala16 and Ala120), three Ser codons (Ser31, Ser49, and Ser150), or four Leu codons (Leu32, Leu67, Leu144, and Leu170) were replaced.”

      (3) The calculations presented in Figure 3 raise an interesting conceptual question: why does the near-standard genetic code not exhibit the lowest cost? One possible explanation is that the standard genetic code evolved under multiple competing constraints and is therefore not expected to be optimal for any single cost metric, while still achieving strong overall performance. In this context, it would be informative if the authors combined the three cost measures into a single integrated index and examined whether the near-SGC performs more favorably when all three dimensions are considered together. Such an analysis could add important depth to the study.

      We agree that the near-SGC is not necessarily expected to minimize each individual cost metric, because the standard genetic code may reflect multiple competing physicochemical, translational, biosynthetic, and evolutionary constraints rather than optimization of a single property.

      To address this point, we added an integrated cost analysis combining the three physicochemical cost metrics, Cost<sub>PR</sub>, Cost<sub>MV</sub>, and Cost<sub>HI</sub>. Because these three metrics have different numerical scales, we normalized each metric before integration. We used two types of integrated indices.

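      First, for each metric m ∈ {PR, MV, HI}, we calculated a min–max normalized cost for a code g,

      $$\mathrm{Cost}_m^{\mathrm{norm}}(g) = \frac{\mathrm{Cost}_m(g) - \min_{g' \in G} \mathrm{Cost}_m(g')}{\max_{g' \in G} \mathrm{Cost}_m(g') - \min_{g' \in G} \mathrm{Cost}_m(g')},$$

      where G denotes the set of 19,683 candidate non-SGCs generated by assigning Ala, Ser, or Leu to the nine vacant codon boxes. We then combined the three normalized costs, Cost<sub>PR</sub><sup>norm</sup>, Cost<sub>MV</sub><sup>norm</sup>, and Cost<sub>HI</sub><sup>norm</sup>, into an integrated min–max cost.

      Second, we calculated a z-score-normalized cost for each metric,

      $$z_m(g) = \frac{\mathrm{Cost}_m^{\mathrm{norm}}(g) - \mu_{m,G}}{\sigma_{m,G}},$$

      where µ<sub>m,G</sub> and σ<sub>m,G</sub> are the mean and standard deviation of Cost<sub>m</sub><sup>norm</sup> across the candidate non-SGCs. The integrated z-score cost was then obtained by combining the three z-score-normalized costs.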

      Using both integrated indices, the near-SGC ranked first when compared with all 19,683 candidate non-SGCs; in other words, no candidate non-SGC showed a lower integrated cost than the near-SGC. The integrated min–max cost of the near-SGC was 0.01525, whereas the lowest value among candidate non-SGCs was 0.12301. Similarly, the integrated z-score cost of the near-SGC was −2.47947, whereas the lowest candidate value was −1.90838.

      We have added this integrated cost analysis as Supplementary Figure 5–figure supplement 7. We have also revised the Discussion to note that the near-SGC does not necessarily minimize every individual physicochemical cost, but performs most favorably when PR, MV, and HI are considered comprehensively. This result is consistent with the idea that the standard genetic code may represent a compromise among multiple constraints rather than optimization of a single physicochemical property.
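      The integration step can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors’ code: normalization parameters are taken from the candidate set G only, the three normalized costs are combined by their arithmetic mean (a plain sum gives the same ranking, since it only rescales every score by 3), and the population standard deviation is used (the sample standard deviation rescales all z-scores by the same factor and likewise leaves the ranking unchanged). The z-scores are computed from the raw costs, which gives the same values as z-scoring the min–max-normalized costs because min–max scaling is an increasing linear transformation. The toy cost values are hypothetical and were chosen so that the reference code is not the lowest on any single metric yet ranks first on both integrated indices. The first lines also confirm that assigning one of three amino acids to each of nine codon boxes yields 3<sup>9</sup> = 19,683 candidates.

```python
from itertools import product
from statistics import mean, pstdev

METRICS = ("PR", "MV", "HI")

# Size of G: Ala, Ser, or Leu assigned to each of nine vacant codon boxes.
N_CANDIDATES = sum(1 for _ in product(("Ala", "Ser", "Leu"), repeat=9))


def integrated_costs(reference, candidates):
    """Integrated min-max and z-score costs for a reference code and for G.

    Normalization parameters come from the candidate set only; the three
    per-metric values are combined by their arithmetic mean (an assumption).
    Returns (ref_minmax, cand_minmax, ref_z, cand_z).
    """
    ref_mm, ref_z = [], []
    cand_mm = [[] for _ in candidates]
    cand_z = [[] for _ in candidates]
    for m in METRICS:
        pop = [c[m] for c in candidates]
        lo, hi = min(pop), max(pop)
        mu, sigma = mean(pop), pstdev(pop)
        ref_mm.append((reference[m] - lo) / (hi - lo))
        ref_z.append((reference[m] - mu) / sigma)
        for i, v in enumerate(pop):
            cand_mm[i].append((v - lo) / (hi - lo))
            cand_z[i].append((v - mu) / sigma)
    return (mean(ref_mm), [mean(v) for v in cand_mm],
            mean(ref_z), [mean(v) for v in cand_z])


def rank_of_reference(ref_score, cand_scores):
    """Rank 1 means no candidate has a strictly lower integrated cost."""
    return 1 + sum(s < ref_score for s in cand_scores)


# Toy values (hypothetical): the reference is not the lowest on any single metric.
toy_candidates = [
    {"PR": 5.0, "MV": 400.0, "HI": 4.0},
    {"PR": 4.0, "MV": 600.0, "HI": 5.0},
    {"PR": 6.0, "MV": 500.0, "HI": 3.0},
]
toy_reference = {"PR": 4.5, "MV": 420.0, "HI": 3.5}
ref_mm, cand_mm, ref_z, cand_z = integrated_costs(toy_reference, toy_candidates)
print(rank_of_reference(ref_mm, cand_mm), rank_of_reference(ref_z, cand_z))  # 1 1
```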

      “We consider that the cost ranges examined in this study represent substantial fractions of the full cost ranges of the candidate non-SGCs, especially for MV and HI. Although the near-SGC did not necessarily exhibit the lowest cost for each individual physicochemical metric, this does not mean that it is unfavorable in the multidimensional cost space. Because the SGC may reflect a balance among multiple physicochemical constraints rather than optimization of a single property, we also calculated integrated cost indices by combining Cost<sub>PR</sub>, Cost<sub>MV</sub>, and Cost<sub>HI</sub> after min–max normalization or z-score normalization. In both integrated indices, the near-SGC showed the lowest overall cost when compared with all 19,683 candidate non-SGCs (Figure 5–figure supplement 7), indicating that no candidate non-SGC exhibited a lower combined cost than the near-SGC when the three physicochemical properties were considered comprehensively.”

      (4) It is difficult to assess the consequences of the random mutations presented in Figure 4 on reporter gene function based solely on the reported "error rate/base" parameter. In particular, the x-axis in Figure 4B should be converted into the estimated number of mutations per gene. This would make the results more intuitive and would allow the reader to better evaluate the expected degree of disruption to protein function.

      We agree that the mutation rate per base alone does not provide an intuitive sense of the expected mutational burden for each reporter gene. We therefore added a second x-axis to Fig. 4B–D showing the estimated number of mutations per gene. This value was calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene.

      We retained the original mutation rate per base axis to preserve the direct link to the sequencing-based mutation rate measurement, while adding the estimated mutations per gene axis to improve interpretability. We have revised Figure 4 and its legend accordingly.

      “The lower x-axis indicates the estimated number of mutations per gene, calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene.”
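      The conversion used for the second axis is a single multiplication, sketched below in Python. The 3,000-nt coding sequence length is a hypothetical value for illustration, not the length of any reporter used in the study.

```python
def mutations_per_gene(rate_per_base, cds_length_nt):
    """Expected number of mutations per gene copy: per-base rate x CDS length."""
    return rate_per_base * cds_length_nt


# Hypothetical 3,000-nt coding sequence at a per-base rate of 0.0026.
print(round(mutations_per_gene(0.0026, 3000), 2))  # 7.8
```

Because the factor is a fixed length for each reporter, the per-gene axis is a linear rescaling of the per-base axis and leaves the ordering of the libraries unchanged.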

      (5) A central limitation of the random mutagenesis libraries used in Figure 5, which also underlie one of the manuscript's main claims, is that the exact mutations and their distribution across the reporter genes are not reported. In addition, protein activity is measured only at the level of the entire library, without directly linking individual mutations to their functional consequences. This substantially limits mechanistic interpretation. In my view, this issue can only be addressed convincingly if the authors test a set of defined variants carrying specific mutations and directly evaluate their functional effects.

      (6) Related to the previous point, in Figures 5C, 5E, and 5G, the authors present the ratio between low-mutation-rate and high-mutation-rate libraries. However, because each library contains a different collection of mutations, it is unclear what can be inferred from these comparisons. To overcome this limitation, the authors should assess the effects of altered genetic codes on specific, defined mutations rather than on heterogeneous mutation pools alone.

      (7) Along the same lines, in Figures 5C, 5E, and 5G, it is unclear why the effects of random mutations would be expected to correlate with the three calculated cost metrics, given that the positions, identities, and functional relevance of the mutations within the genes are not known. Without this information, the biological meaning of these correlations remains difficult to evaluate.

      We agree that using pooled random mutation libraries does not allow us to directly link individual mutations to their functional consequences. We also agree that testing defined variants carrying specific mutations would provide a more direct and mechanistic understanding of how each genetic code affects the functional impact of particular amino acid substitutions. However, the purpose of the present study was different from such a defined-variant analysis. Our aim was to experimentally test whether the conventional mutational cost metrics used in error minimization theory predict the average effect of random mutational loads on protein activity. Because these theoretical costs are themselves defined as average expected physicochemical effects over many possible single-nucleotide substitutions, we reasoned that pooled random mutation libraries provide an appropriate first experimental framework to evaluate whether such average-cost metrics are reflected in the average functional output of translated proteins.
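      To make concrete what “average expected physicochemical effects over many possible single-nucleotide substitutions” means, the Python sketch below computes a classical mean-squared mutational cost for the standard genetic code. Every single-nucleotide substitution between sense codons is enumerated, and the squared change in an amino acid property is averaged. Several choices are assumptions for illustration and may differ from the definitions used in the manuscript: Kyte–Doolittle hydropathy values stand in for the HI metric, substitutions to or from stop codons are skipped, synonymous changes are included with zero cost, and all positions and substitution types are weighted equally. The reassignment in the last lines is hypothetical and only shows how a non-standard code can be scored with the same function.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third position); "*" = stop.
SGC_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), SGC_AA)}

# Kyte-Doolittle hydropathy index (one-letter amino acid codes).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def point_mutation_cost(code, prop):
    """Mean squared property change over single-nucleotide substitutions.

    Every ordered pair (sense codon -> sense codon differing at one position)
    is counted once; substitutions to or from stop codons are skipped.
    Returns (cost, number_of_pairs).
    """
    total, pairs = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = code[mutant]
                if aa2 == "*":
                    continue
                total += (prop[aa] - prop[aa2]) ** 2
                pairs += 1
    return total / pairs, pairs


cost, n_pairs = point_mutation_cost(STANDARD_CODE, HYDROPATHY)
print(n_pairs)  # 526

# Hypothetical reassignment of one Arg codon (AGG) to Ala, for illustration only.
variant = dict(STANDARD_CODE)
variant["AGG"] = "A"
variant_cost, variant_pairs = point_mutation_cost(variant, HYDROPATHY)
```

Because such a cost is an average over all substitutions, it predicts an average effect rather than the effect of any particular mutation, which is the level at which the pooled-library comparison operates.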

      We agree that low- and high-mutation libraries do not contain identical sets of mutations. Therefore, the high-/low-mutation activity ratio should not be interpreted as the effect of the same individual variants before and after additional mutations. Rather, it represents the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool under each genetic code. We have revised the text to clarify this interpretation.

      We also agree that the positions, identities, and functional relevance of individual mutations are not resolved in this pooled assay. This limitation prevents us from assigning mechanistic effects to specific substitutions. At the same time, using a small set of defined variants would introduce its own selection bias, because the conclusions could strongly depend on which mutations and which protein positions were chosen. Therefore, we consider the random-library approach to be a useful first step for testing library-averaged effects, whereas systematically defined variant analysis or genotype-resolved activity assays will be necessary to reveal mutation-specific mechanisms in future studies.

      In response to the reviewer’s concern, we have revised the Discussion to explicitly limit our conclusion to library-averaged effects on individual reporter protein activity. We now state that this approach does not identify the functional effects of individual mutations and that future studies using defined variants or high-throughput genotype–phenotype mapping will be required to determine how specific substitutions contribute to genetic code-dependent mutational robustness.

      Results

      “To estimate the average activity reduction associated with increased mutational burden under each genetic code, we calculated the ratio of activity obtained from the high-mutation library to that from the corresponding low-mutation library and plotted this ratio against each of the three mutational costs (Fig. 5C).”

      Discussion

      “A further limitation of this study is that the reporter activities were measured at the level of pooled random mutation libraries. Therefore, the high-/low-mutation activity ratio used in this study should be interpreted as the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool, rather than as the effect of identical variants before and after additional mutations. This library-averaged approach was chosen because the mutational costs considered here are also defined as average expected physicochemical effects over many possible single-nucleotide substitutions. In addition, because the non-SGCs constructed in this study were generated by reassigning only Ala, Ser, and Leu, the detectable effects may depend on how frequently mutations involving these amino acids occur in each reporter gene and whether the affected positions are functionally important. If genetic code-dependent effects are restricted to a small subset of deleterious variants, such effects may be masked in pooled activity measurements. Future studies using defined variants or high-throughput genotype–phenotype mapping assays will be required to determine the mutation-specific and position-specific mechanisms underlying genetic code-dependent effects on protein function (Rozhoňová et al., 2024).”

      (8) For each mutagenesis library, the number of variants, the average number of mutations per variant, and the distribution of mutation positions should be reported clearly and transparently. These details are important for evaluating the strength of the conclusions.

      We agree that a more transparent characterization of the random mutagenesis libraries is necessary for evaluating the strength and limitations of our conclusions.

      In the revised manuscript, we have added the estimated number of mutations per gene to the Results section. This value was calculated by multiplying the mutation rate per base by the coding sequence length of each reporter gene. For the high-mutation libraries used in Fig. 5, the estimated numbers of mutations per gene were approximately 8.0 for GAL, 4.5 for Luc, and 3.3 for mSG. In addition to the heatmap shown in the original manuscript, we also added position-wise mutation profiles along each reporter gene (Figure 4–figure supplement 2). These analyses clarify the mutational burden of each library and show that mutations were broadly distributed across the analyzed regions of the reporter genes (approximately 300 nt in the middle of each gene).

      Regarding the number of variants, the translation reactions were performed using 5 nM DNA template in a 5 µL reaction, corresponding to approximately 1.5 × 10<sup>10</sup> DNA molecules. However, this value represents the total number of DNA molecules introduced into the reaction and does not directly indicate the number of unique full-length sequence variants, because multiple molecules can share the same genotype. Moreover, our sequencing analysis was designed to quantify mutation frequencies and positional distributions rather than to reconstruct the full-length genotypes of individual library members. Therefore, we do not infer the exact number of unique variants in each library; instead, we report the average mutation burden and the position-wise non-reference rate distributions.
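      The template-molecule estimate follows from concentration × volume × Avogadro’s number; the short Python check below reproduces it. It counts DNA molecules added to the reaction, which is an upper bound on, not an estimate of, the number of unique variants present.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def template_molecules(conc_nM, volume_uL):
    """Number of DNA template molecules in a reaction of the given volume."""
    moles = conc_nM * 1e-9 * volume_uL * 1e-6  # (mol/L) x L
    return moles * AVOGADRO


n_templates = template_molecules(5, 5)  # 5 nM template in a 5 uL reaction
print(f"{n_templates:.2e}")  # 1.51e+10
```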

      We have revised the Results and added Supplementary Figure 4–figure supplement 2 accordingly.

      “For this experiment, two random mutation libraries were used: a low-mutation library prepared using the high-fidelity polymerase and a high-mutation library prepared using Taq DNA polymerase at a Mn<sup>2+</sup> concentration that yields mutation rates of 0.002 – 0.005 per base (0.0026 for GAL, 0.0027 for Luc, and 0.0048 for mSG, corresponding to approximately 8.0, 4.5, and 3.3 mutations per gene). We also plotted position-wise non-reference rates along the analyzed regions of each reporter gene, confirming that mutations were broadly distributed across the amplicons (Figure 4–figure supplement 2).”

      (9) Because only three amino acids were manipulated in the non-standard genetic codes, it remains unclear whether these particular amino acids occupy positions in the reporter proteins that are especially important for function and therefore likely to generate strong phenotypic effects. More broadly, it is not clear whether the assay is sufficiently sensitive to detect the effects of only a subset of deleterious variants within a pooled library. This point should be addressed more explicitly.

      We agree that this is an important limitation of the present study. Because our non-SGCs were constructed by reassigning only Ala, Ser, and Leu, the mutation-dependent effects that can differ among genetic codes are limited to mutations involving these reassigned codons or amino acid substitutions affected by these assignments. Therefore, the sensitivity of the assay depends on how frequently such substitutions occur in the reporter genes and whether the affected Ala, Ser, and Leu-related positions are functionally important.
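      A simple weighted-average calculation illustrates how such effects can be diluted in pooled measurements. In the hypothetical library below, 95% of variants behave identically under two genetic codes, while the remaining 5% retain 60% activity under one code and 30% under the other. Although the affected variants differ twofold, the library-averaged activities differ by only about 1.5%, which could fall within experimental variability. The fractions and activities are invented for illustration and are not estimates from the study.

```python
def pooled_activity(fractions_and_activities):
    """Library-averaged activity: sum of (variant fraction x relative activity)."""
    return sum(f * a for f, a in fractions_and_activities)


# Hypothetical library: 95% of variants unaffected by the code (activity 1.0),
# 5% of variants whose activity depends on the genetic code.
code_x = pooled_activity([(0.95, 1.0), (0.05, 0.6)])
code_y = pooled_activity([(0.95, 1.0), (0.05, 0.3)])
print(round(code_y / code_x, 3))  # 0.985
```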

      We have revised the Discussion to address this point more explicitly. In the revised manuscript, we now state that the absence of a detectable cost-dependent effect may reflect not only the limited cost range examined, but also the limited set of reassigned amino acids, the position-dependent importance of Ala/Ser/Leu residues in the reporter proteins, and the sensitivity limit of pooled activity measurements. We further note that future studies using genotype-resolved activity assays (defined variants) will be required to determine whether specific amino acid substitutions or specific protein positions exhibit stronger genetic code-dependent effects.

      “A further limitation of this study is that the reporter activities were measured at the level of pooled random mutation libraries. Therefore, the high-/low-mutation activity ratio used in this study should be interpreted as the relative reduction in average activity caused by increasing the mutational burden in a heterogeneous mutation pool, rather than as the effect of identical variants before and after additional mutations. This library-averaged approach was chosen because the mutational costs considered here are also defined as average expected physicochemical effects over many possible single-nucleotide substitutions. In addition, because the non-SGCs constructed in this study were generated by reassigning only Ala, Ser, and Leu, the detectable effects may depend on how frequently mutations involving these amino acids occur in each reporter gene and whether the affected positions are functionally important. If genetic code-dependent effects are restricted to a small subset of deleterious variants, such effects may be masked in pooled activity measurements. Future studies using defined variants or high-throughput genotype–phenotype mapping assays will be required to determine the mutation-specific and position-specific mechanisms underlying genetic code-dependent effects on protein function (Rozhoňová et al., 2024).”

      Recommendations for the authors:

      Reviewing Editor Comments:

      While we suggest that you address all the technical points raised by the reviewers, you may specifically want to limit the conclusion of the study to mutational robustness at the level of individual protein activity, rather than making broader generalizations. Also, the statistical analysis needs to be strengthened, as indicated in the reviews.

      We thank the Reviewing Editor for these important suggestions. We agree that the conclusion of the original manuscript was broader than what can be directly supported by the present experiments. In the revised manuscript, we have therefore limited our conclusion to mutational robustness at the level of individual reporter protein activity measured in a reconstituted in vitro translation system. We now explicitly state that our results do not directly address robustness at the level of cellular fitness, protein interaction networks, or long-term evolution.

      We have also strengthened the statistical analyses throughout the manuscript. Specifically, we added one-way ANOVA followed by Tukey’s post hoc test for Fig. 1D, Welch’s t-tests for Fig. 2B, Spearman’s rank correlation analyses for Fig. 4B–D and Fig. 5C/E/G, and two-way ANOVA for GAL activity in Fig. 5B. These analyses have been incorporated into the revised Results, figure legends, and supplementary information.

      Reviewer #2 (Recommendations for the authors):

      (1) Discuss other alternative hypotheses if the error minimization theory is unlikely.

      We thank the reviewer for this helpful suggestion. We think that the absence of a detectable relationship between mutational cost and reporter protein activity in our assay should not be interpreted as excluding all possible roles of error minimization in the evolution of the genetic code. Our results specifically address one aspect of the error minimization theory: whether physicochemical-property-based mutational cost predicts the average effect of random point mutations on individual reporter protein activity within the experimentally accessible range of non-SGCs tested here.

      In the revised Discussion, we have clarified that the organization of the SGC may have been shaped by multiple factors, including robustness to translational errors, historical constraints associated with genetic code expansion, biosynthetic or coevolutionary processes, stereochemical interactions, and the evolvability of proteins. Our results suggest that the contribution of mutational robustness at the level of individual protein activity may be limited within the range examined here, but they do not exclude the possibility that the SGC provides advantages against other forms of error, such as translational errors, or at the level of cellular fitness or long-term evolution.

      We have added a short discussion to clarify this point without expanding the scope of the manuscript beyond the present experimental results.

      “It should be noted that this conclusion is limited to the activity of individual reporter proteins translated in a reconstituted in vitro system. Therefore, whether similar trends would be observed at the level of cellular fitness or long-term evolution remains an open question. Moreover, our results do not exclude other possible roles of SGC organization. The SGC may have been shaped by multiple factors, including robustness to translational errors, historical constraints during genetic code expansion, biosynthetic or coevolutionary relationships among amino acids, stereochemical interactions, and effects on protein evolvability (Katoh and Suga, 2023; Koonin and Novozhilov, 2017, 2009; Novozhilov et al., 2007; Wong, 2005).”

      (2) A brief description of the PURE translation system can be provided for people from outside the field.

      We have added a brief description of the PURE system in the Introduction to make the experimental platform more accessible to readers outside the field. Specifically, we now explain that the PURE system is a reconstituted cell-free translation system composed of purified translation factors, ribosomes, aminoacyl-tRNA synthetases, tRNAs, amino acids, and energy-regeneration components. We also clarify that, in this study, we used a tRNA-free version of the PURE system, in which defined synthetic tRNA sets were supplied externally to reconstruct each genetic code.

      Introduction

      “A representative platform for such reconstitution is the PURE system (Shimizu et al., 2001), a reconstituted cell-free translation system composed of purified translation components, including ribosomes, translation factors, aaRSs, amino acids, and energy-regeneration components. In particular, a tRNA-free PURE system (Miyachi et al., 2022), in which endogenous tRNA activity is minimized and defined tRNA sets are supplied externally, enables genetic codes to be reconstructed by controlling the supplied tRNAs.”

      (3) Figure 5D and F - Technical replicates are provided only for GAL. A similar approach should be taken for LUC and mSG.

      We agree that replicate-level measurements for Luc and mSG would further improve reliability. However, repeating the full translation experiments for these reporters was not feasible in the current revision, as each experiment requires large amounts of freshly prepared tRNA-free PURE system and multiple defined tRNA mixtures for every genetic code variant tested. Given these material and technical constraints, we were unable to perform additional biological replicates within the scope of this revision. We would like to emphasize, however, that the GAL replicates shown in Fig. 5B are consistent across independent experiments, supporting the reproducibility of the assay itself. Furthermore, the key metric in our analysis, the activity ratio between the high- and low-mutation libraries within each genetic code, is an internally normalized measure that is inherently less sensitive to between-experiment variability than absolute activity values. The correlation analyses showed no significant relationship between mutational cost and this ratio for any of the three reporters. Taken together, we believe these results provide a robust basis for the conclusions drawn, even in the absence of full replication for Luc and mSG.

      (4) Provide statistical analysis wherever it is relevant (e.g, to support a lack of correlation).

      We have strengthened the statistical analyses throughout the revised manuscript. In particular, to support the lack of detectable correlation between mutational cost and mutation-induced activity loss, we performed Spearman’s rank correlation analyses between each mutational cost metric and the high-/low-mutation activity ratio for all three reporters. No significant correlations were detected for any reporter or cost metric. In addition, we added statistical analyses for other relevant figures, including one-way ANOVA followed by Tukey’s post hoc test for Fig. 1D, Welch’s t-tests for Fig. 2B, Spearman’s rank correlation analyses for Fig. 4B–D, and two-way ANOVA for GAL activity in Fig. 5B.
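As a rough illustration of the statistic described above, the sketch below computes the high-/low-mutation activity ratio for each genetic code and then Spearman’s rank correlation (the Pearson correlation of average ranks) between a cost metric and that ratio. It is a minimal pure-Python sketch, not the authors’ analysis code. The function names are hypothetical, the sketch does not compute p-values (a library routine such as `scipy.stats.spearmanr` also reports one), and the numbers in the usage lines are placeholders, not data from the study.

```python
from statistics import mean


def activity_ratio(high, low):
    """High-/low-mutation activity ratio, one value per genetic code."""
    return [h / l for h, l in zip(high, low)]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical placeholder values, one entry per genetic code (not study data).
cost = [0.8, 1.1, 1.4, 1.9]
ratio = activity_ratio(high=[0.30, 0.25, 0.35, 0.28], low=[1.0, 0.9, 1.1, 1.0])
print(round(spearman_rho(cost, ratio), 3))
```

Working on ranks rather than raw values makes the test sensitive to any monotonic relationship between cost and activity loss, not only a linear one.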

      Reviewer #3 (Recommendations for the authors):

      (1) In line 122, the phrase "as evenly as possible" is ambiguous and should be explained more precisely.

      We thank the reviewer for pointing this out. We have revised the phrase “as evenly as possible” to describe the codon design more precisely. Specifically, we now state that the NanoLuc coding sequences were designed so that the codons available in each genetic code were used with minimal differences in codon counts, while preserving the amino acid sequence of NanoLuc.

      “For near-SGC and SGC, the NanoLuc coding sequences were designed so that the codons available in each genetic code were used with minimal differences in codon counts, while preserving the amino acid sequence (Fig. 1B, 32 codons and 46 codons).”

      (2) For Figure 1D, a Western blot or another protein gel-based assay would be helpful to exclude the possibility that the observed differences arise from variation in translation efficiency rather than differences in protein activity.

      We agree that a protein gel-based assay such as Western blotting would in principle allow us to distinguish differences in translated protein amount from differences in specific activity, and we understand why such data would be informative. However, we would like to clarify that the primary purpose of Fig. 1D was to evaluate the overall functional translation output of each reconstructed genetic code, rather than to determine the mechanistic basis of any observed differences. In this context, NanoLuc luminescence serves as an integrated readout of the entire translation process, encompassing both translational efficiency and protein folding/activity. Crucially, regardless of whether the observed differences in NanoLuc luminescence reflect lower protein yield, reduced specific activity, or a combination of both, the conclusion of Fig. 1D remains the same. Although we did not perform Western blotting in this study, we believe that such an analysis would not change this interpretation and that the current data are sufficient to support this conclusion.

      (3) The number 3^9 is not immediately intuitive. It would be helpful if the authors also stated that this corresponds to approximately 20,000 possible non-standard genetic codes.

      We have revised the text to state both the exact number and the approximate value: 3<sup>9</sup> = 19,683, approximately 20,000 possible non-standard genetic codes.
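The count can be checked by brute-force enumeration. The sketch below assumes, as the 3<sup>9</sup> figure and the description of Fig. 3B suggest, that each of nine vacant codon boxes is independently assigned Ala, Ser, or Leu; the constant and function names are hypothetical.

```python
from itertools import product

# Amino acids assumed assignable to each vacant codon box (Ala, Ser, or Leu).
CANDIDATES = ("Ala", "Ser", "Leu")
N_VACANT_BOXES = 9  # inferred from the 3^9 count in the text


def enumerate_codes(candidates=CANDIDATES, n_boxes=N_VACANT_BOXES):
    """Yield every assignment of one candidate amino acid to each vacant box."""
    return product(candidates, repeat=n_boxes)


n_codes = sum(1 for _ in enumerate_codes())
print(n_codes, 3 ** 9)
```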

      (4) The rationale for using the three cost parameters (PR, MV, and HI) should be explained in greater detail. Because these parameters are central to the manuscript, a citation alone is not sufficient. A concise explanation of their biological relevance would improve the clarity and accessibility of the study.

      We agree that the biological relevance of the three cost parameters should be explained more clearly. In the revised manuscript, we have added a concise explanation of why polar requirement (PR), molecular volume (MV), and hydropathy index (HI) were used.

      These parameters were selected because they have been widely used in theoretical studies of genetic code optimality and represent distinct physicochemical aspects of amino acid substitutions. PR reflects polarity-related interactions and has been a classical metric in error minimization analyses of the genetic code. MV represents side-chain size and steric volume, which could influence packing and structural stability in proteins. HI reflects hydrophobicity, which is closely related to protein folding and hydrophobic core formation. We have also clarified that these metrics are simplified descriptors and do not capture residue-specific structural or functional context, which we now discuss as a limitation of the study.
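To make the idea of a mutational cost more concrete, the sketch below computes one common form used in error-minimization analyses: the mean squared change in an amino acid property over all single-nucleotide substitutions that connect two assigned codons. This is an assumption, not the manuscript’s definition; the exact formula behind Cost<sub>PR</sub>, Cost<sub>MV</sub>, and Cost<sub>HI</sub> may differ (for example, in how substitutions are weighted). The Kyte–Doolittle scale stands in for HI here, and codons marked "*" (stop) or "-" (a hypothetical marker for a vacant codon) are skipped.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
SGC_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Kyte-Doolittle hydropathy index (assumed here as the HI scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def code_table(aa_string):
    """Map the 64 codons (TCAG order) to the one-letter symbols in aa_string."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, aa_string))


def mutational_cost(table, prop):
    """Mean squared property change over all single-nucleotide substitutions
    between codons whose symbols have a property value (stops and vacant
    codons, i.e. symbols missing from prop, are skipped)."""
    diffs = []
    for codon, aa in table.items():
        if aa not in prop:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                aa2 = table[mutant]
                if aa2 not in prop:
                    continue
                diffs.append((prop[aa] - prop[aa2]) ** 2)
    return sum(diffs) / len(diffs)


sgc = code_table(SGC_AA)
print(round(mutational_cost(sgc, KD), 3))
```

Swapping in a polar-requirement or molecular-volume table for `KD` would give the analogous PR- or MV-based cost under the same assumptions.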

      “PR reflects polarity-related interactions of amino acids and has been used as a classical measure of amino acid similarity in error minimization analyses. MV represents side-chain size and steric volume, which could affect protein packing and structural stability, whereas HI reflects hydrophobicity, which could be closely related to protein folding or hydrophobic core formation.”

      (5) In Figure 3, the experimental framework would be easier to follow if the authors included a schematic and data for one representative non-SGC, explicitly illustrating how it differs from the near-SGC with respect to each of the three cost measures.

      We agree that showing one representative non-SGC would make the experimental framework and cost calculation more intuitive.

      In the revised manuscript, we added a new panel to Fig. 3 comparing the near-SGC with a representative non-SGC. We selected the PR<sub>max</sub> code as the representative example because it clearly illustrates how reassignment of vacant codon boxes can increase one mutational cost metric relative to the near-SGC. In this panel, we first show the codon assignment schemes of the near-SGC and PR<sub>max</sub> code in the same genetic-code format used in Fig. 1. We then show the corresponding heatmap representations for the three physicochemical properties used in the cost calculation: polar requirement, molecular volume, and hydropathy index. The Cost<sub>PR</sub>, Cost<sub>MV</sub>, and Cost<sub>HI</sub> values are shown for each code.

      This new panel illustrates how changes in codon assignment are translated into different physicochemical cost landscapes and clarifies how the representative non-SGC differs from the near-SGC with respect to each of the three cost measures.

      “To make the design of non-SGCs more explicit, we show one representative non-SGC together with the near-SGC in Fig. 3B. This comparison illustrates how assignment of Ala, Ser, or Leu to the vacant codon boxes changes the three mutational cost metrics, Cost<sub>PR</sub>, Cost<sub>MV</sub>, and Cost<sub>HI</sub>.”

      (6) In line 329, the phrase "similar pattern" is ambiguous and should be explained more explicitly.

      We have revised the ambiguous phrase “similar pattern” to describe the observation more explicitly. Specifically, we now state that the relative differences in GAL activity among genetic codes observed in the low-mutation library were broadly retained in the high-mutation library, although overall activity decreased.

      “For the high-mutation library, GAL activity decreased overall, while the relative differences in activity among genetic codes observed in the low-mutation library were broadly retained.”

      (7) Figure S7 appears to be an important control for the experiments shown in Figure 5, and I recommend moving it to the main figures.

      We thank the reviewer for this helpful suggestion. We agree that the HiBiT-based quantification of GAL protein amount is an important control for interpreting the GAL activity measurements in Fig. 5, and we appreciate the recommendation to increase its visibility. This analysis shows that the amount of C-terminally completed GAL products was broadly comparable among genetic codes, indicating that the large differences in GAL activity were not primarily attributable to differences in total translated protein amount.

      After careful consideration, we have opted to retain this analysis in the supplementary figures because the main focus of Fig. 5 is the relationship between mutational cost and mutation-induced activity loss, quantified by the high-/low-mutation activity ratio. The HiBiT experiment addresses a related but distinct question: whether differences in absolute GAL activity among genetic codes can be explained by differences in protein abundance, and we felt that including it in the main figures might shift the emphasis away from the central message of Fig. 5. Nevertheless, we have added a clear reference to Figure 4–figure supplement 1 in the main text and the figure legend to ensure that readers are directed to this control when interpreting Fig. 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their article, Guo and coworkers investigate the Ca²⁺ signaling responses induced by Enteropathogenic Escherichia coli (EPEC) in epithelial cells and how these responses regulate NF-κB activation. The authors show that EPEC induces rapid, spatially coordinated Ca²⁺ transients mediated by extracellular ATP released through the type III secretion system (T3SS). Using high-speed Ca²⁺ imaging and stochastic modeling, they propose that low ATP levels trigger "Coordinated Ca²⁺ Responses from IP₃R Clusters" (CCRICs) via fast Ca²⁺ diffusion and Ca²⁺-induced Ca²⁺ release. These responses may dampen TNF-α-induced NF-κB activation through Ca²⁺-dependent modulation of O-GlcNAcylation of p65. The interdisciplinary work suggests a new perspective on calcium-mediated immune response by combining quantitative imaging, bacterial genetics, and computational modeling.

      Strengths:

      The study provides a new concept for host responses to bacterial infections and introduces the concept of Coordinated Ca²⁺ Responses from IP₃R Clusters (CCRICs) as synchronized, whole-cell-scale Ca²⁺ transients with the fast kinetics typical of local events. This is elegantly done by an interdisciplinary approach using quantitative measurements and mechanistic modelling.

      Weaknesses:

      (1) The effect of coordination by fast diffusion for small eATP concentrations is explained by the resulting low Ca2+ concentration that is not as strongly affected by calcium buffers compared to higher concentrations. While I agree with this statement on the relative level, CICR is based on the resulting absolute concentration at neighboring IP3Rs (to activate them). Thus, I do not fully agree with the explanation, or at least would expect to use the modelling approach to demonstrate this effect. Simulations for different activation and buffer concentrations could strengthen this point and exclude potential inhibition of channels at higher stimulation levels.

      We fully agree that CICR is determined by the local Ca<sup>2+</sup> concentration at each IP<sub>3</sub>R cluster, not by a global cytosolic average. In our stochastic model, IP<sub>3</sub>R clusters are represented as phenomenological entities at discrete spatial sites. Each cluster senses the local Ca<sup>2+</sup> concentration at its position, and its stochastic gating depends only on this local [Ca<sup>2+</sup>] and on [IP<sub>3</sub>]. Buffers are not included explicitly. Instead, we use an effective Ca<sup>2+</sup> diffusion coefficient, D<sub>eff</sub>, which accounts for the effect of endogenous Ca<sup>2+</sup> buffers. To reproduce the coordinated low-amplitude Ca<sup>2+</sup> responses observed experimentally, we found that we had to use D<sub>eff</sub> = 100 µm<sup>2</sup>/s. In the supplementary analysis, we show that an effective diffusion coefficient of this order is indeed plausible for a realistic mixture of mobile and immobile Ca<sup>2+</sup> buffers (Supplementary Note 2, Figure 1).

      In the revised manuscript, we now provide a supplementary analysis (Supplementary Note 2) to justify this choice. Using an expression for the effective diffusion coefficient in the presence of a plausible mixture of mobile and immobile buffers, together with an explicit reaction–diffusion model, we show that:

      - The effective diffusion coefficient of Ca<sup>2+</sup> becomes Ca<sup>2+</sup>-dependent, and

      - There exists a regime in which low-amplitude Ca<sup>2+</sup> elevations are characterized by an effective diffusion coefficient of D<sub>eff</sub> = 100 µm<sup>2</sup>/s and a larger spatial extent than higher-amplitude transients (Supplementary Note 2, Figure 1).

      Thus, the value of D<sub>eff</sub> used in the cluster model is quantitatively consistent with classical buffering theory and with plausible cytosolic buffer mixtures. This provides a mechanistic basis for the observation that small-amplitude, short-lived events can nevertheless produce coordinated signals with large spatial extent and, occasionally, almost immediate activation of IP<sub>3</sub>R clusters at distant locations in both simulations and experiments.
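The Ca<sup>2+</sup> dependence of D<sub>eff</sub> can be illustrated with the classical rapid-buffer approximation, in which each buffer contributes a capacity κ = B<sub>T</sub>K<sub>D</sub>/(K<sub>D</sub> + [Ca<sup>2+</sup>])<sup>2</sup> and D<sub>eff</sub> = (D<sub>Ca</sub> + Σ D<sub>B</sub>κ)/(1 + Σ κ). This is a sketch of that textbook expression, not necessarily the equation used in Supplementary Note 2. The free Ca<sup>2+</sup> diffusion coefficient (220 µm<sup>2</sup>/s) and all buffer parameters below are assumed values chosen for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    total_um: float  # total buffer concentration B_T (uM), assumed
    kd_um: float     # dissociation constant K_D (uM), assumed
    d_um2_s: float   # buffer diffusion coefficient (um^2/s); 0 = immobile


def buffer_capacity(buf, ca_um):
    """kappa = B_T * K_D / (K_D + [Ca2+])^2 (rapid-buffer approximation)."""
    return buf.total_um * buf.kd_um / (buf.kd_um + ca_um) ** 2


def effective_diffusion(ca_um, buffers, d_ca=220.0):
    """D_eff = (D_Ca + sum(D_B * kappa)) / (1 + sum(kappa)), in um^2/s."""
    kappas = [buffer_capacity(b, ca_um) for b in buffers]
    num = d_ca + sum(b.d_um2_s * k for b, k in zip(buffers, kappas))
    return num / (1.0 + sum(kappas))


# Hypothetical mixture: a high-affinity mobile buffer and a
# low-affinity, high-capacity immobile buffer.
mix = [Buffer(total_um=50, kd_um=0.2, d_um2_s=50),
       Buffer(total_um=1000, kd_um=10, d_um2_s=0)]
for ca in (0.05, 0.1, 0.3, 1.0):
    print(ca, round(effective_diffusion(ca, mix), 1))
```

In a mixture like this one, the high-affinity mobile buffer saturates first as [Ca<sup>2+</sup>] rises, so D<sub>eff</sub> can fall at higher Ca<sup>2+</sup> levels; whether a given mixture reproduces the specific values of 100 and 40 µm<sup>2</sup>/s depends entirely on the buffer parameters.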

      In this respect, I would also include the details of the modelling, such as implementation environment, parameters, and benchmarking. The description in the Supplementary Methods is very similar to the description in the main text. In terms of reproducibility, it would be important to at least provide simulation parameters, and providing the code would align with the emerging standards for reproducible science.

      We apologize for the lack of modelling details in the previous submission. In this revised version, we provide a full description of the model in Supplementary Note 1 of the Supplementary Information.

      To address the reviewer’s request for simulations at different activation levels, we now show an additional simulation in which [IP<sub>3</sub>] is higher (0.1 µM, constant in time and space) and D<sub>eff</sub> is set to 40 µm<sup>2</sup>/s (Supplementary Note 3). This lower effective diffusion coefficient is consistent with the stronger buffering and reduced Ca<sup>2+</sup> mobility expected for higher-amplitude signals. In this case, the same phenomenological cluster model generates a global Ca<sup>2+</sup> response with larger amplitude and longer duration, rather than a loss of activity due to excessive inhibition (Supplementary Note 3, Figure 1, left panel). The right panel of Supplementary Note 3, Figure 1, shows the 2D cell geometry, where dots indicate the random positions of IP<sub>3</sub>R clusters whose behavior is described by our phenomenological cluster model.

      (2) Quantitative characterization of CCRICs:

      The paper would benefit from a clearer definition of the term CCRICs and quantitative descriptors like duration, amplitude distribution, frequency, and spatial extent (also in relation to the comment on the EGTA measurements below). Furthermore, it remains unclear to me whether CCRICs represent a population of rapidly propagating micro-waves or truly simultaneous events. Maybe kymographs or wave-front propagation analyses (at least from simulations if experimental resolution is too bad) would strengthen this point.

      We agree and have completed the description of CCRICs by adding the following text:

      In the Results section, p. 8, l. 27:

      “…with a duration of 2.1 ± 1.0 sec (mean ± SEM) (N = 4, 128 responses)”.

      p. 9, l. 13:

      “In rare instances (less than 3%), typical local “puff” responses elicited by these ATP concentrations could also be detected, often occurring at the cell periphery (Figs. 4B, red region, and 4C, red arrow; Fig. S6D, blue trace) (N > 20, cells > 500). As expected from the small amounts of Ca<sup>2+</sup> released at puff sites, no increase in cytosolic Ca<sup>2+</sup> was detected in a distal cell region (Fig. S6D, top), indicating that isotropic Ca<sup>2+</sup> diffusion from a puff release site cannot account for a Ca<sup>2+</sup> increase over a large cell area. Puffs could also be detected concomitantly with CCRICs in different ROIs of the same cell (Fig. S6D, bottom). In contrast to puffs, CCRICs often showed responses of comparable amplitude in distal regions across the whole cell (Figs. 4C and S6A, B), suggesting a contribution from IP<sub>3</sub>R cluster activation by Ca<sup>2+</sup>-induced Ca<sup>2+</sup> release (CICR). Within a given cell, the vast majority of CCRICs appeared quasi-synchronized at the fastest acquisition rate we could achieve (22 ms/frame). However, in a few instances a delay could be detected in the elicitation of a peak in a distant region of the cell (Fig. S6C). These observations suggest that the quasi-synchronization of CCRICs results from fast Ca<sup>2+</sup> diffusion leading to the activation of IP<sub>3</sub>R clusters over a large cell area, which may be delayed in some instances. Scrutiny of CCRICs showed that, while their profiles were comparable, the amplitude of these responses varied across regions of the cell, often with a single 1 µm<sup>2</sup> region, likely corresponding to the initial firing cluster, showing a prominent amplitude and other regions showing smaller amplitudes for a given response (Figs. 4B and 4C). For example, in Fig. 4C, the highest amplitude is observed in the red region for peaks 1 and 3, whereas it is observed in the purple region for peak 2. Thus, for a given CCRIC, the respective contributions of local IP<sub>3</sub>R cluster activation and of isotropic Ca<sup>2+</sup> diffusion from other release sites to the Ca<sup>2+</sup> increase may vary across regions of the cell.”

      In the Discussion section, 2nd sentence p. 12:

      “CCRICs showed rapid kinetics, with an average duration of ca. 2.1 s and an amplitude corresponding to an increase in cytosolic Ca<sup>2+</sup> concentration of a few hundred nM, seemingly smaller than that of puffs (Fig. S6D), often occurring repeatedly at a frequency of up to 12 CCRICs/min over the whole cell.”

      We have tried to clarify the notion of coordination versus synchronization of CCRICs by showing the delay observed in some instances in the elicitation of CCRICs at distal regions of the cell, now shown in Fig. S6C.

      (3) Specificity of pharmacological tools:

      Suramin and U73122 are known to have off-target effects. Control experiments using alternative P2 receptor antagonists like PPADS or inactive U73343 analogs would strengthen the causal link.

      As suggested by the referee, we have performed complementary experiments showing that PPADS inhibits, and U73343 does not affect, EPEC-induced Ca<sup>2+</sup> responses, including CCRICs; these data are now shown in the amended Fig. S2.

      Reviewer #2 (Public review):

      Summary:

      The authors of this study are trying to resolve how cellular infection by enteropathogenic E. coli (EPEC) subverts cellular signaling pathways to promote infection and dampen immune responses. Specifically, alteration in calcium dynamics has been evidenced in the prior literature as a potential initiator of these adaptations, and this study provides ideas and mechanistic detail as to how cellular calcium dynamics may be subverted by pathogens.

      Strengths:

      The clear strengths of this paper relate to the new ideas inherent in the proposed hypothesis and their support from the experimental approaches used. Overall, the proposed work provides new ideas in this area, which will benefit from further investigation. Certainly, this is an interesting and challenging paradigm to pick apart mechanistically, and is important for improving treatments for intestinal infections.

      Weaknesses:

      Additional insight is needed in three specific areas to convincingly support the conclusions drawn by the authors. These three areas are: first, a better description of the infection-associated calcium signals. Second, a mechanistic definition of the relevant purinoceptors versus other pathways to increase cellular calcium. Third, an effort to show that the proposed pathways have relevance in a polarized epithelial cell.

      (1) first, a better description of the infection-associated calcium signals.

      We agree and have added a more detailed description of the CCRICs in the results and discussion section, as detailed in response to referee 1, Weakness 2 by adding:

      In the Results section, p. 8, l. 27:

      “…with a duration of 2.1 ± 1.0 sec (mean ± SEM) (N = 4, 128 responses)”.

      p. 9, l. 13:

      “In rare instances (less than 3%), typical local “puff” responses elicited by these ATP concentrations could also be detected, often occurring at the cell periphery (Figs. 4B, red region, and 4C, red arrow; Fig. S6D, blue trace) (N > 20, cells > 500). As expected from the small amounts of Ca<sup>2+</sup> released at puff sites, no increase in cytosolic Ca<sup>2+</sup> was detected in a distal cell region (Fig. S6D, top), indicating that isotropic Ca<sup>2+</sup> diffusion from a puff release site cannot account for a Ca<sup>2+</sup> increase over a large cell area. Puffs could also be detected concomitantly with CCRICs in different ROIs of the same cell (Fig. S6D, bottom). In contrast to puffs, CCRICs often showed responses of comparable amplitude in distal regions across the whole cell (Figs. 4C and S6A, B), suggesting a contribution from IP<sub>3</sub>R cluster activation by Ca<sup>2+</sup>-induced Ca<sup>2+</sup> release (CICR). Within a given cell, the vast majority of CCRICs appeared quasi-synchronized at the fastest acquisition rate we could achieve (22 ms/frame). However, in a few instances a delay could be detected in the elicitation of a peak in a distant region of the cell (Fig. S6C). These observations suggest that the quasi-synchronization of CCRICs results from fast Ca<sup>2+</sup> diffusion leading to the activation of IP<sub>3</sub>R clusters over a large cell area, which may be delayed in some instances. Scrutiny of CCRICs showed that, while their profiles were comparable, the amplitude of these responses varied across regions of the cell, often with a single 1 µm<sup>2</sup> region, likely corresponding to the initial firing cluster, showing a prominent amplitude and other regions showing smaller amplitudes for a given response (Figs. 4B and 4C). For example, in Fig. 4C, the highest amplitude is observed in the red region for peaks 1 and 3, whereas it is observed in the purple region for peak 2. Thus, for a given CCRIC, the respective contributions of local IP<sub>3</sub>R cluster activation and of isotropic Ca<sup>2+</sup> diffusion from other release sites to the Ca<sup>2+</sup> increase may vary across regions of the cell.”

      In the Discussion section, 2nd sentence p. 12:

      “CCRICs showed rapid kinetics, with an average duration of ca. 2.1 s and an amplitude corresponding to an increase in cytosolic Ca<sup>2+</sup> concentration of a few hundred nM, seemingly smaller than that of puffs (Fig. S6D), often occurring repeatedly at a frequency of up to 12 CCRICs/min over the whole cell.”

      We have tried to clarify the notion of coordination versus synchronization of CCRICs by showing the delay observed in some instances in the elicitation of CCRICs at distal regions of the cell, now shown in Fig. S6C.

      CCRICs are observed over the whole cell or a very large cell area. We agree that this point, as well as the comparison with previously described puffs, needed clarification. We have added the following sentences to the Discussion and inserted the seminal Thomas et al. 1999 citation in the references, p. 13, l. 18:

      “Consistently, while CCRICs were detected in the vast majority of cells at these very low agonist concentrations, local “puff-like” responses were also detected at the cell periphery in rare instances. These observations contrast with previously described Ca<sup>2+</sup> puffs preceding global responses, which were reported to occur preferentially in the perinuclear area (Thomas et al., 1999). These earlier studies, however, involved higher agonist concentrations (1–5 µM ATP), expected to lead to the release of higher IP<sub>3</sub> concentrations, which may preferentially stimulate larger IP<sub>3</sub>R clusters in the perinuclear region because of the higher density of IP<sub>3</sub>Rs there. In addition, larger IP<sub>3</sub>R clusters may release higher amounts of Ca<sup>2+</sup>, whose diffusion, as opposed to that in CCRICs, would be restrained by Ca<sup>2+</sup> buffers, thereby favoring the spatial confinement of the response.”

      (2) Second, a mechanistic definition of the relevant purinoceptors versus other pathways to increase cellular calcium

      We do not believe that CCRICs are specific to EPEC, since they are also elicited by low agonist concentrations. The discrete action of type III translocons, leading to the release of small amounts of extracellular ATP at the onset of EPEC infection, prompted us to perform fast Ca<sup>2+</sup> imaging at low agonist concentrations (150 nM ATP; 100 nM histamine, now shown in Fig. S4), which, to our knowledge, are lower than the agonist concentrations used in all previous studies describing puffs. Our modelling studies support the notion that CCRICs correspond to generic Ca<sup>2+</sup> release-dependent responses triggered by low levels of IP<sub>3</sub>.

      We now show inhibition of CCRICs by PPADS, another purinergic receptor antagonist, and by depletion of extracellular ATP through addition of hexokinase to the extracellular medium (Figs. S4 and S7).

      Knocking down ATP receptors is challenging, since HeLa cells were shown to express transcripts for most of the 7 P2X and 8 P2Y purinergic receptors described (10.1016/j.bbamem.2009.03.006). More importantly, we do not believe that CCRICs are triggered by a specific ATP receptor, and we would not expect single knock-down experiments to inhibit them. Our experimental and modelling studies suggest that CCRICs are specific neither to EPEC nor to a particular ATP receptor, but instead correspond to generic Ca<sup>2+</sup> responses elicited by low concentrations of agonists such as ATP or histamine.

      Zhong et al., 2020 indeed previously showed a role for Ca<sup>2+</sup> influx mediated by the TRPV2 receptor in EPEC-mediated cell death. However, this influx occurred following 8 hours of cell infection with EPEC. We do not detect significant cell death or Ca<sup>2+</sup> influx at the onset of infection corresponding to the 12 hours infection kinetics that we used. Our experiments indicate that CCRICs do not involve Ca<sup>2+</sup> influx.

      (3) Third, an effort to show that the proposed pathways have relevance in a polarized epithelial cell.

      We agree and have performed complementary experiments showing induction of CCRICs by EPEC and eATP in polarized intestinal epithelial cells, now shown in Figure S8.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Statistical treatment and data presentation:

      Some figure legends lack clarity on replicates (n = cells vs N = independent experiments). Timecourse quantifications of p-IκB and p-p65 should include normalized fold-change plots with clear statistical tests.

      To clarify, we replaced “n” by “cells”. The number of determinations and independent experiments (N) has been added in the legends to all relevant Figures and Supplementary Figures.

      As requested, we now show the p-IκB and p-p65 data normalized to basal p-IκB and p-p65 levels. We state in the legend to Fig. 6 that an ANCOVA test showed significant effects of eATP on TNF-α-induced IκB and p65 phosphorylation.

      (2) Clarification on the temperature used in imaging (why measured at 35 °C)?

      We have added the following clarification in the Materials and Methods section p. 14, l. 21:

      “Imaging was then carried out at 35°C to allow for bacterial type III secretion, …”

      (3) Figure 4A:

      The image shows a lower image acquisition interval than every 2s that is stated in the caption.

      We apologize for the mistake. The legend to Fig. 4A now reads:

      “Image acquisition every 52 ms (A)…”

      (4) Figure 4B:

      The color of ROIs could be more intense for better identification.

      We have changed the colors of the blue and green ROIs to light cyan and purple.

      (5) Figure 4c:

      I don't understand the meaning of the dashed lines described by "The dashed red and green lines point at the aggregation of responses throughout the cell" in the caption or in the text.

      We apologize for the lack of clarity and have re-written the corresponding text p. 9, l.25 as follows:

      “Scrutiny of CCRICs showed that, while their profiles were comparable, the amplitude of these responses varied across regions of the cell, often with a single region of ca. 3 µm<sup>2</sup>, likely corresponding to a point source of release, showing a prominent amplitude and other regions showing smaller amplitudes for a given response (Figs. 4B and 4C). For example, in Fig. 4C, the highest amplitude is observed in the red region for peaks 1 and 3, whereas it is observed in the purple region for peak 2. Thus, for a given CCRIC, the respective contributions of local IP<sub>3</sub>R cluster activation and of isotropic Ca<sup>2+</sup> diffusion from other release sites to the Ca<sup>2+</sup> increase may vary across regions of the cell.”

      (6) Figure S4A:

      The responses for EGTA are not really pointed out. Are the traces meant to show events?

      We have added arrowheads in traces corresponding to ATP + EGTA-AM treatment pointing at “flattened Ca<sup>2+</sup> responses”. The Legend to Fig. S4A now includes the sentence: “ATP + EGTA-AM treatment led to an inhibition of Ca<sup>2+</sup> responses, associated with small variations in the Ca<sup>2+</sup> baseline, that were arbitrarily scored as flattened Ca<sup>2+</sup> pseudo-responses (ATP+EGTA-AM, red arrows).”

      (7) Figure S5:

      Could not identify the purple arrow for the less mobile cluster.

      We agree that the former Figure lacked clarity and have remade Figure S5, now Figure S6, with higher magnification of panels with fast acquisition. The previously purple arrows pointing at larger and less mobile clusters are now shown in black in these enlarged panels. The legend has been changed accordingly.

      (8) There are some typos and suboptimal formulations throughout the manuscript, such as:

      P8: "minute amount" could be changed to low, minor or similar.

      “Minute amounts of eATP” was replaced by “low amounts of eATP”.

      P8: put a "%" to the numbers 61.2 ± 5.8.

      “%” was added.

      P16: "manuscript".

      Thank you.

      Reviewer #2 (Recommendations for the authors):

      Suggestions relate to the following three topics.

      First, a better description of the infection-associated calcium signals. The authors emphasize throughout the paper that their imaging data challenge established concepts in the calcium signaling field (discussion). I do not see the calcium imaging data explained either with data or textually with sufficient clarity to evaluate this assertion. A start would be a clear description of the characteristics of the EPEC-evoked calcium signals relative to other local and global domains of calcium signaling previously described in HeLa cells. Prior work has shown that PI-coupled agonists evoke local calcium signals that are perinuclear in HeLa cells (PMID: 10660296), but the relationship of EPEC-evoked transients to these previously defined responses is not clear.

      We agree and have added a more detailed description of the CCRICs in the results and discussion section, as detailed in response to referee 1, Weakness 2.

      Most importantly, it is ambiguous where in the HeLa cell recordings are made. Are these recordings close to the plasma membrane and/or deeper within the cell? The only spatial information is provided in Figure 3A, and these responses are not well described in the text or presented in a way that comparisons can be made to responses from a PI-coupled agonist.

      CCRICs are observed over the whole cell or over a very large area of the cell. We agree that this point, as well as the comparison with previously described puffs, needed clarification. We have added the following sentences to the Discussion (p. 13, l. 18) and cited the seminal study by Thomas et al. (1999) in the references:

      “Consistently, while CCRICs were detected in the vast majority of cells at these very low agonist concentrations, in rare instances local “puff-like” responses were also detected at the cell periphery. These observations contrast with previously described Ca<sup>2+</sup> puffs preceding global responses, which were reported to occur preferentially in the perinuclear area (Thomas et al., 1999). These earlier studies, however, involved higher agonist concentrations (1-5 µM ATP), expected to lead to the release of higher IP<sub>3</sub> concentrations, which may preferentially stimulate larger IP<sub>3</sub>R clusters in the perinuclear region because of the higher density of IP<sub>3</sub>Rs there. In addition, larger IP<sub>3</sub>R clusters may release higher amounts of Ca<sup>2+</sup>, whose diffusion, as opposed to CCRICs, would be restrained by Ca<sup>2+</sup> buffers, thereby favoring the spatial confinement of the response.”

      If I understand the described responses correctly, could these rapid local responses not result from a change in cellular calcium buffering capacity as a consequence of infection? Are the authors proposing that these responses also occur in other cells, or that they represent a pathogen-specific signaling mode?

      We do not believe that CCRICs are specific to EPEC, since they are also elicited by low agonist concentrations. The discrete action of Type III translocons, leading to the release of small amounts of extracellular ATP at the onset of EPEC infection, prompted us to perform fast Ca<sup>2+</sup> imaging at low agonist concentrations (150 nM ATP; 100 nM histamine, now shown in Fig. S4), which, to our knowledge, are lower than the agonist concentrations used in all previous studies describing puffs. Our modelling studies support the notion that CCRICs correspond to generic Ca<sup>2+</sup> release-dependent responses triggered by low levels of IP<sub>3</sub>.

      Second, evidence supporting a mechanistic role of ATP comes from prior literature, together with the authors' presented data showing the effects of PLC (to inhibit IP3), pharmacological inhibition (suramin, a non-selective purinoceptor blocker), and the effects of T3SS-deficient mutants (to prevent ATP release). However, there are missing steps here to mechanistically identify how ATP is working. First, does degradation of extracellular ATP (e.g., apyrase) block these responses? Second, given HeLa cells are easily amenable to knockdown approaches, does knockdown of particular ATP receptors, or TRPV2 as suggested in the prior literature, impact the calcium signal dynamics?

      We now show, in Figs. S4 and S7, that CCRICs are inhibited by PPADS, another purinergic receptor antagonist, and by depletion of extracellular ATP through the addition of hexokinase to the extracellular medium.

      Knocking down ATP receptors is a challenging task, since HeLa cells have been shown to express transcripts for most of the 7 P2X and 8 P2Y purinergic receptors described (10.1016/j.bbamem.2009.03.006). Moreover, we do not believe that CCRICs are triggered by a specific ATP receptor, and we would not expect single knock-down experiments to inhibit CCRICs. Our experimental and modelling studies suggest that CCRICs are specific neither to EPEC nor to a particular ATP receptor, but instead correspond to generic Ca<sup>2+</sup> responses elicited by low concentrations of agonists such as ATP or histamine.

      Zhong et al. (2020) indeed previously showed a role for Ca<sup>2+</sup> influx mediated by the TRPV2 channel in EPEC-mediated cell death. However, this influx occurred after 8 hours of infection.

      We do not detect significant cell death or Ca<sup>2+</sup> influx at the onset of infection corresponding to the 12 hours infection kinetics that we used. Our experiments indicate that CCRICs do not involve Ca<sup>2+</sup> influx.

      Third, while the use of HeLa cells provides advantages for imaging and mechanistic assays, the effort to replicate findings in an intestinal cell line would heighten relevance, given the likely importance of cell type and cell polarity on the pathogen-evoked responses.

      We agree and have performed complementary experiments showing induction of CCRICs by EPEC and eATP in polarized intestinal epithelial cells, now shown in Figure S8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a compelling case for the necessity of age-specific templates in functional hyperalignment. Given that the brain undergoes substantial developmental, structural, and functional changes across the lifespan, a 'one-size-fits-all' canonical template is often insufficient. This study effectively demonstrates that incorporating age-congruent features significantly enhances the performance and sensitivity of hyperalignment models. By validating these findings across two independent datasets (Cam-CAN and DLBS), the paper provides robust evidence that accounting for age-related functional organization is a critical prerequisite for accurate functional alignment in lifespan research.

      Strengths:

      (1) The authors used three metrics to evaluate performance. Across all metrics, they found that age-congruent templates outperformed age-incongruent templates, suggesting that age-specific templates can improve alignment.

      (2) These findings highlight the superiority of age-congruent templates for hyperalignment. This work underscores the importance of age-matching in cross-subject functional mapping and represents a vital step forward for the methodology.

      We thank the reviewer for the summary and the positive evaluation of our manuscript.

      Weaknesses:

      (1) Participant Demographics and Group Separation:

      The study defines the 'older' cohort as 65-90 years and the 'younger' cohort as 18-45 years. While this 20-year gap (ages 46-64) effectively maximizes the contrast between groups, the results in Figure 4a suggest that the predicted individualized connectomes follow a continuous distribution. Given this continuity, could the authors provide the average or median trends for Figures 2a and 2b to illustrate how the model behaves across the missing age range?

      Thank you for raising this important point. We calculated the results for a middle-aged cohort template and have included them in Supplementary Figures 4 and 5. As in Figures 2a, 2b, 3a and 3b, we directly compare the intersubject correlation and prediction performance of the middle-aged participants when aligned to their congruent middle-aged template versus an incongruent template. We observed consistent results across validation analyses (ISC and prediction) and group comparisons (young vs. middle-aged, middle-aged vs. old). Consistent with our main findings, the middle-aged cohort exhibits significantly higher intersubject correlation and prediction performance when using the age-congruent middle-aged template. These results confirm that the age-related shifts in functional brain organization captured by the hyperalignment templates follow a continuous trajectory across the lifespan.

      (2) Request for Implementation:

      I have been unable to locate the source code associated with this publication. Could the authors please provide a link to the repository or clarify if the implementation is available for reproduction?

      We have made our scripts publicly available on GitHub: https://github.com/yuqi98/Aging_templates_scripts

      (3) Analysis of Prediction Performance and Distribution:

      While Figures 3b and 5b clearly demonstrate that the congruent template improves correlation, Figure 4a shows a distinct shift in the scatter distribution. Could the authors provide a detailed explanation of the prediction performance metrics used? Specifically, I would like to understand how the underlying method accounts for the distribution differences observed when applying the congruent template.

      Our prediction performance metric is the average Pearson correlation. We calculated the correlation between the model-predicted data (the individualized connectome in Figure 3 and the movie response in Figure 5) and the participant's actual measured data for each cortical vertex and averaged the correlations across vertices. A higher correlation indicates that the group template, when combined with the participant’s individualized transformation matrix, more accurately reconstructs the individualized functional connectome and responses to stimuli.
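      The metric described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that predicted and measured data are stored per vertex as equal-length sequences, for example a vertex's movie time series or its connectivity profile. The function names `pearson_r` and `mean_vertexwise_correlation` are hypothetical and do not come from the authors' repository. Any further steps the authors' scripts may apply, such as masking, are not modeled here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between one vertex's predicted and measured values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and variances up to a common 1/n factor, which cancels in r.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def mean_vertexwise_correlation(predicted, measured) -> float:
    """Correlate prediction and data at each vertex, then average across vertices.

    predicted[v] and measured[v] hold the values for vertex v (hypothetical layout).
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need the same, non-zero number of vertices")
    rs = [pearson_r(p, m) for p, m in zip(predicted, measured)]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    # Vertex 0 is predicted perfectly (r = 1); vertex 1 is inverted (r = -1).
    print(mean_vertexwise_correlation([[1, 2, 3], [1, 2, 3]], [[2, 4, 6], [3, 2, 1]]))  # 0.0
```

      Under this sketch, a template that reconstructs a participant's data more faithfully raises the per-vertex r values, and therefore their mean.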

      The distinct upward shift in prediction performance when using a congruent template occurs because brain functional organization shows age-specific features. A congruent template captures these age-specific connectivity and response features. Importantly, the template creation algorithm aims to reflect the central tendency of the training data, including representational/connectivity geometry and functional topographies. Therefore, the observed differences in templates reflect differences in functional organization across age groups. As a result, when projecting the common template back into an individual’s native cortical space using the transformation matrix derived from independent data, the congruent template provides a richer, more accurate basis for reconstructing the individualized connectome and movie-watching responses.

      Reviewer #2 (Public review):

      Summary:

      In this study, Zhang and colleagues examine the role of participant selection in creating and using functional templates to improve analyses using hyperalignment. Hyperalignment aligns participants' functional MRI data to a shared functional template, analogous to the anatomical templates used to bring anatomical MRI data into a shared space (e.g., MNI152). The question of appropriate template creation is especially pressing for population-level analyses, where a large number of demographic groups (e.g., different age ranges, clinical statuses) may be included in the same analysis. These different demographic groups may have differences in their functional organization that complicate the creation of a single study-specific functional template.

      To provide an initial investigation of the potential effect of demographic-specific templates, the authors use the publicly available Cam-CAN dataset, which contains participants from 18 to 87 years of age. They define a young adult group (< 45 years of age) and an older adult group (> 65 years of age) from this dataset with approximately the same number of participants. They investigate whether "age-congruent" templates (i.e., templates defined in the same age group in which they are used) improve three analyses where hyperalignment has previously been shown to boost performance: inter-subject correlation, predicting individual connectomes, and predicting individual functional responses. Using the Cam-CAN-derived older adult template, they then replicate the ISC analyses using the publicly available Dallas Lifespan Brain Study (DLBS).

      Overall, the presented results are highly suggestive that age-congruent templates consistently improve performance, though the absolute effects are small.

      Strengths:

      The use of a separate validation sample, reusing the same template calculated with Cam-CAN, highlights the potential of developing independent templates for individual demographic groups and then distributing these for wider use, analogous to the MNI templates that are widely used throughout the field of neuroimaging. This suggests that the potential impact of this framework is significant.

      We thank the reviewer for the summary and the positive evaluation of our manuscript.

      Weaknesses:

      While the authors appropriately highlight the potential applications of this result (e.g., to different clinical statuses), it is not apparent how to appropriately extend this methodology to many common experimental paradigms. For example, in case-control studies (where researchers are interested in comparing clinical and non-clinical participants) the use of two different functional templates may complicate rather than ease analyses. Providing this as a potential limitation of the current template construction method, or providing recommendations to researchers interested in comparing across groups, would help to increase the impact of this work.

      We appreciate the reviewer raising this important practical consideration. We have added additional explanation to the Discussion section to provide clear recommendations for researchers applying this methodology, which we summarize below:

      When the goal of a case-control study is to directly compare functional organization or brain responses between clinical and non-clinical participants, it is essential that all individuals are hyperaligned to the same common template. For these analyses, researchers should either construct a joint template containing a balanced, representative sample from both groups, or align all participants to a normative control template. This ensures that the resulting data share a single coordinate system, allowing for valid statistical comparisons between groups.

      However, disease-specific or age-specific templates are highly advantageous when the research objective is to maximize decoding accuracy or predictive performance within a specific population. In real-world clinical or lifespan research, if the goal is to build a reliable diagnostic biomarker for disease progression or to map individualized connectomes for a specific patient cohort, researchers should use a template congruent with that group. The congruent template preserves the group-specific representational geometry and therefore provides better individual-level prediction than a general cortical template.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      In general, there appears to be significantly more spread in the values for older adults (e.g., Figure 4b). It would be useful to know whether subdividing this group improves its relative performance; however, this will likely require additional investigation into the number of participants needed to establish a minimal template.

      We thank the reviewer for this constructive comment. We agree that older adults exhibit greater inter-individual variability in functional organization, which likely drives the larger spread observed in Figure 4b. We also appreciate the suggestion to subdivide this group to see if narrower age bins improve relative performance.

      We constructed templates using narrower, 10-year age intervals and evaluated their performance. Because model performance increases with the amount of training data, we used a fixed number of training participants for each age group (two-thirds of the number of participants in the smallest age group) to build the templates, ensuring a fair comparison. The results are now shown in Supplementary Figure 6 and reveal a continuous gradient of age-related divergence. When predicting data for the 80–90 cohort, the 20–30 template performs worst, and performance improves steadily as the template age approaches that of the target demographic. This systematic gradient further supports our main finding: the penalty for using an incongruent template increases with the discrepancy between template age and participant age.
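      The sample-size matching described above can be sketched as follows, assuming each 10-year bin is given as a list of participant IDs. The helper name `matched_training_sets`, the rounding down of two-thirds, and the random, seeded draw of training participants are all illustrative assumptions. The response does not state how the training participants were selected.

```python
import random
from typing import Dict, List, Sequence


def matched_training_sets(
    groups: Dict[str, Sequence[str]], seed: int = 0
) -> Dict[str, List[str]]:
    """Draw the same number of training participants from every age bin.

    Hypothetical helper: the training size is two-thirds of the smallest bin,
    rounded down with integer arithmetic, so every template is built from the
    same amount of data.
    """
    if not groups:
        raise ValueError("at least one age group is required")
    smallest = min(len(ids) for ids in groups.values())
    n_train = smallest * 2 // 3  # two-thirds of the smallest group, floored
    rng = random.Random(seed)  # fixed seed for a reproducible split (assumption)
    return {name: rng.sample(list(ids), n_train) for name, ids in groups.items()}


if __name__ == "__main__":
    bins = {
        "20-30": [f"young_{i}" for i in range(30)],
        "80-90": [f"old_{i}" for i in range(45)],
    }
    training = matched_training_sets(bins)
    print({name: len(ids) for name, ids in training.items()})  # {'20-30': 20, '80-90': 20}
```

      Capping every bin at the size allowed by the smallest one separates the effect of age congruency from the effect of training-set size.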

      Interestingly, we noticed that at the extreme ends of the age range (20–30 and 80–90), the strictly congruent template was slightly outperformed by the immediately adjacent age bin (i.e., the 30–40 template for young participants, and the 70–80 template for the oldest participants). Because we strictly matched the number of training subjects across all bins, this slight dip is likely driven by differences in raw data quality. It is common for fMRI data from the extreme ends of the lifespan to have slightly lower signal-to-noise ratios or higher head motion compared to the intermediate 30–40 or 70–80 cohorts. This suggests that while age congruency is a key driver of hyperalignment success, the intrinsic data quality of the cohort used to build the template also plays a practical role in its overall performance.

      This brings up the reviewer’s second point regarding the number of participants needed to establish a minimal template. Subdividing the age groups reduces the sample size available to construct each template. Previous research has demonstrated that while a hyperalignment template derived from a relatively small number of participants can achieve acceptable performance, increasing the amount of data and the number of subjects in the template space consistently and robustly improves alignment quality (See Supplementary Figure 7 in Feilong et al., 2023). Ultimately, our long-term goal is to build highly robust, standardized templates for fine-grained age cohorts across the entire lifespan. We are preparing to collect large-scale datasets from age 20 to 100 to build age-specific templates and provide them as open resources. This will allow future researchers to directly align their data to an age-appropriate template without needing to construct one from their own limited samples.

      Reference

      Feilong, M., Nastase, S. A., Jiahui, G., Halchenko, Y. O., Gobbini, M. I., & Haxby, J. V. (2023). The individualized neural tuning model: Precise and generalizable cartography of functional architecture in individual brains. Imaging Neuroscience, 1, 1–34. https://doi.org/10.1162/imag_a_00032

    1. Briefing: Analysis of the fight against pedocriminality and of the challenges of care

      Executive summary

      This document summarizes the current challenges of fighting pedocriminality in France, as analyzed by experts in law, psychiatry and politics.

      The situation is alarming: about 160,000 minors are subjected to sexual violence every year, or 438 per day.

      The main obstacles identified are systemic denial within institutions (family, Church, education), the structural limits of the justice system, and inadequate prevention and treatment programs for perpetrators.

      The analysis stresses the importance of redefining terms (pedocriminality rather than pedophilia), reforming statutes of limitations, and tackling the societal roots of the problem, notably the influence of pornography on psychosexual development.

      --------------------------------------------------------------------------------

      1. The reality of victims: Trauma and breaking the silence

      The testimony of Frédéric Pommier

      The account of journalist Frédéric Pommier illustrates how complex the experience of victims is.

      Assaulted by four men from his circle between the ages of 4 and 7 (including a former member of parliament and a family friend), he only recovered his memories as an adult, after an unrelated assault.

      • Dissociative amnesia: A protective mechanism of the brain that can block out the events for decades.

      • Long-term consequences: Feelings of guilt and shame, addictions (alcohol, tobacco), risk-taking behavior and daily suicidal thoughts.

      • The profile of the abuser: Often seen as "very kind", the abuser relies on manipulation and enjoys a sense of impunity tied to his position of trust.

      Failures to listen

      Although children have "always" spoken out, society struggles to listen to them.

      • The family taboo: The family is identified as the most dangerous place for a child.

      Reporting abuse is often perceived as a threat to the family institution, and victims are sometimes ostracized by their own relatives as a result.

      • Lack of training: How victims are received often depends on the "lottery" of who they deal with (police officers or magistrates who may or may not be trained).

      --------------------------------------------------------------------------------

      2. Institutional obstacles and legislative debates

      Pedophilia or pedocriminality?

      Senator Laurence Rossignol rejects the term "pedophilia" (which means "love of the child") in favor of "pedocriminality".

      In her view, these acts are about submission, domination and manipulation, not love.

      The seal of confession under scrutiny

      The debate over lifting the seal of confession crystallizes the tension between the laws of the Republic and religious dogma:

      • Republican position: The Penal Code imposes a duty to report the abuse of minors.

      No institution should be granted an exemption.

      • Arguments against lifting it: Some fear that it would lead to total silence, eliminating the only setting in which a priest could urge a perpetrator to turn himself in or seek treatment.

      • Political context: The amendment to lift the seal was withdrawn under pressure from the right and the far right.

      Advances and limits of the law

      A recent law strengthens checks on people who work with minors and gives child victims systematic access to a lawyer.

      However, gaps remain, particularly in investigative resources and in the effective follow-up of protective measures.

      --------------------------------------------------------------------------------

      3. The justice system and its limits

      The document highlights several major sticking points in the judicial handling of cases:

      | Obstacle | Description |
      | --- | --- |
      | Statute of limitations | Although the limitation period for rape is 30 years (counted from the age of majority), many victims only recover their memories after it has expired. Abolishing the limitation period is increasingly being discussed. |
      | Investigative resources | A glaring shortage of specialized investigators for long, complex cases. |
      | Dropped cases | Frequent, because prosecutors handle cases impersonally or apply overly restrictive criteria for prosecution, which discredits the child's word. |
      | Follow-up of minors | No effective care for minors who are perpetrators or victims (waiting times of more than a year for educational assistance). |

      --------------------------------------------------------------------------------

      4. Prevention and treatment: Can perpetrators be treated?

      Psychiatrist Antoine Pelissolo and other experts explore prevention strategies aimed at stopping people from acting on their urges.

      International prevention models

      • Germany ("Dunkelfeld" program): Since 2005, it has offered free, anonymous follow-up (45 sessions) focused on self-regulation and empathy.

      Results: 98% had not offended 6 years after the program.

      • United Kingdom ("Stop It Now"): A helpline for potential abusers and their relatives, intended to curb urges before a crime is committed.

      • Chemical castration: Used in Poland (mandatory) and Denmark (voluntary), and under debate in Italy.

      It is a reversible treatment that reduces libido.

      Risk factors and root causes

      • Cycle of violence: About one third of perpetrators were themselves victims in childhood.

      Treating victims today is a safety measure for tomorrow.

      • Impact of pornography: Early and heavy consumption of pornography is blamed for blurring the line between fantasy and reality, encouraging domination, and pushing individuals toward increasingly transgressive content (child pornography).

      Conclusion

      The fight against pedocriminality requires a systemic response.

      This means better training for professionals, more investigative resources, a rethinking of institutional protections (the seal of confession, absolute family authority), and the development of specialized care facilities for both victims and potential perpetrators.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      We thank the reviewers for their thorough and thoughtful comments and believe that the revised manuscript, which incorporates the requested changes, is now much stronger. We added the results of new experiments and additional analyses. Although these new insights did not change our previous conclusions, we substantially reworked the Discussion and added further references to clarify the conclusions we draw.

      Please note that we use ORN as the acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs associate with the odorant receptor co-receptor (Orco) to be trafficked to the ciliary membrane of the ORN, where they can be contacted by pheromones and odorants. In Manduca sexta, evidence is accumulating for G-protein-coupled metabotropic pheromone transduction, rather than for the OR-Orco-dependent ionotropic transduction shown in Drosophila melanogaster. In both insect species, besides its chaperone function, Orco can form leaky cation channels that can regulate the spontaneous spiking activity of ORNs. In this study, we explored this role of Orco.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel's open probability. In M. sexta hawkmoths, we have already shown that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2013). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco ligand candidates (OLCs) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). There, we also demonstrated that OLC15 dose-dependently antagonizes the VUAA1-dependent activation of Orco.

      Furthermore, we tested other published Orco antagonists, previously characterized in heterologous assays, in primary cell cultures of hawkmoth ORNs as well as in vivo in intact hawkmoths. We focused on amiloride-derived antagonists, because we had previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific antagonists but instead affected different ion channel targets depending on the time of day (Nolte et al., 2016). Based on these experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best to target Orco pharmacologically in hawkmoth ORNs. On the basis of these results and further comparisons with other published Orco antagonists, we have since used 50 µM OLC15 in all subsequent experiments as the most adequate dose to antagonize Orco function in Manduca. In the current study, we focus on Orco without excluding the possibility that other ion channels in the ORNs contribute to the control of membrane potential rhythms.

      We have clarified the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, using the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, although the authors repeatedly refer to transcription. Transcriptional studies would require nuclear run-on assays or, more recently, snRNAseq analysis. Taken together, these concerns undermine the authors' attempt to rule out TTFL-based control, which directly led them to implicate a PTFL-based model.

      We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We have added these qPCR experiments and edited the manuscript to make clear that we measure transcript abundance. However, we will not perform snRNAseq analysis because of time and financial constraints.

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012) but also in hawkmoth antennae (Schendzielorz et al., 2015). For clarification, we added the 2015 citation to the Modeling subsection of the Methods.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs, but we know that both are expressed in the pheromone-sensitive ORNs. Thus, as the referee suggests, we added text on the presence and localization of OR-Orco heteromers. Consistent data collected across different experimental approaches (heterologous expression systems, primary cell cultures of hawkmoth ORNs, in vivo/in situ studies) support the presence of Orco homomers in hawkmoth ORNs. In a heterologous expression system, in addition to co-expression of MsexOrco and MsexSNMP-1 with either MsexOr-1 or MsexOr-4, expression of MsexOrco alone was already sufficient to increase intracellular Ca<sup>2+</sup> levels, both spontaneously, owing to its properties as a leaky, non-specific cation channel, and in response to VUAA1 application (Nolte et al., 2013). In both developing hawkmoth pupae and differentiating primary cell cultures of hawkmoth ORNs, Orco expression began during a developmental time window in which ORNs did not yet express pheromone receptors but in which Orco affected spontaneous activity and intracellular Ca<sup>2+</sup> levels in a VUAA1-dependent manner (Nolte et al., 2016). In vitro patch clamp studies of cultured hawkmoth ORNs differentiating during this time window of pupal development characterized ion channels/currents with the properties of Orco as a leaky, non-specific cation channel/current that depends on protein kinase C and cyclic nucleotides (Dolzer et al., 2021, 2008; Krannich and Stengl, 2008; Stengl, 1993). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window in which the ORNs already show spontaneous activity but Orco does not yet heteromerize with pheromone receptors.
      However, we do not know whether, and in what ratio, Orco homomers and OR-Orco heteromers are present in the respective sensillum compartments of adult hawkmoths, because none of the OR-specific antibodies we tested worked in immunocytochemical studies of hawkmoth antennae (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990). Our hypothesis that Orco homomers are located in the soma and dendrite compartments and OR-Orco heteromers in the cilia is based on the immunocytochemical localization of Drosophila ORs mainly in the ciliary compartment (Benton et al., 2006).

      We clarified our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms and correspond to daily/circadian rhythms in sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms are easily masked by stress, such as the disturbances inherent in an experimentally very challenging recording that lasts several days. In addition, we have observed over the years in our rearing facility that, under 17:7 light-dark cycles, the nominally nocturnal hawkmoth M. sexta distributes its activity over the course of the day, so that both nocturnal and diurnal individuals occur. Thus, light-dark cycles were not sufficient to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths rely heavily, besides stress signals, on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Because this study focuses on spontaneous activity rather than on pheromone-dependent physiology, we used isolated males that had never been exposed to female pheromones and took the resulting phase dispersal into account. Therefore, under free-running conditions it was necessary first to determine the rhythm of each animal and then to phase-align the activity patterns to allow statistical analysis; otherwise, circadian differences would average out across a phase-dispersed free-running population. As requested by the referees in point (7), we now use RAIN to test each recording for rhythmicity and have revised the manuscript accordingly.
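      The averaging argument above can be illustrated with a minimal Python sketch on synthetic data. It assumes an idealized sinusoidal 24-h rhythm in spike rate, estimates each recording's phase with a simple cosinor (least-squares cosine) fit, and aligns recordings by circular shifting. The data, the sampling interval, the function names (`cosinor_phase`, `align_to_peak`), and the cosinor-based alignment are all hypothetical illustrations; they are not the procedure used in the manuscript, which first determined each animal's rhythm and then phase-aligned the activity patterns.

```python
import numpy as np

PERIOD_H = 24.0  # assumed period of the free-running rhythm (hypothetical)
DT_H = 0.5       # assumed sampling interval of the binned spike rate (hypothetical)


def cosinor_phase(t, y, period=PERIOD_H):
    """Fit y ~ m + a*cos(w t) + b*sin(w t) by least squares; return the acrophase (rad)."""
    w = 2 * np.pi / period
    design = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    _, a, b = np.linalg.lstsq(design, y, rcond=None)[0]
    # m + a*cos(wt) + b*sin(wt) == m + A*cos(wt - phi) with phi = atan2(b, a)
    return np.arctan2(b, a)


def align_to_peak(t, y, period=PERIOD_H, dt=DT_H):
    """Circularly shift y so that its fitted peak falls at t = 0.

    np.roll wraps around the ends, which is only valid when the recording
    spans a whole number of periods (true for the synthetic data below).
    """
    w = 2 * np.pi / period
    shift = int(round(cosinor_phase(t, y, period) / w / dt))
    return np.roll(y, -shift)


def peak_to_trough(y):
    return float(np.max(y) - np.min(y))


# Hypothetical data: 8 recordings over 3 days with the same rhythm in spike
# rate (mean 10 Hz, amplitude 2 Hz) but evenly dispersed phases.
t = np.arange(0.0, 72.0, DT_H)
w = 2 * np.pi / PERIOD_H
phases = 2 * np.pi * np.arange(8) / 8
recordings = [10 + 2 * np.cos(w * t - p) for p in phases]

naive_mean = np.mean(recordings, axis=0)
aligned_mean = np.mean([align_to_peak(t, y) for y in recordings], axis=0)

naive_amp = peak_to_trough(naive_mean)      # close to 0: the rhythms cancel out
aligned_amp = peak_to_trough(aligned_mean)  # close to 4: the 2-Hz amplitude survives
print(f"naive: {naive_amp:.3f} Hz, aligned: {aligned_amp:.3f} Hz")
```

      In this idealized case the naive population average is flat, whereas the phase-aligned average keeps the full peak-to-trough range. Real recordings are noisy and span only a few cycles, so estimated phases would be far less exact than here.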

      Furthermore, in preliminary experiments we briefly exposed hawkmoths to pheromone the night before the start of the experiment. However, this did not yield phase-synchronized spiking rhythms. Most likely, a circadian pattern of pheromone exposure would have been necessary as a zeitgeber, which could not be used here because of long-term pheromone-dependent effects on spiking activity. These results have been added as a supplementary figure to Fig. 3.

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers may have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end; i.e., the recording sites are located at 2/3 to 3/4 of the antenna length as seen from the head of the animal. We have clarified this in the Methods section.

      In addition, our antibody stainings against Orco showed that apparently all ORNs innervating long and short trichoid sensilla along the entire flagellum share the same staining pattern (Nolte et al., 2016). Lee and Strausfeld (1990) mapped all types of antennal sensilla, and together with the pheromone-dependent tip recordings of Kaissling et al. (1989), this showed that most male antennal sensilla are pheromone-sensitive long trichoid sensilla, in which one of the two innervating ORNs always responds to bombykal, ensuring highly sensitive pheromone detection. Furthermore, our patch clamp recordings of primary cell cultures from whole male antennae found largely overlapping ion channel populations across ORNs (reviewed in Stengl, 2010). This suggests that all ORNs, whether they express ORs sensitive to pheromone or to general odorants, could share the same Orco-dependent spontaneous activity rhythms. Moreover, different experimenters in our lab who recorded from long trichoid sensilla on different annuli over different years did not detect obvious differences in either spontaneous activity or pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, we are very likely reporting a general encoding mechanism that is not locally restricted along the antennal flagellum and is probably shared by all types of OR-Orco-expressing ORNs.

      (6.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after prior phosphorylation by protein kinase C (PKC; reviewed in Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that PKC- and cGMP/cAMP-dependent regulation of Orco also occurs in other insect species. To test this, we are currently characterizing the second-messenger dependence of spontaneous spiking activity, which is the focus of a follow-up manuscript. Nevertheless, to provide more evidence for the hypothesis of the current manuscript, we added a new set of tip-recording experiments that demonstrate cAMP-dependent gating of Orco. To accommodate this new figure, we merged Figures 8–10 into Figure 8 and present the cAMP data as Figure 9.

      (6.2) … and the PTTF model proposed is somewhat disappointing.

      For a detailed introduction to our PTFL membrane clock hypothesis, please see our opinion paper cited in the manuscript (Stengl and Schneider, 2024). We added a clarification of how Orco activation can influence cAMP levels. A more elaborate PTFL clock model that includes many more of the ion channels identified in hawkmoth ORNs is the subject of a forthcoming manuscript.

      (6.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C rather than on adenylyl cyclase activation. Our hypothesis is not influenced by heterologous expression studies of Orco in HEK cells; it is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on primary cell cultures of hawkmoth olfactory receptor neurons, which respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, numerous publications from other laboratories, using physiological, genetic, and immunocytochemical assays in vivo and in vitro, support a metabotropic signal transduction cascade in insect olfaction (Stengl, 2010; Stengl and Funk, 2013; Takagi et al., 2025; Wicher and Miazzi, 2021). In contrast, the hypothesis of a solely ionotropic pheromone and general odor transduction cascade in all insect species rests on sparse experimental evidence, derived primarily from heterologous expression systems such as HEK cells, which lack the insect’s wild-type molecular environment and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis relies heavily on the argument that a 7TM receptor with inverted membrane topology cannot couple to G-proteins, an argument that lacks rigorous support from biochemical and structural studies. The ionotropic hypothesis also lacks support from carefully performed in vivo physiological studies in different insect species that analyze the distinct kinetic components of ORN odor/pheromone responses and use physiological concentrations and durations of odor/pheromone stimuli (see our most recent publication, Schneider et al., 2025). We added references on the possible odor transduction mechanisms to the Introduction.

      (6.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for known, conserved cyclic nucleotide-binding sites on Orco, this does not exclude indirect cAMP effects, e.g., via Orco subunits forming complexes with other molecules under direct cAMP control, such as other ion channel subunits. Nor does it exclude as yet unknown binding sites, or sites that become accessible only after a specific sequence of phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after prior PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a zeitgeber time-dependent, PKC- and cyclic nucleotide-dependent modulation of Orco. These detailed studies will be reported in a follow-up publication. In the revised version of this manuscript, we added tip-recording experiments that indicate cAMP involvement in Orco gating (new Figure 9).

      (7) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable enough to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      Long-term tip recordings from intact hawkmoths are very challenging and time-consuming; we are therefore very pleased to have obtained so many of them (N = 40). We thank the reviewers for suggesting RAIN, since this analysis revealed circadian rhythms in 7 of 11 LD recordings, 8 of 12 DD recordings, and 2 of 12 OLC15 recordings. Please also see our response to (4) above regarding the phase dispersal of activity rhythms observed in our experiments as well as in the behavior of hawkmoth males in the mating cage.

      (8) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 uM OLC15).

      Following the referees’ valuable suggestion, we used RAIN to detect circadian rhythms in the spiking attributes of each individual animal. Since only 2 of 12 animals displayed a circadian rhythm under OLC15, a statistical comparison of circadian amplitudes is not possible. We revised the Results section accordingly and expanded the figure legend to make clear that each heat map in Fig. 5 is a representative example from one animal, not an average across animals.

      As the reviewer correctly states in (7), wavelet results on circadian rhythmicity must be interpreted carefully because of the small number of circadian cycles in ~3–4-day recordings. Since the heat maps in Figure 5 visually revealed the presence of ultradian rhythms, the main focus of the wavelet analysis in Figure 6 is the detection and quantification of ultradian periods up to 20 h.

      We revised the Methods section to include references to previous experiments that characterized the effect of different doses of OLC15 and other Orco antagonists and agonists in M. sexta antennae (Nolte et al., 2016). Please see also our response to (1).

      (9) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We revised the manuscript accordingly and clarified which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (10.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that there are very likely multiple layers of interconnected feedback loops that control Orco localization and activity. Our novel hypothesis proposes interlocked TTFL and PTFL control of physiological circadian rhythms rather than strictly hierarchical TTFL control, which would require daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, including all the molecules suggested by the referees. These studies are ongoing, very time-consuming and costly, and beyond the scope of this manuscript. However, we added a set of experiments to this manuscript demonstrating that the effect of increased cAMP on spontaneous spiking activity is mediated by Orco (new Figure 9).

      (10.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments testing circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and will be part of another publication. In section 6.2, we already stated that our experiments do not exclude that Orco is under indirect control of the TTFL. We revised our Discussion accordingly.

      The published experiments on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrated that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single-sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells, such as trichogen and tormogen cells, which build up the transepithelial potential via V-ATPases that generate the high K<sup>+</sup> concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of efferent neurons with somata in the brain that mediate central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      (a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We revised the discussion accordingly.

      (b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We added those experiments to the revised version of the manuscript (see our response to (2)).

      (c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We added possible negative feedback elements to the Discussion to explain how our proposed PTFL could, in principle, work independently of the TTFL clock.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      We have revised the discussion accordingly.

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21–27 °C), humidity (35–60%), or other cues known to modulate spontaneous firing.

      These large fluctuations stem from performing experiments in different seasons (higher temperature and humidity in summer, lower temperature and humidity in winter). Conditions were stable throughout each individual experiment. We clarified the Methods section accordingly.

      Recommendations for the authors:

      The authors should post the code for their computational model to a repository like GitHub.

      The code for the computational model is now available at https://github.com/a-c-schneider/VijayanForlinoEtAl2025_Model.git

      References

      Benton R, Sachse S, Michnick SW, Vosshall LB. 2006. Atypical Membrane Topology and Heteromeric Function of Drosophila Odorant Receptors In Vivo. PLOS Biology 4:e20. DOI: https://doi.org/10.1371/journal.pbio.0040020

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. DOI: https://doi.org/10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. Journal of Experimental Biology 206:1575–1588. DOI: https://doi.org/10.1242/jeb.00302

      Dolzer J, Krannich S, Stengl M. 2008. Pharmacological Investigation of Protein Kinase C- and cGMP-Dependent Ion Channels in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Chemical Senses 33:803–813. DOI: https://doi.org/10.1093/chemse/bjn043

      Dolzer J, Schröder K, Stengl M. 2021. Cyclic nucleotide-dependent ionic currents in olfactory receptor neurons of the hawkmoth Manduca sexta suggest pull–push sensitivity modulation. European Journal of Neuroscience 54:4804–4826. DOI: https://doi.org/10.1111/ejn.15346

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Frontiers in Cellular Neuroscience 12:218. DOI: https://doi.org/10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. DOI: https://doi.org/10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Current Biology 34:1414–1425.e5. DOI: https://doi.org/10.1016/j.cub.2024.02.042, PMID: 38479388

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. DOI: https://doi.org/10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proceedings of the National Academy of Sciences 108:8821–8825. DOI: https://doi.org/10.1073/pnas.1102425108

      Kaissling KE, Hildebrand JG, Tumlinson JH. 1989. Pheromone receptor cells in the male moth Manduca sexta. Archives of Insect Biochemistry and Physiology 10:273–279. DOI: https://doi.org/10.1002/arch.940100403

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. Journal of Experimental Biology 172:345–354. DOI: https://doi.org/10.1242/jeb.172.1.345

      Krannich S, Stengl M. 2008. Cyclic Nucleotide-Activated Currents in Cultured Olfactory Receptor Neurons of the Hawkmoth Manduca sexta. Journal of Neurophysiology 100:2866–2877. DOI: https://doi.org/10.1152/jn.01400.2007

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. DOI: https://doi.org/10.1038/22566

      Lee JK, Strausfeld NJ. 1990. Structure, distribution and number of surface sensilla and their receptor cells on the olfactory appendage of the male moth Manduca sexta. Journal of Neurocytology 19:519–538. DOI: https://doi.org/10.1007/BF01257241

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. Journal of Biological Rhythms 22:502–514. DOI: https://doi.org/10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. DOI: https://doi.org/10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. DOI: https://doi.org/10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. Journal of Biological Rhythms 22:43–57. DOI: https://doi.org/10.1177/0748730406295462, PMID: 17229924

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. Journal of Biological Rhythms 29:318–331. DOI: https://doi.org/10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. Journal of Biological Rhythms 27:388–397. DOI: https://doi.org/10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. DOI: https://doi.org/10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. DOI: https://doi.org/10.1523/ENEURO.0376-24.2024, PMID: 39880675

      Stengl M. 2010. Pheromone Transduction in Moths. Frontiers in Cellular Neuroscience 4:133. DOI: https://doi.org/10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. Journal of Comparative Physiology A 174:187–194. DOI: https://doi.org/10.1007/BF00193785

      Stengl M. 1993. Intracellular-Messenger-Mediated Cation Channels in Cultured Olfactory Receptor Neurons. Journal of Experimental Biology 178:125–147. DOI: https://doi.org/10.1242/jeb.178.1.125

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. Journal of Comparative Physiology A 199:897–909. DOI: https://doi.org/10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. Journal of Neuroscience 10:837–847. DOI: https://doi.org/10.1523/JNEUROSCI.10-03-00837.1990, PMID: 2319305

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Frontiers in Physiology 14:1243455. DOI: https://doi.org/10.3389/fphys.2023.1243455

      Takagi S, Abuin L, Mermet J, Lee D, Benton R. 2025. A GPCR signaling pathway in insect odor detection. DOI: https://doi.org/10.1101/2025.10.03.680299

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Current Biology 14:638–649. DOI: https://doi.org/10.1016/j.cub.2004.04.009, PMID: 15084278

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS (Eds). Insect Biology in the Future. Academic Press. p. 735–763. DOI: https://doi.org/10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell and Tissue Research 383:7–19. DOI: https://doi.org/10.1007/s00441-020-03363-x


    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper provides a novel method to improve the accuracy of predictions of the impact of ITN strategies, by using sub-national estimates of the duration of ITN access and use over time from cross-sectional survey data and annual country ITNs received.

      Strengths:

      The approach is novel, makes use of available data, and has considered all of the relevant components of ITN distributions.

      Weaknesses:

      (W1.1) The main message of the paper was not very clear, and did not seem to fit the title. The title focuses on sub-national tailoring of ITN, but the abstract did not feature results directly about SNT. It was not very clear what the main result of the paper was - there are several ITN observations in the results and discussion. Most did not seem to be directly about SNT, but rather sub-national differences in use and access were accounted for in the analyses. It was not clear if the same conclusions would be reached without accounting for sub-national differences, but the estimates and predictions could be expected to be more accurate.

      Thank you for highlighting this. We agree the title could be improved to better reflect the main messages of the paper and have now updated it to “Heterogeneity of use, access and retention of insecticide-treated nets: implications for subnational tailoring to maximise malaria control”. All parameters are estimated at a subnational level; this is not always the case at the national level. We therefore do not have national-level models without subnational differences that our results could be compared to.

      (W1.2) Some of the results seemed to me to be apparent even without a modelling exercise (eg high coverage could not be maintained between campaigns, use would be higher with 2-yearly distributions rather than 3-yearly) or were not in themselves new insights (eg estimates of the duration of use). It would be helpful to clearly state what the novel results are in the abstract, the first paragraph of the discussion and the conclusions, and to make sure that the title is consistent.

      It is our understanding that assessments of ITN coverage are often made from infrequent surveys, for example Malaria Indicator Surveys (MIS). These are typically conducted six months post-campaign and may miss notable reductions in use and access beyond this. Comparisons of ITN use and access are also frequently made directly between DHS surveys, which can be misleading in isolation if the time between campaigns and surveys is not considered. We have tried to highlight this more clearly in relation to Burkina Faso with the following text:

      “The observed decrease in use and access across many regions in Burkina Faso may therefore be a by-product of DHS surveys being conducted at progressively later dates relative to the most recent campaign; this does not necessarily indicate an underlying trend in decreasing use or access over longer timescales.”
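      To make this timing bias concrete, here is a minimal numerical sketch. It assumes, purely for illustration, that use resets to a fixed post-campaign level and then decays exponentially with a constant mean duration; the function name and parameter values are hypothetical and are not estimates from the paper. The underlying dynamics are identical in every survey round, yet the readings fall because each survey is conducted later after the most recent campaign.

```python
import math

def use_at(months_since_campaign, initial_use, mean_duration_months):
    """Proportion using an ITN under a hypothetical exponential decay from a campaign."""
    return initial_use * math.exp(-months_since_campaign / mean_duration_months)

# Hypothetical parameters, held fixed across all survey rounds.
INITIAL_USE = 0.75
MEAN_DURATION = 24.0  # months

# Surveys conducted progressively later relative to the most recent campaign.
survey_lags = [6, 18, 30]
readings = [use_at(lag, INITIAL_USE, MEAN_DURATION) for lag in survey_lags]
for lag, value in zip(survey_lags, readings):
    print(f"survey {lag:>2} months after campaign: use = {value:.2f}")
```

      Read naively, the three readings suggest a downward trend in use, even though nothing about behaviour or net durability changed between rounds.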

      We do believe modelling exercises, such as the methodology presented here, can generate better estimates of ITN use and access over time than surveys alone, which can be biased by the relative timings of campaigns. It is also our understanding that previous studies have generated national estimates of ITN retention. We are not aware of any previous studies that have estimated the duration ITNs continue to be used for, which is arguably of greater epidemiological importance than retention time. To our best knowledge, these have also not been estimated at subnational scales previously.

      We acknowledge that the novelty of some results was not clearly presented previously and are grateful to the reviewer for highlighting this. We have now highlighted some of the novel findings more clearly in the abstract, with the following text:

      “However, subnational variation in ITN retention and the duration that ITNs remain in use have not previously been quantified.”

      “Our results highlight that although transmission intensity remains an important factor for subnational tailoring of malaria control interventions, other factors, such as ITN use given access, meaningfully influence optimal deployment strategies.”

      We have also highlighted the novelty and relevance of our findings more clearly in the first paragraph of the discussion, with the following text:

      “Funding constraints have also increased the need for consideration of subnational tailoring, with many recommendations being made on the basis of transmission intensity in the World Health Organization (2025) Subnational Tailoring Reference Manual. However, a key uncertainty in assessing the potential impact of different ITN interventions has been how long nets remain in use rather than how long they are retained, and how this varies between regions. Here, to our best knowledge, we present the first estimates of subnational variation in ITN retention and the duration that ITNs remain in use, and also quantify for the first time how ITN use, access and retention vary between subnational regions across multiple African countries. Our work supports the change in guidance to optimal coverage as it highlights ITN interventions have notable differences in impact between settings, and that distributing fewer but more effective ITNs, particularly pyrethroid-chlorfenapyr products, is likely to be more impactful than maximising long-term coverage through increased campaign frequencies with pyrethroid-only ITNs. Our work also broadly supports World Health Organization (2025) recommendations for subnational tailoring, particularly the consideration of deprioritisation of ITN distribution in very low transmission settings. However, our results provide new indications that deprioritisation of areas with higher ITN use given access may lead to greater resurgences in cases, highlighting that subnational tailoring decisions could be optimised further by considering additional factors to transmission intensity alone.”

      The novelty and relevance of our results are also now highlighted in the following text, which has been incorporated into the concluding paragraph:

      “In conclusion, the work indicates that universal coverage targets of 80% are unlikely to be consistently met due to waning overall ITN use in the intervening years between triennial mass campaigns. Improved coverage can be achieved through more frequent biennial distributions, though this is unlikely to be feasible at scale given the current funding landscape. Indeed, when resources are constrained, deprioritisation of ITN mass campaigns in certain settings is being increasingly considered through subnational tailoring of malaria control interventions. Our work highlights that the relationship between transmission intensity (whether measured in terms of prevalence or clinical cases) and intervention impact is non-linear, and notable resurgences in cases may follow when campaigns are deprioritised in all but very low transmission settings. This broadly supports WHO subnational tailoring guidance, which suggests consideration of deprioritising distribution of ITNs in regions with PfPR2-10 < 1% (World Health Organization, 2025). However, while the World Health Organization (2025) Subnational Tailoring Reference Manual proposes that the withdrawal of ITNs in favour of indoor residual spraying should be considered in areas with low ITN use, here we estimate that ITN use alone appears to be a notably poorer predictor of the impact of ceasing mass campaigns than use given access. Our findings suggest that regions with higher use given access may experience disproportionately greater resurgences in cases following deprioritisation. This implies that regions with low use given access may warrant consideration for cessation of ITN distribution, rather than decisions being based solely on low overall ITN use irrespective of whether communities have sufficient ITN access.
However, subnational differences in ITN use, access and retention are key knowledge gaps in many settings, and when estimated from infrequent surveys they are highly sensitive to bias arising from the timing of surveys relative to when campaigns were conducted. To our knowledge, this study is the first to estimate subnational variation in ITN retention and the first to estimate the duration that ITNs remain in use, which is of greater epidemiological relevance than retention time. It also provides a novel framework to correct for biases in estimates of ITN use and access arising from when campaigns were conducted. Although campaigns have historically aided increasing ITN use and access over time, we estimate the mean duration of ITN use is consistently shorter than mean retention times in all regions. This raises questions about whether punctuated distribution of ITNs through campaigns is the optimal mechanism for maximising their effectiveness and cost-effectiveness. Maximising the cost-effectiveness of interventions has become increasingly pertinent in the current funding context, and consideration of alternative distribution strategies, such as increased distribution through continuous distribution channels, including school- or community-based distribution, may be warranted. Frameworks such as the one presented here, which take into account the potential for impact from different net types and the high variability of ITN duration and use, could support NMP decision making on how best to maximise impact from available funds. Whilst such frameworks may be a useful tool, local knowledge of factors impacting ITN access and use as well as operational decision making will be paramount for NMP-led tailoring of subnational strategies.”

      (W1.3) On L236, the link to SNT is stated: "the models indicate trends that can support subnational tailoring of ITNs". They could indeed, but SNT itself is not done in this paper. It seems to be about improving sub-national predictions of the impact of single ITN strategies, by taking into account sub-national variation in access and use duration. This is useful, and the model developed has novel aspects.

      Thank you for highlighting this. We hope our updated title and response to W1.12 below help address this. Where relevant we have also framed our findings in relation to the World Health Organization’s Subnational tailoring of malaria strategies and interventions: reference manual, which was published following our original submission; examples of this are highlighted in our response above to W1.2.

      (W1.4) Individual countries may have records on when nets were distributed to the regions rather than needing to use the annual country number of nets together with the DHS data. It could be helpful to say what the analysis steps would be in that case.

      We have now added the following text to appendix 3.2 to clarify how the methodology could be adapted:

      “In contexts where national malaria programmes or other stakeholders have knowledge of the timings of mass campaigns (i.e. when there is no uncertainty in ɸ_ij), the methodology can be adapted by deterministically evaluating the time since the last campaign (equation S18) for each time point.”
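      Equation S18 is not reproduced here, so what follows is only a hypothetical sketch of the deterministic case. Once campaign dates are known, the time since the last campaign at each time point reduces to a sorted lookup. The function name, the month-based time axis, and the campaign dates are all illustrative assumptions.

```python
from bisect import bisect_right
from typing import List, Optional

def months_since_last_campaign(campaign_months: List[float], t: float) -> Optional[float]:
    """Time elapsed at t since the most recent campaign at or before t.

    campaign_months must be sorted in ascending order. Returns None if no
    campaign has taken place by time t.
    """
    idx = bisect_right(campaign_months, t)  # number of campaigns at or before t
    if idx == 0:
        return None
    return t - campaign_months[idx - 1]

# Hypothetical known campaign dates (months since an arbitrary origin).
campaigns = [10.0, 46.0, 82.0]
time_points = [5.0, 10.0, 30.0, 60.0, 100.0]
elapsed = [months_since_last_campaign(campaigns, t) for t in time_points]
```

      Using `bisect_right` means a campaign held exactly at a time point counts as already having happened, so the elapsed time is zero at that point.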

      (W1.5) There were several assumptions that needed to be made in building the model. There is some validation of the timing of the distributions (L633 "verified where possible through discussion with interested parties nationally and internationally") and the fit of estimated access and use to survey data, and agreement between predictions of prevalence and MAP estimates. It would be helpful to say which assumptions are important for the results (and would be key knowledge gaps) and which would not make a difference. It might be possible to validate the net timing model using a country where net distributions are known reasonably well.

      Thank you for raising this. We acknowledge that, to investigate which assumptions are less likely to make a meaningful difference, we would ideally have conducted a full sensitivity analysis on these. This, however, would be challenging, since many of these are structural assumptions rather than numerical ones (for example, the assumption of an exponential decay in use and access), which would require the entire methodology to be adapted to conduct a sensitivity analysis. We did validate our estimated campaign timings against some known subnational campaign timings for Senegal. However, we could not source data on when all campaigns were conducted for all regions of Senegal to the nearest month to be able to conduct validation against this. We were also not able to source use and access data from sources other than the DHS to validate our discrete-time models of historical use and access. However, PfPR2-10 estimates are fitted to equivalent MAP estimates. These were validated against DHS estimates of PfPR6-59mo, which were not used at any stage to fit our models. We have made slight changes to the original wording in relation to this at the end of appendix 5.2.

      (W1.6) What was assumed about what happens to old nets after a mass campaign was not clear. This assumption is likely to affect the predictions of access for the biennial distributions.

      To generate our initial estimates of the mean duration of use and retention time with our hierarchical model, we assume nets are only distributed to individuals who do not already have ITNs (appendix 2). This initial step is necessary for our methodology, but is relaxed later under our discrete-time model where we assume ITNs are distributed at random such that individuals with an ITN are equally likely to receive a new ITN (and replace their existing one) following a mass campaign (appendix 4). Both of these sections have been substantially rewritten and we hope this is now clearer.
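      The two allocation assumptions can lead to quite different post-campaign access. The sketch below is a deliberately simplified, hypothetical calculation (not the authors' model): a campaign reaches a fraction of the population, and either nets go only to people without access, or recipients are chosen at random regardless of existing holdings, so that some new nets replace old ones. Function and parameter names are illustrative.

```python
def access_after_campaign(prior_access: float, campaign_reach: float, targeted: bool) -> float:
    """Post-campaign access proportion under two hypothetical allocation rules.

    targeted=True: nets go only to people without access (capped at 1).
    targeted=False: recipients are chosen at random, independently of whether
    they already hold a net, so a share of new nets simply replaces old ones.
    """
    if targeted:
        return min(1.0, prior_access + campaign_reach)
    # Still without access = lacked access before AND was not reached.
    return 1.0 - (1.0 - prior_access) * (1.0 - campaign_reach)

# Hypothetical example: 40% access before a campaign reaching 50% of people.
targeted_access = access_after_campaign(0.4, 0.5, targeted=True)
random_access = access_after_campaign(0.4, 0.5, targeted=False)
print(f"targeted: {targeted_access:.2f}, random: {random_access:.2f}")
```

      Random allocation never yields more access than targeted allocation for the same reach, which is why the assumption matters for projections of more frequent distributions.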

      (W1.7) L312 and elsewhere: That use given access declines with net age is plausible. However, I wondered if this could be partly a consequence of the assumptions in the model (eg the two exponential decays for access and use, the possible assumption that new nets displace the current ones when there is a mass campaign).

      Declining use given access as nets age is not affected by model assumptions. Because use and access are fitted independently of each other, there are no constraints that would prevent a faster decay in access than use. Had the data supported this, it would have led to use given access increasing over time since the last campaign. The data did not support this. Further clarification that use and access are fitted independently of each other has now been provided in the following text:

      “All subsequent analyses described are conducted independently for use and access”
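      A hypothetical sketch of this argument, assuming (as in the response to W2.1) exponential decay for both use and access with independently chosen mean durations: use given access is then the ratio of the two curves, and its direction over time depends only on which mean duration is shorter. The names and parameter values are illustrative, not fitted estimates.

```python
import math

def use_given_access(t, use0, access0, mean_use, mean_access):
    """Ratio of use to access at time t after a campaign (hypothetical exponential decays)."""
    use = use0 * math.exp(-t / mean_use)
    access = access0 * math.exp(-t / mean_access)
    return use / access

times = [0, 12, 24]  # months since campaign

# Use decays faster than access: the ratio declines with time.
faster_use_loss = [use_given_access(t, 0.5, 0.8, 20.0, 30.0) for t in times]

# Access decays faster than use: the ratio would instead rise.
faster_access_loss = [use_given_access(t, 0.5, 0.8, 30.0, 20.0) for t in times]
```

      Because neither decay is tied to the other in this sketch, a rising ratio is just as representable as a falling one; only the data decide which pattern is estimated.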

      (W1.8) The Methods section on Estimating historical use and access seemed to be aimed at readers familiar with formulae, but I think it could lose other interested readers. It could be useful to explain a little more about what is happening at each step and also why.

      Thank you for highlighting this. We have re-written this section in the main manuscript, now named ‘Historical use, access and retention times’, where we now only highlight key equations and provide a high-level overview of the methodological steps. We have sought to provide clearer explanations here behind the rationale for each step to ensure maximum accessibility for interested readers. The original wording was used as a basis for the newly provided series of appendices which provide further technical detail; this wording has also been heavily re-drafted to improve clarity of each step.

      (W1.9) The model was fitted to MAP estimates of PfPR2-10, which themselves come from a model. It may be that there is different uncertainty in the MAP estimates for different regions. I couldn't see this on the graph, but maybe the uncertainty is small. Was this taken into account in the fitting?

      We only used median MAP estimates of PfPR2-10 to calibrate the baseline EIR for each region in our model. We have clarified our rationale in appendix 5.2:

      “Since the relationship between baseline EIR and PfPR2-10 here is specific to malaria simulation, MAP uncertainty estimates were not propagated through to our estimates in baseline EIR since these would not faithfully represent its true uncertainty.”

      (W1.10) Was uncertainty from each estimated component integrated into the other components?

      Thank you for raising this; it indicates that we had not stated this clearly. To confirm, we propagate uncertainty in each component through to our estimates of cases averted. New text has been provided to clarify this in the following text:

      “Region-specific uncertainty in ITN efficacy, use, retention, and the relative contributions of continuous and campaign channels is therefore propagated through to our estimates of cases averted.”

      Further details are also provided in the preceding text of the same paragraph. The central 95% credible intervals of cases averted shown in figures 5.C and 6 and associated figure supplements are reflective of this uncertainty.
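      As a rough illustration of what propagating uncertainty through to cases averted involves, the sketch below draws every uncertain input for each realisation, passes the draw through an impact function, and reads the median and central 95% credible interval off the resulting draws. The input distributions and the impact function are placeholders invented for this example; they are not the posteriors or the transmission model used in the paper. Only the summary step reflects the procedure described above.

```python
import random
import statistics

def central_interval(draws):
    """Median and central 95% interval of Monte Carlo draws."""
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return cuts[0], cuts[19], cuts[38]

def hypothetical_cases_averted(use, efficacy, baseline_cases):
    """Placeholder impact model; NOT the transmission model used in the paper."""
    return baseline_cases * use * efficacy

rng = random.Random(2024)
draws = []
for _ in range(5000):
    use = rng.betavariate(30, 20)       # hypothetical uncertainty in ITN use
    efficacy = rng.uniform(0.3, 0.6)    # hypothetical uncertainty in ITN efficacy
    baseline = rng.gauss(400.0, 40.0)   # hypothetical cases per 1,000 per year
    draws.append(hypothetical_cases_averted(use, efficacy, baseline))

lower, median, upper = central_interval(draws)
print(f"cases averted per 1,000: {median:.0f} ({lower:.0f}-{upper:.0f})")
```

      Drawing all inputs within the same realisation keeps any joint structure intact; summarising each input separately and then combining intervals would not.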

      (W1.11) Eyeballing Figure 2 (Burkina Faso), there is a general pattern of decline in all the regions, some differences between the regions and some differences in how well the model fits between the regions. If possible, it could be helpful to say how much better the fit was when using region-specific compared to countrywide parameter values for access and use, and how different the results would be.

      In the “Universal coverage: was it achievable under triennial mass campaigns” results section, we have now provided further emphasis that the observed decrease from DHS data may be driven by surveys being conducted progressively later in relation to the last campaign:

      “The observed decrease in use and access across many regions in Burkina Faso may therefore be a by-product of DHS surveys being conducted at progressively later dates relative to the most recent campaign; this does not necessarily indicate an underlying trend in decreasing use or access over longer timescales.”

      In the case of Burkina Faso (figure 2.A), aside from months when very small numbers of individuals were surveyed and either 0% or 100% use or access was reported, no other data lie outside our 95% credible interval for any region.

      We are unable to compare against countrywide parameters, because these are not estimated when fitting our discrete-time model, even though they are a by-product of the initial hierarchical model used to generate initial estimates of region-specific ITN retention, which was a necessary methodological step. We hope the extensive revision of the text in the methods and appendices helps to improve the clarity on this. Where national estimates are provided, these are population-weighted means of the subnational median posterior estimates. New text is included in appendix 1 to clarify this:

      “National and continental values are reported as population-weighted summaries of the median subnational estimates generated from the discrete-time models”
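      A minimal sketch of this aggregation, assuming each subnational region has a list of posterior draws and a population count (both made up here): take the median of each region's draws, then the population-weighted mean of those medians. The function name and region labels are hypothetical. Because the result summarises medians rather than being sampled in the model, no credible interval attaches to it.

```python
import statistics
from typing import Dict, List

def population_weighted_median_summary(
    posterior_draws: Dict[str, List[float]], population: Dict[str, float]
) -> float:
    """Population-weighted mean of each region's posterior median."""
    total_pop = sum(population[region] for region in posterior_draws)
    weighted = sum(
        statistics.median(draws) * population[region]
        for region, draws in posterior_draws.items()
    )
    return weighted / total_pop

# Hypothetical posterior draws of ITN use and population sizes.
draws = {
    "region_a_urban": [0.52, 0.55, 0.58],
    "region_a_rural": [0.40, 0.45, 0.47],
    "region_b_rural": [0.60, 0.62, 0.66],
}
pop = {"region_a_urban": 200_000, "region_a_rural": 500_000, "region_b_rural": 300_000}
national = population_weighted_median_summary(draws, pop)
```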

      (W1.12) The question of moving from a campaign every three to every two years may not be the most pertinent question in the current funding landscape. I realise that a paper is in development for a long time, but it would be helpful to comment on what else the model could be used for when fewer rather than more nets are likely to be available.

      We acknowledge the funding landscape has changed substantially, but we still believe this work has important implications in the current context. We have emphasised this further in the following text:

      “If budget constraints necessitate the deprioritisation of campaigns, our results highlight that this should be avoided, if possible, in regions with moderate to high transmission intensity, particularly those with mean annual incidence exceeding 100–150 clinical cases per 1,000 people. Shortening campaign intervals from three to two years in moderate- and high-transmission regions is projected to avert more cases than the additional cases that may arise from ceasing campaigns in some lower-transmission settings. Additionally, although pyrethroid–chlorfenapyr ITNs are more costly, the additional cases projected to be averted by them relative to pyrethroid-only and pyrethroid–PBO ITNs are substantial. In certain national contexts it may be more cost-effective for biennial pyrethroid–chlorfenapyr campaigns to be conducted in fewer subnational regions even under reduced budgets. However, more thorough economic analyses will be needed to understand this fully. Moreover, as ITNs remain one of the most cost-effective malaria control interventions, improving their impact could still be more cost-effective than the introduction of new untested interventions (Topazian et al., 2023; Schmit et al., 2024).”

      We have also related some of our findings to the WHO Subnational Tailoring Reference Manual (as highlighted in W1.2), which we hope better relates our findings to the current context.

      Reviewer #2 (Public review):

      Summary:

      The authors design a custom Bayesian model to estimate the probabilities of access, use and use given access of insecticide-treated nets in six African countries, providing sub-national estimates and inferring the average duration of ITN use and access. An individual-based model was employed to simulate malaria epidemics and estimate the effectiveness of different ITN distribution strategies. The study finds that the mean probability of use or access did not reach 80% (a universal coverage formerly targeted by WHO) for any of the regions, even for biennial campaigns, demonstrates that switching from triennial to biennial distribution campaigns increases population use by 7.9%, and evaluates the impact of employing more efficient ITNs on P. falciparum prevalence.

      Strengths:

      The authors developed a data-driven model that accounts for data collection imperfections and sources of uncertainty while differentiating between ITN use and access. They developed a methodology to infer the timing of a mass campaign from publicly available data instead of assuming fixed dates. The probability of use given access allows for determining the regions where ITN distribution is least effective. This work can help better inform future interventions by identifying regions where increasing mass campaign frequency or employing better ITNs are most effective. Finally, in addition to insights on ITN access and use for the six countries analyzed, the paper contributes a methodological framework that can likely be extended to other countries.

      Weaknesses:

      Since the models employed are rather complex, the description of the methodology may be hard to follow for most readers. In addition, the models rest on many assumptions, including:

      (W2.1) Exponential decay of ITN use/access.

      We acknowledge that different modelling studies have typically assumed either an exponential decay or an “S-shaped” smooth-compact loss function, with many of these studies having been validated against cluster-randomised trial data for both functional forms. We believe the ITN age distribution data across the DHS surveys inspected provides reasonable evidence to support the use of an exponential decay function here. We have now included a proof (appendix 2.1) demonstrating that an exponentially distributed ITN age distribution will be yielded for an exponential decay function with the same rate parameter; this is true under periodic ITN distribution and becomes an approximation for a finite number of surveys. We have also now included additional text (appendix 2.2) highlighting that the empirical ITN age distributions appear to support our exponential decay assumption.
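      The steady-state claim can be checked numerically. The sketch below is not the proof in appendix 2.1; it assumes, hypothetically, that every campaign delivers the same number of nets, that nets are lost at a constant exponential rate, and that surveys are spread evenly across a campaign cycle. Pooling the ages of surviving nets over those surveys should then recover an exponential age distribution with the same rate: mean age 1/rate, and a share exp(-1) of nets older than 1/rate, whatever the campaign interval. All names and values are illustrative.

```python
import math

def pooled_age_summary(rate, interval, n_offsets=300, n_cycles=40):
    """Mean age, and share of nets older than 1/rate, pooled over surveys
    evenly spaced within one campaign cycle (steady state assumed)."""
    total_w = total_wa = total_w_old = 0.0
    threshold = 1.0 / rate
    for i in range(n_offsets):
        offset = (i + 0.5) * interval / n_offsets  # time since latest campaign
        for j in range(n_cycles):
            age = offset + j * interval       # nets from the j-th previous campaign
            weight = math.exp(-rate * age)    # expected surviving share at this age
            total_w += weight
            total_wa += weight * age
            if age > threshold:
                total_w_old += weight
    return total_wa / total_w, total_w_old / total_w

# Hypothetical loss rate of 0.5 per year (mean retention of 2 years).
mean_triennial, share_triennial = pooled_age_summary(0.5, interval=3.0)
mean_biennial, share_biennial = pooled_age_summary(0.5, interval=2.0)
```

      In this sketch the campaign interval only sets where the survey offsets fall; the pooled age distribution itself depends on the loss rate alone.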

      (W2.2) The decay rates for the probability of the ITN repelling and killing a mosquito are the same.

      Although the same decay rate parameter (γ_N) is present in our expressions for the probability of repellency and mortality (equations (53) and (54)), the half-life of mortality is shorter, since repellency is assumed to decay towards a constant value. These structural forms are not unique to this paper but are shared among all malaria simulation-based studies with ITN interventions. This decay rate parameter has been estimated in previous studies (Sherrard-Smith et al., 2022; Churcher et al., 2024), and we carry through uncertainty estimates from those previous studies into the work presented here; additional text has been added to clarify this:

      “Uncertainty in ITN repellency and mortality parameters (equation (53) and (54)) is also propagated forward to this study by simulating random draws from previous posterior distributions (Sherrard-Smith et al., 2022; Churcher et al., 2024) across each distribution event and realisation.”
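      For intuition on why a shared decay rate still gives different half-lives, here is a sketch assuming functional forms of the kind used in earlier ITN modelling: mortality decaying exponentially towards zero, and repellency decaying exponentially towards a floor. These forms, the "half of the initial value" definition of half-life, and all parameter values are assumptions for illustration and may differ in detail from equations (53) and (54).

```python
import math

def mortality_half_life(gamma_n: float) -> float:
    """Time for d(t) = d0 * exp(-gamma_n * t) to fall to d0 / 2."""
    return math.log(2) / gamma_n

def repellency_half_life(gamma_n: float, r0: float, r_min: float) -> float:
    """Time for r(t) = (r0 - r_min) * exp(-gamma_n * t) + r_min to fall to r0 / 2.

    Returns math.inf when the floor r_min is at or above r0 / 2, because the
    curve then never reaches half of its initial value.
    """
    if r_min >= r0 / 2:
        return math.inf
    return math.log((r0 - r_min) / (r0 / 2 - r_min)) / gamma_n

GAMMA_N = 0.5  # hypothetical decay rate per year, shared by both curves
t_mortality = mortality_half_life(GAMMA_N)
t_repellency = repellency_half_life(GAMMA_N, r0=0.6, r_min=0.24)
```

      With a floor of zero the two half-lives coincide; any positive floor lengthens the repellency half-life, even though both curves use the same rate.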

      (W2.3) Given a time instant, all individuals in the same administrative unit have the same probability of using a net;

      Our discrete-time model estimates the proportion of the population with use and access at each time instant. We purposefully do not conflate this with the probability of use and access, which can vary between individuals within the same subnational unit of analysis (urban and rural regions of each administrative-one area). We are grateful that this point has been raised, as it indicates we had not communicated this sufficiently clearly before. We hope the extensive re-draft of the ‘Historical use, access and retention times’ methods section has helped address this, in particular in the following text preceding equation (7):

      “We do not assume the probability of access is the same for all individuals in a region at a given point in time. Instead, we assume the probability any given individual has access to an ITN at time t_j can be described by a Beta distribution”
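      As an illustrative sketch only, the snippet below shows how a Beta distribution lets individuals in the same region hold different probabilities of access while the regional proportion stays fixed. It uses a mean/concentration parameterisation (α = μκ, β = (1 − μ)κ) chosen here for convenience; the paper's own parameterisation may differ, and the function names and values are hypothetical. Lower κ means more heterogeneity between individuals.

```python
import random
import statistics

def beta_params(mean: float, concentration: float):
    """Convert a (mean, concentration) pair to Beta shape parameters (alpha, beta)."""
    return mean * concentration, (1.0 - mean) * concentration

def simulate_access(mean, concentration, n_people, seed=1):
    """Draw individual access probabilities, then Bernoulli access outcomes."""
    rng = random.Random(seed)
    alpha, beta = beta_params(mean, concentration)
    probs = [rng.betavariate(alpha, beta) for _ in range(n_people)]
    has_access = [rng.random() < p for p in probs]
    return probs, sum(has_access) / n_people

# Same regional mean (60%), very different individual-level spread.
probs_low_het, share_low_het = simulate_access(0.6, 50.0, 50_000)
probs_high_het, share_high_het = simulate_access(0.6, 2.0, 50_000)
```

      Both scenarios give roughly 60% of people with access, so a regional proportion alone cannot distinguish them; the spread of individual probabilities is a separate property.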

      (W2.4) ITN use/access decay models do not depend on the distribution strategy (e.g. biennial vs triennial distribution).

      We may not have fully understood this point, but in terms of our historical models of use and access, assumptions are not imposed on the frequency of previous campaigns. Instead, historical campaign timings are estimated from DHS survey data and the AMP Net Mapping Project (now detailed in appendix 3.1); estimated historical intervals could be two or three years (or indeed any interval) as informed by these data. The duration of use and retention time are estimates of how long a net would continue to be used, or provide access, if an individual were not to replace it at an earlier date; these estimates are therefore independent of campaign intervals, and we have now added text to clarify this:

      “However, throughout this study, the durations of use and retention time are always estimates of how long an individual continues to use or have access to a net in the absence of future replacement; estimates of these are therefore reflective of behaviour or ITN durability and not distribution patterns themselves.”

      We acknowledge that, under our approach, use immediately following a campaign is independent of campaign frequency; however, given an absence of data on how use changes following a switch from triennial to biennial campaigns, we believe this was a reasonably conservative assumption. Further confirmation is now provided in the following text, with additional preceding context:

      “Future campaigns, whether conducted every two or three years, are therefore assumed to achieve a consistent initial level of use.”
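      A back-of-the-envelope sketch of this assumption, using hypothetical values: if every campaign restores the same initial use U0 and use then decays exponentially with mean duration τ (independent of the interval T, as stated above), the time-averaged use over a cycle is U0·τ·(1 − exp(−T/τ))/T. This simplification ignores continuous distribution channels and the other components of the authors' discrete-time model; it only shows how the interval alone changes average use under the stated assumption.

```python
import math

def mean_use_over_cycle(initial_use: float, mean_duration: float, interval: float) -> float:
    """Time-averaged use between campaigns, assuming use resets to initial_use at
    each campaign and decays exponentially with the given mean duration (years)."""
    return initial_use * mean_duration * (1.0 - math.exp(-interval / mean_duration)) / interval

def mean_use_numeric(initial_use, mean_duration, interval, steps=10_000):
    """Midpoint-rule check of the closed form above."""
    dt = interval / steps
    total = sum(initial_use * math.exp(-(i + 0.5) * dt / mean_duration) for i in range(steps))
    return total * dt / interval

U0, TAU = 0.7, 2.0  # hypothetical post-campaign use and mean duration of use (years)
biennial = mean_use_over_cycle(U0, TAU, 2.0)
triennial = mean_use_over_cycle(U0, TAU, 3.0)
```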

      (W2.5) The Bayesian model assumes some narrow prior distributions.

      Thank you for highlighting this. We acknowledge the need for further justification for the choice of priors. We have provided this in depth for the hierarchical model of the mean duration of use and access (in appendix 2.2). Further justification for the choice of priors for the discrete-time model is also now provided in appendix 4.2.

      The impact of these hypotheses on the estimated parameters is not explored in the paper, and no sensitivity analyses are performed, although some limitations are discussed.

      We fully acknowledge that we had not conducted sensitivity analyses for many of our assumptions, and we have now tried to provide better justification for them. The assumptions most likely to influence inference are structural components of the modelling framework rather than scalar parameters that can be varied independently in a conventional sensitivity analysis. Many of the assumptions highlighted above are structural, such as the assumption of an exponential decay (W2.1). In the case of exponential decay, multiple elements of the methodology depend on this assumption (for example, when correcting for biases that arise from nets being lost between campaigns and survey times when estimating the timing of campaigns in appendix 3.1). Investigating sensitivity to this assumption relative to a smooth-compact function would require extensive adaptation of the methodology that would be beyond the scope of this paper. Other assumptions, such as the use of the same decay rate parameter for repellency and mortality (W2.2), rest on parameters estimated in the previous studies referenced, which have been validated against cluster-randomised, controlled trials. We nevertheless recognise that our justification of some assumptions could have been expanded upon previously, and we hope the changes highlighted above go towards addressing this.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (R1.1) I looked for the reference WHO 2024b for the recent optimal allocation guideline, but there were just three WHO 2024 references in the bibliography. In addition, what exactly the 80% rule applies to is not clear - this could be explained so it is clearer what result to compare to it (or explain that the rule itself is not clear).

      We have used the eLife LaTeX/BibTeX template for citations throughout and acknowledge that it does not show letter suffixes in the reference list for multiple author-year entries. We are unsure how to address this, given the list is generated by the official template, though we note that when citations are clicked on in the document, the relevant citation is then shown at the top of the page on the web version.

      (R1.2) L24 'estimated', but this seems more like a prediction. The words 'estimated' and 'predicted' should be carefully used throughout when combining statistical and mechanistic modelling.

      This has now been changed.

      (R1.3) The point estimates should always have measures of uncertainty.

      The rationale for the omission of credible intervals for some point estimates has now been clarified in the manuscript (appendix 1). The following text has been added:

      “Additionally, in relation to uncertainty estimates, credible intervals are shown for all subnational quantities that are directly estimated in our models. National and continental values are reported as population-weighted summaries of the median subnational estimates generated from the discrete-time models (appendix 4) and therefore do not correspond to explicitly estimated model parameters, so credible intervals are not shown for these aggregated estimates.”

      (R1.4) It would be helpful to justify the choice of ADM1 as the geographical unit.

      We have clarified the rationale for this in the following text:

      “Here, (subnational) regions are defined as the first administrative unit below the country level and are further divided into rural and urban areas to align with DHS stratification”

      (R1.5) The terminology was slightly confusing: in some places, it sounded as if regions were the sub-national regions, in others as if they were different things (eg L74, L105). L45 'and' seems odd here.

      ‘Region’ is used interchangeably with ‘subnational region’ at points in the paper to aid the flow of the text. We hope that placing parentheses around ‘subnational’ in the updated text quoted above (and in the following text) helps to provide clarity:

      “here, the units of analysis are consistently referred to as (subnational) regions”

      (R1.6) Spurious accuracy in some estimates, e.g. L52.

      This was a result cited from Bertozzi-Villa et al. (2021) for which uncertainty estimates were not available. We hope the response to R1.3 above helps clarify the rationale for omitting credible intervals for some estimates generated here.

      (R1.7) L68 'lose' instead of 'loose'.

      Now corrected.

      (R1.8) L534. I suspect that the model was actually fitted in Stan via the R interface rstan.

      Language adjusted accordingly.

      (R1.9) L633 'through' rather than 'though'.

      This section has been heavily redrafted and we have checked for typos.

      Reviewer #2 (Recommendations for the authors):

      The paper is well-written and presents an important contribution to better aid interventions. The proposed models are reasonable, but because of their complexity, even readers who work with epidemic modelling might have issues understanding the methodology.

      We thank the reviewer for highlighting that the methodology may be difficult to follow. The methods section has now been substantially rewritten to provide a clearer conceptual description of the modelling framework, with detailed model specification and derivations moved to the appendices. We hope this restructuring will allow readers to follow the modelling approach at a high level in the main text with technical details contained in the appendices.

      To improve the clarity of the methods section, I suggest:

      (R2.1) Include a list of symbols with the meaning of each variable defined in the text.

      Definitions for symbols are now also shown in appendix 1 – tables 1-5.

      (R2.2) Include a centralized full description of each model, clearly stating the priors and likelihood (similarly to a Stan code).

      Two models are fitted with Stan: the hierarchical retention model and the discrete-time use/access model. To improve clarity for the hierarchical model, priors are now presented in a single block (equations 11–17) in appendix 2.2, followed by the likelihood (equation 18). For the discrete-time model, we have split the presentation of the priors (equations 37–42) and the likelihood expressions (equations 43–45) into separate subsections (appendices 4.2 and 4.3, respectively).

      (R2.3) If needed, include additional data preprocessing in the form of an algorithm.

      Although we have not included an algorithm outlining the preprocessing steps, we have ensured sufficient detail has been provided to facilitate replicability. For example, in appendix 1, we now outline how use and access are inferred from DHS data:

      “ITN use is inferred from DHS data (ICF, 2025) on whether individuals slept under an ITN the previous night, while all individuals who used an ITN are assumed to have access; when fewer than two individuals used an ITN, the ITN is assumed to be able to provide access at random to up to two individuals in a household.”
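One reading of this rule is sketched below as a minimal Python function under stated assumptions. The function name and input structure are hypothetical and do not correspond to DHS variable names or to the authors' code. The choice to assign the extra access slots only to household members who do not already have access is also an assumption.

```python
import random
from typing import Dict, List, Optional, Set


def infer_access(
    members: List[str],
    net_users: Dict[str, List[str]],
    slots_per_net: int = 2,
    rng: Optional[random.Random] = None,
) -> Set[str]:
    """Return the household members assumed to have ITN access.

    members: identifiers of everyone in the household (hypothetical input).
    net_users: for each net, the members reported sleeping under it last night.
    """
    rng = rng or random.Random()
    # Rule 1: everyone who used an ITN is assumed to have access.
    access: Set[str] = {person for users in net_users.values() for person in users}
    # Rule 2: a net used by fewer than `slots_per_net` people is assumed to
    # provide access to extra members chosen at random (assumption: drawn
    # only from members who do not already have access).
    for users in net_users.values():
        spare = slots_per_net - len(users)
        if spare <= 0:
            continue
        candidates = [m for m in members if m not in access]
        access.update(rng.sample(candidates, min(spare, len(candidates))))
    return access
```

For example, in a four-person household with one unused net, two members chosen at random are assigned access. A net already used by two people adds nobody.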

      (R2.4) Mention the main hypotheses and limitations of the model in the main text.

      We have ensured key assumptions of the model are stated in the re-written ‘Historical use, access and retention times’ methods subsection; for example, in the following text:

      “Due to the sparsity and irregularity of DHS and MIS surveys, we were unable to investigate seasonal fluctuations in either access or use; we therefore assume that nets provide access or are used continuously over some period of time.”

      (R2.5) Including a flowchart or diagram that provides an overview of the proposed framework could be helpful.

      We have now included a flowchart of methodological steps in appendix 1 – figure 1.

      (R2.6) Line 89: Define NMP before presenting the acronym.

      We have ensured this is defined in the first instance on line 39.

      (R2.7) Equation (1): Explain why you chose the Exponential distribution (e.g. constant hazard), as this is one of the main hypotheses of the model.

      As highlighted in our response to W2.1, we have now included justification of this assumption in the final paragraph of appendix 2.2.
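The constant-hazard property that motivates this choice is written out below for reference. It assumes that equation (1) treats the retention time $\rho_i^x$ as exponentially distributed with rate $\gamma = \gamma_i^x$; the exact parametrization in the manuscript may differ.

```latex
\begin{aligned}
f(\rho) &= \gamma e^{-\gamma \rho}, \qquad S(\rho) = \Pr(\rho_i^x > \rho) = e^{-\gamma \rho},\\
h(\rho) &= \frac{f(\rho)}{S(\rho)} = \gamma \quad \text{(constant in } \rho\text{)},\\
\Pr(\rho_i^x > a + b \mid \rho_i^x > a) &= \frac{e^{-\gamma (a+b)}}{e^{-\gamma a}} = e^{-\gamma b} = \Pr(\rho_i^x > b),\\
\mathbb{E}[\rho_i^x] &= 1/\gamma, \qquad \operatorname{median}(\rho_i^x) = \ln 2 / \gamma.
\end{aligned}
```

In words, a net that has already been retained for time $a$ is as likely to be lost in the next interval as a new net. The model therefore assumes that the loss rate does not depend on net age.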

      (R2.8) Equation (2): Although Equation (2) passes a clear message of how alpha_i^x is distributed, I wonder if it is mathematically correct to express the limit this way, since the argument of the limit is a random variable. Maybe the limit should be applied to gamma_i^x instead.

      Thank you for highlighting this. We acknowledge that the limit behaviour was expressed in a shorthand manner that is not strictly mathematically correct. Indeed, the limit should be applied to the decay rate parameter gamma (now shown in equation 10). In appendix 2.1, we now provide a proof that the rate parameter of the pooled ITN age distribution tends to the same decay rate as the assumed exponential loss function.

      (R2.9) I think the difference between rho_i^x (Equation (1)) and alpha_i^x (Equation (2)) is not very clear in the text.

      In the context of access, rho_{i(l)} and alpha_{i(l)} are respectively the duration an ITN l is retained for and its age at the time of a survey. We hope the redrafted appendices make this clearer, in addition to the inclusion of the new parameter tables in appendix 1.

      (R2.10) Line 479: Typo (and or).

      Updated wording is now contained in appendix 2.

      (R2.11) Line 711: Typo (The limit is equal to infinity).

      This has now been corrected.

      (R2.12) Equation (15): I could not understand this equation. What is rho(s) and rho(s \in I), where I is one of the intervals mentioned in this equation?

      Rho(tau_ik) was introduced as simplified notation for the probability density of the timing of campaign k in region i (tau_ik), but we acknowledge this was not explained clearly. We also acknowledge that this equation presented many concepts at once. The equation attempted to describe the probability density of the last campaign in region i relative to time t_j, denoted phi_ij. We no longer use this notation (rho) for the probability density. The equation has been updated to equation (30), and an incremental explanation of its construction is now provided in appendix 3.2.

      (R2.13) Line 642: What is t?

      The use of $t_j \ni t$ was previously used to indicate that the discrete time point t_j lies within continuous time t. We acknowledge this was a non-standard use of notation and was not clearly explained. This section (now in appendix 4) has been rewritten without this notation. The use of t and t_j to denote continuous time and discrete time points respectively is now defined in the core notation table (appendix 1 – table 1).

      (R2.14) The proposed model has narrow hyperhyperpriors because of convergence issues. Are the estimated parameters sensitive to the choice of hyperhyperpriors?

      We acknowledge limited justification was previously provided for the choice of hyperhyperpriors. We have now provided additional justification within appendix 2.2.

      (R2.15) Since the proposed Bayesian models are relatively complex, it might be useful to provide convergence diagnostic plots in the supplement.

      Convergence diagnostics were inspected using the ShinyStan package. Chains showed satisfactory convergence based on standard diagnostics. We have not included diagnostic plots due to the large number of parameters in the fitted models. Under the hierarchical model (appendix 2) for ITN use, 146 region-specific parameters (one for each region), 12 country-level hyperparameters (two for each country), and four hyperhyperparameters were estimated. Under the discrete-time model (appendix 4), a further 876 parameters (six for each region) were estimated. In total, 1,038 parameters were fitted for the ITN use models. The same number of parameters were estimated for the ITN access models, giving a total of 2,076 estimated parameters.
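The parameter totals quoted above can be checked with simple arithmetic. In this sketch, the number of countries (six) is not stated in the text. It is inferred from the 12 country-level hyperparameters at two per country.

```python
# Tally of estimated parameters, per outcome (ITN use or ITN access).
N_REGIONS = 146
N_COUNTRIES = 12 // 2   # inferred: 12 hyperparameters at two per country
N_HYPERHYPER = 4

# Hierarchical model: one parameter per region, two per country, four hyperhyperparameters.
hierarchical = N_REGIONS * 1 + N_COUNTRIES * 2 + N_HYPERHYPER
# Discrete-time model: six parameters per region.
discrete_time = N_REGIONS * 6
per_outcome = hierarchical + discrete_time
# The same structure is fitted separately for ITN use and ITN access.
total = 2 * per_outcome
```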

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Responses to the reviewers

      First of all, we would like to thank all the reviewers for their valuable and constructive comments on our manuscript. We have considered each comment and revised the manuscript accordingly. We respond to each comment below in blue font.

      To Reviewer #1

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *Iwase et al have used multiomics and spatial transcriptomics to comprehensively map neural crest cell contributions to the mouse heart and great arteries. This careful and detailed analysis reveals changes in the transcriptional profile of neural crest cells as they give rise to different regions and cell types in the heart and great vessels. The study significantly builds on a number of recent scRNA-seq analyses of neural crest cell development and includes development of a new informatic tool for regulatory network investigation. Among the new findings documented are downregulation of Hox gene expression in intracardiac crest cells and regulation of Sox9 by Meis transcription factors. Addressing the following points would improve clarity and accessibility. *

      Thank you for your encouraging feedback and comments. We have responded to your comments below.

      * In Figure 1C it is difficult to visualize all the colors given the mixed contribution of NCC and nonNCC cells to mesenchyme. Please also show YFP transcript distribution in NCC versus nonNCC plots. In addition, it would be helpful to show plots for both NCC and nonNCC for Gata4 and Tbx20. *

      To improve visualization, we separated EYFP-positive NCCs and EYFP-negative non-NCCs into distinct plots (Figure 1e), rather than displaying EYFP transcript distribution within a single combined plot. In addition, we have included separate plots for both NCCs and non-NCCs showing the expression of Gata4 and Tbx20 (Figure 1g, h).

       Furthermore, in the revised manuscript, we subdivided the original clusters c5 and c13 into two subclusters each, resulting in a total of 23 clusters in the UMAP shown in Figure 1. This refinement was introduced to facilitate clearer interpretation and subsequent analyses.
      

      * The authors identify a cardiomyocyte cell cluster in their integrated NCC scRNA-seq plots. Are these cells labelled by Wnt1-Cre in the authors' own dataset? Is the trajectory analysis informative as to the steps preceding acquisition of cardiomyocyte fate? *

      A total of 20 EYFP-positive NCCs in our own dataset were assigned to the cardiomyocyte cluster in the integrated UMAP. Of these, 6 cells were located within the cardiomyocyte cluster (c21), while the remaining cells were classified as pharyngeal mesenchyme but positioned in close proximity to c21 in the original UMAP (Figure 1e). Although this observation is consistent with previous reports showing that NCCs give rise to cardiomyocytes, the small number of cells precluded meaningful characterization or trajectory analysis of cardiomyocyte fate acquisition. Accordingly, we have addressed this point only briefly in the revised manuscript as follows:

      Only a few NCCs were detected within cardiomyocyte clusters, which were predominantly composed of non-NCCs, consistent with previous reports demonstrating NCC differentiation into cardiomyocytes15,16. The overall number of cardiomyocytes was low, likely reflecting the restricted sampling of the cardiac outflow tract (Figure 1a).

      (P6, L20–25)

      * Linked with this point, is it possible that there are nonNCC cells in the integrated plots? Of note, many of the NCC genes overlap with genes that have also been shown to be expressed in mesodermal cardiac progenitors (including Osr1, Pparg, Dlk1, Tcf21, Ebf2, Tbx20, Sox9). For example, is it possible to distinguish NCC derived smooth muscle within the heart from cells originating from the second heart field that may express smooth muscle genes? Cluster 27 for example appears broadly expressed in the region of ventricular outlets in Figure 3. Comparison with YFP transcript distribution may be helpful here. *

      In principle, non-NCCs were excluded from the integrated plots shown in Figure 3. However, we cannot completely rule out the possibility that a small number of non-NCC cells were inadvertently included, for example due to false-positive signals during cell isolation. In our dataset, NCCs and non-NCCs were stringently distinguished based on FACS profiles, detection of EYFP transcript reads in the RNA-seq data, and rigorous exclusion of doublets and low-quality cells.

       Regarding the distinction between NCC- and non-NCC-derived SMCs, a key challenge lies in defining comparable cell populations. We first validated the annotations of SMC clusters (C4, C23, and C27) using immunostaining for Myh11, Sost, and Reln, confirming consistency with their transcriptomic identities. We then re-clustered the SMC populations and projected non-NCC SMCs (clusters 4 and 20 in Figure 1d) onto this UMAP. These non-NCC SMCs were mapped to clusters corresponding to great artery and coronary artery SMCs (C27 and C23, respectively).
      
       However, we lack confidence that these projected populations are directly comparable. For example, non-NCC SMCs mapped to C23 or C27 may not necessarily represent bona fide coronary or great artery SMCs from equivalent anatomical regions, and could include other SMC subtypes such as venous SMCs or pericytes. Given the known regional heterogeneity of SMCs and the absence of strict spatial matching criteria, such comparisons would be difficult to interpret. This limitation is further compounded by the relatively small number of cells available.
      
       For these reasons, we focused on spatial validation of cluster annotations by immunostaining in this study, and have reserved detailed comparisons between NCC- and non-NCC-derived SMCs for future work. We believe this does not detract from the overall consistency or value of the present study.
      

      * Can the authors add any validation of key expression patterns, for example using fluorescent in situ hybridization? *

      Figures 2n-y present Xenium-based multiplexed fluorescent in situ hybridization data that validate the spatial expression patterns of marker genes characterizing NCC derivatives in pharyngeal mesenchyme, intracardiac mesenchyme, and SMC populations. In addition, we have incorporated new Xenium images highlighting key gene expression patterns in the aorticopulmonary (AP) septum at E12.5 (Figure S4), supporting the annotation of cluster C16 in the integrated UMAP as corresponding primarily to the AP septum. We have also added immunostaining data for Myh11, Sost, and Reln to further validate the annotations of SMC clusters (see the response above). Together, these data provide independent spatial confirmation of the transcriptional signatures identified in our single-cell analyses. Based on these data, we revised the relevant section of the Results as follows:

      The SMC clusters, which were continuous with the pharyngeal mesenchyme via transitional populations in the UMAP, were identified by high expression of the mature SMC marker Myh11 (Figure 3i). Differential gene expression analysis further distinguished individual clusters (Figure S4a-g). Among these, C27 displayed a transcriptomic profile characteristic of great artery SMCs, including high expression of Sost (Figure S3j). C4 was enriched for Tfap2b and Ptger4 (Figure S3j), markers of ductus arteriosus SMCs21,22, supporting its annotation. C0 and C7 likely represent transitional states between pharyngeal mesenchyme and differentiated lineages, potentially bifurcating toward great artery SMCs or cardiac cushion mesenchyme (Figure S3a and Table S6). C23 was characterized by high expression of Gja4, a marker of coronary artery SMCs, along with pericyte markers Kcnj8 and Rgs5 (Figure 3j and Figure S3k), corresponding to the cluster similarly annotated by Chen et al14. C23 was further distinguished from C4 and C27 by its expression of Reln (Figure S4d).

      Immunostaining supported these cluster annotations. Sost expression was observed in great artery SMCs but not in ductus arteriosus or coronary artery SMCs, whereas Myh11 expression was higher in ductus arteriosus and coronary artery SMCs than in aortic SMCs (Figure S4f-m). Furthermore, Reln expression was restricted to coronary artery SMCs (Figure S4n-s).

      (P8, L30 – P9, L14)

      * Please elaborate on the decoded Hox code patterns that appear to be indicative of arch origins. Do the results allow determination of whether the trajectories to different cardiac fates inferred in Figure 3D differ in different arches? *


      In response to the reviewer’s suggestion, we have further elaborated on the decoded Hox code patterns indicative of pharyngeal arch origin and examined whether trajectories toward distinct cardiac fates differ between arch-derived NCC populations (see new supplemental figure).

      To further delineate Hox code patterns associated with pharyngeal arch origin, we stratified the integrated UMAP by distinct Hox expression profiles (Figure S9). Cells expressing any Hox2 paralog, but lacking Hox3–5 paralogs, were defined as PA2-derived preotic NCCs, whereas cells expressing any of Hox3–5 paralogs were classified as PA3/4/6-derived postotic NCCs. Preotic, postotic, and Hox-negative populations were then projected onto the integrated UMAP across developmental stages (E10.5–E14.5). Trajectory inference indicated that transitions toward intracardiac mesenchyme occur earlier in preotic cells (E10.5) than in postotic cells (E11.5), consistent with their known sequential migration into the cardiac cushion8. From E12.5 to E14.5, postotic cells showed a progressive emergence of the aorticopulmonary septum–associated cluster C16 from transitional states. Notably, the proportion of Hox-negative cells increased within intracardiac mesenchyme, except in C16 where Hox expression was retained, supporting the notion that Hox genes are broadly downregulated in cushion-associated intracardiac NCCs (Figure 4k, S9).

      (P11, L24 – P12, L4)
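The stratification rule above can be sketched as a per-cell classifier. This is a minimal illustration, not the authors' pipeline. The following are all assumptions:

- the function name
- the detection threshold (at least one count)
- the gene-name pattern
- the "unassigned" label for cells that express Hox genes only outside paralog groups 2–5

```python
import re
from typing import Mapping

# Mouse Hox genes such as Hoxa2, Hoxb3, Hoxc5; the trailing number is the paralog group.
HOX_PATTERN = re.compile(r"^Hox[abcd](\d+)$")


def classify_hox_origin(counts: Mapping[str, float], min_count: float = 1.0) -> str:
    """Assign one cell to a putative arch origin from its detected Hox paralogs."""
    paralogs = set()
    for gene, count in counts.items():
        match = HOX_PATTERN.match(gene)
        if match and count >= min_count:
            paralogs.add(int(match.group(1)))
    if paralogs & {3, 4, 5}:
        return "postotic"      # any Hox3-5 paralog: PA3/4/6-derived
    if 2 in paralogs:
        return "preotic"       # Hox2 without Hox3-5: PA2-derived
    if not paralogs:
        return "Hox-negative"
    return "unassigned"        # Hox genes detected only outside groups 2-5 (assumption)
```

For instance, a cell with counts for Hoxa2 only is labelled preotic. Adding any Hoxb3 count moves it to postotic.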

      * The authors need to explain why the authors place an arrow from mesenchymal cluster 18 to 23 in Figure 3D while the trajectory analysis in 3C predicts the opposite direction. *

      RNA velocity analysis of scRNA-seq data is fundamentally based on splicing dynamics. The original framework assumes that transcriptional induction and repression persist long enough for cells to reach active (transcribing) or inactive (silenced) steady-state equilibrium. However, this assumption is often violated during cell differentiation, where transient cell populations frequently exhibit rapidly changing mRNA levels that do not reach steady-state equilibrium. To address this limitation, the scVelo method was developed (Bergen et al., 2020, Nature Biotechnology), and we applied this approach to the integrated NCC datasets in the present study. This analysis successfully inferred directional flows from the pharyngeal mesenchyme toward SMCs and intracardiac mesenchyme through transitional states (Figure 3c,d). However, as the reviewer correctly pointed out, the analysis predicted a directional flow from C23 to C18, apparently opposite to the biological directionality supported by previous findings (see P9, L15–19).

       We consider this discrepancy to reflect intrinsic limitations of RNA velocity analysis. As discussed by Bergen et al., in systems containing multiple lineages and cellular processes, differences in gene regulatory networks among heterogeneous cell states can generate multiple trajectories in phase space owing to distinct splicing kinetics. In addition, incompletely captured splicing kinetics may represent only a limited portion of the overall dynamics, particularly near terminal differentiation states. In such cases, phase portraits of unspliced versus spliced transcripts may appear nearly linear rather than curved, potentially leading to erroneous assignment of positive or negative RNA velocities. Consistent with this limitation, we obtained opposite directionalities between C18 and C23 depending on whether steady-state or dynamical models were applied and according to different parameter settings. Through these repeated computational re-evaluations of lineage directionality, we concluded that RNA velocity analysis is suitable for capturing the global landscape of differentiation flow, but that accurate inference of local lineage directionality may require careful model selection and parameter optimization to ensure consistency with established biological evidence.
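The limitation described above can be illustrated with a minimal NumPy sketch of the steady-state RNA velocity model. The sketch follows the spirit of velocyto and scVelo's deterministic mode but does not call the scVelo API. For one gene, the ratio gamma is fitted as the slope through the origin of unspliced (u) against spliced (s) counts, using cells at the extremes of s. Velocity is then v = u - gamma * s. The function names and the 5% percentile cutoff are assumptions. When the phase portrait is nearly linear, most cells lie close to the fitted line. Their velocities are then near zero, and their signs depend on small deviations from the fit.

```python
import numpy as np


def fit_steady_state_gamma(u, s, percentile=5.0):
    """Fit gamma for one gene by least squares through the origin on extreme cells."""
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    low, high = np.percentile(s, [percentile, 100.0 - percentile])
    extreme = (s <= low) | (s >= high)  # cells assumed closest to steady state
    s_ext, u_ext = s[extreme], u[extreme]
    # argmin_gamma ||u_ext - gamma * s_ext||^2 = <s, u> / <s, s>
    return float(np.dot(s_ext, u_ext) / np.dot(s_ext, s_ext))


def steady_state_velocity(u, s, percentile=5.0):
    """Return per-cell velocities v = u - gamma * s and the fitted gamma."""
    gamma = fit_steady_state_gamma(u, s, percentile)
    return np.asarray(u, dtype=float) - gamma * np.asarray(s, dtype=float), gamma


# A perfectly linear phase portrait: every cell sits on the fitted line.
s = np.linspace(0.0, 10.0, 101)
u = 0.5 * s
v_linear, gamma_linear = steady_state_velocity(u, s)

# Adding unspliced signal to one mid-range cell gives it a positive velocity.
u_perturbed = u.copy()
u_perturbed[50] += 1.0
v_perturbed, gamma_perturbed = steady_state_velocity(u_perturbed, s)
```

In the linear case, every velocity is zero. A single cell therefore acquires a direction only through its offset from the fitted line, which a different model or parameter setting can shift.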
      
       To avoid arbitrariness and potential confusion, we removed the arrow between C18 and C23 from the revised Figure 3d. Instead, we now describe the observed continuity between these populations in the Discussion section as follows:
      

      Notably, the UMAP revealed a continuum between C23 and C18 within the intracardiac mesenchyme population. Given previous findings that the proximal coronary artery SMCs originate from preotic NCCs8 and that pericytes give rise to coronary artery SMCs23, this connection likely represents a differentiation trajectory from intracardiac mesenchyme to coronary artery SMCs via a pericyte-like intermediate stage.

      (P9, L15–19)

      The continuity between intracardiac mesenchyme and coronary artery SMCs through a pericyte-like intermediate state is consistent with previous developmental studies showing that proximal coronary artery SMCs originate from preotic NCCs and may arise through pericyte intermediates8,21.

      (P18, L24-27)

      * The authors nicely show downregulation of Hox gene expression in NCC cells entering the heart. Can they add discussion of any insights into this from prior studies of loss or gain of Hox gene function? *


      We have added the following discussion on the roles of anterior Hox genes in cardiovascular development, together with appropriate references on loss or gain of Hox gene function:

      The regional identities of pharyngeal NCCs that contribute to cardiac development are established by Hox genes and their associated regulatory networks. Genetic studies have demonstrated essential roles for the anterior Hox genes in patterning the pharyngeal arch artery system and semilunar valve structures. Loss of Hoxa1 and Hoxb1 results in severe defects in pharyngeal arch artery development42, whereas ectopic or sustained expression of Hoxb1 in NCCs disrupts cardiovascular morphogenesis and causes malformations of the great arteries and semilunar valves43. Likewise, Hoxa3 contributes to proper patterning of the pharyngeal arch region and its NCC derivatives44,45. These findings underscore the importance of precise spatial and temporal regulation of Hox genes during cardiovascular development. Among pharyngeal NCCs contributing to cardiac development, cushion-independent NCC derivatives (great artery SMCs and the aorticopulmonary septum) retain their origin-specific Hox codes. In contrast, cushion-associated NCC derivatives (coronary artery SMCs and valvular/subvalvular interstitial cells) downregulate Hox expression and transition toward region-specific GRNs involving TFs such as Tbx20 and Gata4, whose expression is known to be induced by BMP signaling in cardiomyocytes46,47. Bmp2 and Bmp4 are expressed in the regions of the pericardial reflection traversed by NCCs en route to the cardiac cushion48. Together, these observations suggest that appropriate repression of Hox programs, coupled with activation of cardiac-specific regulatory networks, is required for normal differentiation of cushion-associated NCC derivatives.

      (P18, L30 – P19, L16)

      • Roux, M. et al. Hoxa1 and Hoxb1 are required for pharyngeal arch artery development. Mech. Dev. 143, 1–8 (2017).
      • Zaffran, S., Odelin, G., Stefanovic, S., Lescroart, F. & Etchevers, H. C. Ectopic expression of Hoxb1 induces cardiac and craniofacial malformations. genesis 56, (2018).
      • Chisaka, O. & Capecchi, M. R. Regionally restricted developmental defects resulting from targeted disruption of the mouse homeobox gene hox-1.5. Nature 350, 473–479 (1991).
      • Kameda, Y., Watari-Goshima, N., Nishimaki, T. & Chisaka, O. Disruption of the Hoxa3 homeobox gene results in anomalies of the carotid artery system and the arterial baroreceptors. Cell Tissue Res. 311, 343–352 (2003).

        In addition, we previously generated conditional Hoxa2 overexpression mice and demonstrated that ectopic Hoxa2 expression in Hox-negative PA1 cranial neural crest derivatives induced PA2-like structures, indicating a partial homeotic transformation (Kitazawa et al., Developmental Biology, 2015; 10.1016/j.ydbio.2015.04.007). Because cardiovascular phenotypes were not examined in that study, we have now resumed breeding of these mice for detailed cardiovascular phenotypic analyses.

        In parallel, we have also established knockout mice for a downstream target of Hoxa2, which are expected to complement the Hoxa2 gain-of-function model and provide further insight into the regulatory mechanisms underlying cardiac NCC differentiation and patterning. Although we have obtained preliminary observations from these models, a comprehensive analysis is still ongoing, and we therefore prefer to reserve these results for a future study with more detailed investigation.

      • Figure 3Y could be simplified to more clearly distinguish the two types of Meis binding sites. For example, it may be helpful to reorder the mesenchymal cell types based on Hox expression. *

      To improve clarity and better distinguish the two types of Meis binding sites, we have reordered the heatmap of motif enrichment based on hierarchical clustering with the updated JASPAR2024 database. In parallel, we have revised the heatmap of transcription factor gene expression to provide a more consistent and interpretable presentation. These diagrams are now presented as Figures 4k and 4l of the revised manuscript.

      * The authors provide nice in vitro and in vivo evidence for an upstream role of Meis transcription factors in regulating Sox9 expression. Can the authors identify from the enhancer sequence (or their transcriptomic dataset) any of the non-Hox transcription factors that Meis may be working with here? Please discuss the significance of Sox9 expression in epicardium driven by the same enhancer. Might this regulation also operate in second heart field progenitor cells where both genes are expressed? It is not evident in Figure 7 that Sox9-EGFP is also expressed in epicardium. *

      The distal Sox9 enhancer containing the Meis2-binding site that we identified (chr11-112850240-112851186) also contains several consensus motifs, including predicted Hand2- and Nfatc1-binding sites. However, to our knowledge, these transcription factors have not been reported as non-Hox partners of Meis proteins. Their recruitment to this enhancer, as well as potential cooperative interactions with Meis transcription factors, were not examined in the present study and remain subjects for future investigation.

       As the reviewer pointed out, Sox9 was expressed not only in intracardiac NCCs but also in the epicardium (revised Figure S13). Consistent with this expression pattern, the distal *Sox9* enhancer was accessible in both intracardiac mesenchyme and epicardial cells (revised Figure S13). Notably, *Wt1*low/*Sox9*high mesenchymal cells, likely derived from the epicardium via epithelial-mesenchymal transition, also exhibited chromatin accessibility at this enhancer comparable to that observed in NCC-derived mesenchymal cells. These findings suggest that the same regulatory element may function across multiple cardiac lineages.
      
       We have addressed these points in the revised manuscript as follows, including additional supporting data in the supplementary figures.
      

      *Enhancer activity in the epicardium corresponds to Sox9 expression and an open chromatin peak at the putative distal enhancer region in clusters 22 and 5 in Figure 1d, which represent Wt1high epicardial cells and intracardiac mesenchyme likely including Wt1low epicardial EMT derivatives, respectively (Figure S13). *

      (P16, L28–32)


      • Could this approach yield similar data for Osr1? Please clarify if there is any experimental evidence supporting the predicted negative regulation of Sox9 by Osr1 in the heart illustrated in Figure 8. *

      There are currently no experimental data demonstrating Sox9 repression by Osr1 in the heart. However, such an effect has been reported in tongue and limb mesenchyme (Liu et al., PNAS, 2013), as noted in P16, L5–7. Although direct experimental validation, such as Osr1 overexpression in cardiopharyngeal NCCs, would provide stronger evidence, preparation of this specific NCC lineage is difficult. Given this limitation, we instead performed in silico gene perturbation analysis using CellOracle, which predicted antagonistic roles for Sox9 and Osr1 during lineage bifurcation from pharyngeal NCCs. Because this regulatory relationship has not yet been experimentally validated in the cardiac context, we revised the illustration accordingly by adding a question mark to indicate the hypothetical nature of this antagonism (Figure 9 in the revised version).

      * Concerning the links between valve mesenchyme and skeletogenic programs it would be relevant to cite the earlier work of Lincoln and Yutzey (reviewed in PMID: 16643886): *


      We cited the suggested work in the relevant portion of the Discussion section as follows:

      By analogy, Sox9high/Scxhigh NCCs at the base of semilunar valves may form a structural attachment unit linking cushion tissues to valvular leaflets57.

      (P20, L30–32)

      • Lincoln, J., Lange, A. W. & Yutzey, K. E. Hearts and bones: Shared regulatory mechanisms in heart valve, cartilage, tendon, and bone development. Dev. Biol. 294, 292–302 (2006).

      • In order to increase accessibility of the dataset, the authors are encouraged to include a browser link.

      We agree with the reviewer that improving dataset accessibility is important for facilitating exploration of NCC diversity. We have therefore already uploaded our original FASTQ files and count matrices to the DDBJ and GEO repositories. In addition, we will upload our gene expression datasets projected onto the UMAP to the UCSC Cell Browser, enabling readers to more easily visualize and interrogate the dataset.

      Minor points:

      *13. The authors could rephrase the title since the term topographical genetic switch is unclear. *


      We thank the reviewer for this suggestion. To improve clarity, we revised the title as follows:

      Hox–Meis-relayed spatial gene regulatory transition underlies cardiopharyngeal neural crest diversification revealed by multimodal analysis

      *14. In the introduction, with reference to the De Bono study, please note that Tbx1 was shown to regulate pharyngeal NCC differentiation stage transitions non-cell autonomously. *

      According to the reviewer’s suggestion, we rephrased the relevant section of the introduction as follows:

      De Bono et al. elaborated the transition of pharyngeal NCCs through multiple differentiating stages toward SMC fates, identifying Tbx2 and Tbx3 as key TFs in this process13. They also showed that Tbx1, the gene for 22q11.2 deletion syndrome, regulates pharyngeal NCC differentiation stage transitions non-cell autonomously13.

      (P5, L1–4)

      Reviewer #1 (Significance (Required)):

      *Iwase et al. have used multiomics and spatial transcriptomics to comprehensively map neural crest cell contributions to the mouse heart and great arteries. This careful and detailed analysis reveals changes in the transcriptional profile of neural crest cells as they give rise to different regions and cell types in the heart and great vessels. The study significantly builds on a number of recent scRNA-seq analyses of neural crest cell development and includes development of a new informatic tool for regulatory network investigation. Among the new findings documented are downregulation of Hox gene expression in intracardiac crest cells and regulation of Sox9 by Meis transcription factors. *

      Again, thank you for giving us the opportunity to strengthen our manuscript with your valuable comments and queries. We have worked hard to incorporate your feedback and hope that these revisions persuade and satisfy you.

      To Reviewer #2

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: *In this manuscript, Iwase et al. cleverly make use of different modalities, spatial transcriptomics and single-cell omics datasets, in conjunction with a well-established Wnt1-Cre;R26R-EYFP line to trace neural crest cells (NCCs) contributing to the cardiovascular system during embryonic development in the mouse. By doing so, the authors identified a bifurcation between cardiac NCCs contributing to the OFT cushions and those forming the aorticopulmonary (AP) septation complex. Thus, the authors split "intracardiac NCCs" into two different NCC programs/compartments, even though both reside in the broad OFT region. The NCCs that enter and associate with the OFT cushions undergo a Hox off transition (Hox-positive to Hox-negative once intracardiac), with a corresponding shift in Meis binding and GRN wiring. The authors propose these cells pass through a Meis2-Sox9-Scx "skeletogenic progenitor-like" intermediate and contribute to semilunar valves and coronary artery smooth muscle. By contrast, the NCCs assigned to aorticopulmonary septum (APS) formation and great vessel smooth muscle retain distinct Hox codes. *

      Thank you for your encouraging feedback and comments. We have responded to your comments below.

      Major Comments

      *1. The manuscript would benefit from clearer delineation between the different NCC contributions, particularly for non-specialist readers. The distinction between (i) CNCCs in OFT cushions and (ii) CNCCs forming the aorticopulmonary (AP) septation complex is not adequately explained. While both populations contribute to OFT septation, according to the authors, they represent distinct compartments with different developmental trajectories. The authors could clarify this using anatomically labelled hearts at the stages they conduct their analysis, along with additional text and schematics explaining what is meant by each compartment. This would greatly enhance the accessibility of the manuscript. *

      To clarify the distinction between the two components of intracardiac NCCs, cushion-associated mesenchyme and cushion-independent aorticopulmonary (AP) septum, we substantially revised the description of cluster characterization (P9, L24–P10, L6). We added new figures (Figure S5) showing their spatial relationships and distinct gene expression signatures, including E12.5 Xenium data demonstrating Vegfc expression in cushion-associated NCCs and Tcf24 expression in AP septum NCCs.

       In addition, we revised the schematic diagram of intracardiac NCC distribution in Figure 4m and added Figure S9, which spatially delineates three major NCC trajectories with distinct Hox codes: (1) migration of preotic NCCs into the outflow tract cushions, (2) migration of postotic NCCs into the outflow tract cushions, and (3) protrusion of the AP septum from the dorsal wall of the aortic sac. We also incorporated additional explanatory text and cited relevant review articles on cardiac outflow tract development. We hope that these revisions substantially improve the clarity and accessibility of the manuscript, particularly for non-specialist readers.
      
      • The Methods state that "Decomposition of cell clusters of scRNA-seq was performed by RCTD to map them onto the Xenium dataset," but this description is insufficient. The authors should clarify whether RCTD was applied separately for each developmental stage (i.e., E11.5 scRNA-seq reference for E11.5 Xenium, E12.5 reference for E12.5 Xenium), or whether a pooled reference was used across stages. This clarification is important because RCTD performance depends critically on the correspondence between reference and target datasets. Using a multi-stage integrated reference to deconvolve stage-specific spatial data could introduce artifacts, as cell states and cluster compositions vary considerably across developmental timepoints. *

      We used pooled scRNA-seq data of NCCs or non-NCCs from E11.5 and E12.5 as the RCTD reference for decomposing the Xenium dataset. We agree with the reviewer that a multi-stage integrated reference may introduce artifacts. However, the E11.5 and E12.5 datasets were similarly distributed in UMAP space and exhibited similar transcriptomic signatures, whereas the E14.5 and E17.5 datasets showed distinct characteristics in the integrated UMAP (Figure 1c). Therefore, we used only the E11.5 and E12.5 datasets for RCTD decomposition. We added the following sentence to the Methods section:

      The pooled scRNA-seq data of NCCs or non-NCCs at E11.5 and E12.5 were used for RCTD.

      (P33, L6)

      • Figure 2j-m needs annotations and schematics. It is currently very difficult to identify the different compartments. See Figure C in Chen et al. for an example of this approach. In addition, what are the fine clusters from 1 to 20? Which ones are NCC-derived? *

      We added anatomical annotations to the revised Figure 2a, b, j–m to facilitate identification of the different compartments. The fine clusters labeled 0–20 in the previous Figure 2j–m corresponded to the multiome clusters (NCC and non-NCC) shown in Figure 1d. In the revised manuscript, the same dataset was re-clustered into 23 clusters (0–22), which were subsequently used for decomposition analysis to predict cell compartments with maximum likelihood. We also revised the color scheme of the segmented cells in Figure 2j–m to improve visual distinction between compartments and facilitate interpretation of the spatial distribution patterns.

       Putative NCCs were identified through the following procedure:
      
      1. EYFP expression was estimated across 39 cell types in the Xenium dataset by integration with the single-cell multiome dataset (including both NCCs and non-NCCs) using Tangram.
      2. Xenium clusters enriched for EYFP expression (defined as clusters whose mean estimated EYFP level exceeded the 65th percentile of estimated EYFP across all spots) that were also consistent with known neural crest derivatives were extracted as putative NCC populations.
      3. For each spot within these EYFP-enriched Xenium clusters, RCTD was used to estimate the corresponding multiome cluster identity. Thus, the diagrams in Figure 2j–m indicate the most likely multiome subpopulation assignment for each putative NCC spot, rather than categorizing the multiome clusters themselves as NCC- or non-NCC-derived. This description has been included in the Methods section (P33, L6–14).
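      The cluster-selection rule in step 2 can be illustrated with a short sketch. It assumes that per-spot Tangram-estimated EYFP values and Xenium cluster labels are already available as arrays. The function name `putative_ncc_clusters` and its `known_nc_clusters` argument are hypothetical: the argument stands in for the manual check against known neural crest derivatives. The RCTD assignment in step 3 is not reproduced here.

```python
import numpy as np


def putative_ncc_clusters(eyfp, cluster_ids, known_nc_clusters, pct=65.0):
    """Return the Xenium clusters flagged as putative NCC populations (sketch).

    eyfp              -- per-spot Tangram-estimated EYFP level (1-D sequence)
    cluster_ids       -- Xenium cluster label of each spot (same length)
    known_nc_clusters -- hypothetical stand-in for the manual check that a
                         cluster is consistent with known neural crest derivatives
    pct               -- percentile of the all-spot EYFP distribution used as cutoff
    """
    eyfp = np.asarray(eyfp, dtype=float)
    cluster_ids = np.asarray(cluster_ids)

    # One global cutoff: the 65th percentile of estimated EYFP over ALL spots.
    threshold = np.percentile(eyfp, pct)

    selected = []
    for c in np.unique(cluster_ids):
        label = c.item()  # convert the NumPy scalar to a plain Python value
        # Each cluster is judged by its mean, not by individual spots.
        cluster_mean = eyfp[cluster_ids == c].mean()
        if cluster_mean > threshold and label in known_nc_clusters:
            selected.append(label)
    return selected
```

      In this sketch the percentile is computed once over all spots, while each cluster is compared by its mean. A cluster can therefore pass even if some of its spots fall below the cutoff. The spots in the selected clusters (for example, `np.isin(cluster_ids, selected)`) would then be the ones passed to RCTD in step 3.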

      * The panels show pharyngeal markers, OFT/intracardiac markers, and SMC markers in Figures 2n-y, but could the authors show the proportion of NCC-derived (YFP+) cells for each cluster? Could the authors also map only the YFP+ cells on the Xenium data? It would be useful to see the proportion of YFP-positive (NCC-derived) cells for each delineated compartment. YFP-positive cells appear to exist at the boundary between LV and RV in the septum; this observation would benefit from proper quantification. *

      Our Xenium analysis could not detect EYFP signals; therefore, as noted above, we estimated EYFP expression by integrating the scRNA-seq and Xenium datasets using Tangram. To show the relative EYFP enrichment of each Xenium cluster, we summarized the mean estimated EYFP expression as bar plots (revised Figure S2q, r). Consistent with their established neural crest origin, Xenium clusters annotated as neuron (38), ganglion (32), and Schwann cell (17) showed high estimated EYFP expression. In addition, Xenium clusters 15 (SMC) and 25 (cushion mesenchyme) were also enriched for EYFP expression.

       Unfortunately, currently available algorithms for Xenium data analysis do not reliably allow visualization or extraction of gene expression profiles exclusively from putative EYFP-positive cells. Instead, we improved the visualization of the spatial distribution of putative EYFP-positive cells by replacing centroid-based signal display with segmentation-based rendering (Revised Figure 2f, g).
      
       As pointed out by the reviewer, putative YFP-positive cells appear to be present at the boundary between the LV and RV within the interventricular septum. However, the estimated EYFP signals in this region were substantially lower than those observed in well-established neural crest derivatives, suggesting that most of these signals likely represent background noise, although a minor population of sparsely distributed neural crest-derived cells cannot be excluded. More precise characterization of potential neural crest derivatives in the ventricular region will require future investigation.
      

      * Figure 3 is confusing because it integrates data from multiple overlapping stages (E8.5 to P7). While the authors identify distinct compartments, pharyngeal mesenchyme, intracardiac mesenchyme, and SMCs, it is unclear why stages beyond E14.5 and E17.5 (corresponding to the initial single-cell omics and Visium analysis) were included. Although leveraging additional datasets is a clever approach, the integration of data from such disparate developmental timepoints confounds interpretation. For example, Cluster 6 appears to include cells from both P7 and E12.5 stages. Given that the Visium data represent hearts at E14.5 and E17.5, it is problematic to map clusters derived from other stages onto these spatial datasets. *

      We integrated data from multiple overlapping stages (E8.5 to P7) in Figure 3 to capture both cell identities that persist across stages and stage-specific features, particularly within the intracardiac mesenchyme and SMC populations. This approach enabled us to assess core lineage relationships, including trajectories linking pharyngeal mesenchyme to intracardiac mesenchyme and SMCs, each comprising multiple distinct subpopulations. These findings support the biological relevance of the integration as a framework for understanding lineage relationships across developmental time. However, we agree with the reviewer that it is problematic to map clusters derived from other stages onto these spatial datasets. In fact, we had restricted the spatial mapping analysis to cells from E14.5 and E17.5 within the integrated dataset, thereby ensuring consistency with the developmental stages represented in the Visium data. To address this concern, we have clarified this point in the Methods section by revising the following sentence:

      Spatial mapping of scRNA-seq data onto Visium sections was performed using only E14.5 and E17.5 datasets and the RCTD algorithm, as described above.

      (P34, L12–13)

      • Throughout the manuscript, the authors describe "lineage relationships" between cell populations, but these are in fact developmental trajectories inferred computationally (via UMAP connectivity and RNA velocity), not true lineage relationships. This distinction is critical and should be explicitly stated. *

      We agree with the reviewer that the present “lineage relationships” described in the original manuscript were primarily inferred from computational analyses, including UMAP connectivity and RNA velocity, rather than being directly demonstrated by lineage-tracing experiments. We also acknowledge that some RNA velocity results were not fully consistent with known in vivo developmental trajectories.

       We strengthened the biological validation of the inferred trajectories by incorporating extensive spatial verification of gene expression using immunohistochemistry and Xenium in situ hybridization analyses to confirm the identity and localization of each cell population. We also refined the computational analyses to better resolve regional differences in NCC dynamics and added comprehensive schematic illustrations based on established models of heart development from previous studies, with appropriate citations throughout the manuscript.
      
       In response to the reviewer’s comment and these revisions, we have carefully rephrased the relevant descriptions to clarify that the observed relationships represent computationally inferred developmental trajectories rather than definitive lineage relationships, as follows:
      

      (original) RNA velocity analysis in conjunction with developmental context, revealed lineage relationships among these groups (Figure 3c, d).

      (revised) RNA velocity analysis, in conjunction with developmental context, inferred global lineage relationships among these groups (Figure 3c, d), consistent with developmental trajectories in vivo.

      (P8, L21–23)

      (original) …, we present a comprehensive map of cardiopharyngeal NCC lineages …

      (revised) …, we present a comprehensive map of cardiopharyngeal NCC populations …

      (P18, L1)

      (original) Overall, this study proposes a new framework for understanding cardiac NCCs heterogeneity based on the association with the cardiac cushion and the accompanying transition in Hox gene expression and regulatory programs. Our findings provide a basis for systematically dissecting the developmental diversity of cardiac NCCs.

      (revised) Overall, this study proposes a new framework for understanding cardiac NCC heterogeneity based on developmental route, Hox-code retention, and region-specific regulatory programs. Importantly, the developmental relationships and differentiation pathways described here are inferred from integrated computational analyses, including transcriptomic similarity, UMAP connectivity, and RNA velocity, rather than direct lineage-tracing experiments. Within this framework, our findings suggest distinct differentiation trajectories leading to great artery, ductus arteriosus, and coronary artery SMCs, as well as the aorticopulmonary septum and valvular/subvalvular mesenchyme.

      (P20, L33 – P21, L7)

      We also rephrased additional relevant sections throughout the manuscript in accordance with the reviewer’s comment (Please see below).

      * For example, the statement "we propose that intracardiac NCCs within C10 and C22 differentiate via C2 into valvular (C21) and subvalvular (C18) interstitial cells" should be framed as a computational inference, not an established lineage relationship. Without clonal lineage tracing data, these claims cannot be verified. *

      We have changed the word “propose” to “infer” (P10, L24).

      *Similarly, the claim that "NCCs contributing to the AP septum are distinct from other intracardiac NCCs in that they do not populate the cardiac cushions but remain continuous with NCCs populating the distal outflow tract cushion" lacks direct lineage evidence. What experimental data support this assertion? *

      We agree with the reviewer that the original statement lacked direct lineage evidence, although the anatomical distinction between the outflow tract cushions and the AP septum—a protruding structure arising from the dorsal wall of the aortic sac—is well established. In the revised manuscript, we therefore removed this speculative statement and rewrote the section to more accurately describe the developmental process with appropriate references. The spatial and temporal features of AP septum formation are now also illustrated in Figure S9a (UMAP plots and schematic illustration) and described as follows:

      *From E12.5 to E14.5, postotic cells showed a progressive emergence of the aorticopulmonary septum–associated cluster C16 from transitional states. * (P11, L32 – P12, L1)

      We further validated the identity of the C16 cluster as AP septum–associated NCCs through additional marker analyses together with spatial verification by Xenium in situ analysis. In addition to Penk and Sfrp2, which were previously reported by Chen et al., C16 also highly expressed Postn, similar to other intracardiac clusters. Furthermore, C16 was characterized by relatively high expression of Tcf24 and low expression of Vegfc. Based on the reviewer’s comment and these additional experimental data, we revised the relevant Results section as follows:

      (original) Within the intracardiac mesenchyme group, C16 exhibited high expression of Penk and Sfrp2 (Figure S3i and Table S6), corresponding to the cluster annotated as the aorticopulmonary septum in the previous study by Chen et al14. This annotation was further supported by enriched expression of Hox4 and Hox5 paralogs, consistent with its origin between PA4 and PA6 (Figure 3t, u). The aorticopulmonary septum originates as a protrusion from the dorsal wall of the aortic sac and is primarily derived from NCCs6,18–20. This septal structure fuses with the distal outflow tract cushions to divide the common arterial trunk into the aortic and pulmonary channels. Notably, NCCs contributing to this septum are distinct from other intracardiac NCCs in that it does not populate the cardiac cushions but remain continuous with NCCs populating the distal outflow tract cushion, suggesting that C16 represents this distinct NCC-derived population.

      (revised) C16 was distinguished by high expression of Penk and Sfrp2 (Figure S3i and Table S6), corresponding to the cluster annotated as the aorticopulmonary septum in the previous study by Chen et al14. This cluster also exhibited robust expression of mesenchymal markers, including Postn, similar to other intracardiac clusters (Figure S3i). In addition, C16 showed relatively high expression of Tcf24 and low expression of Vegfc compared with the other intracardiac clusters (Figure S5a-e). These gene expression features of the aorticopulmonary septum were further validated by Xenium in situ hybridization (Figure S5f-j).

      Unlike other intracardiac NCCs that populate the distal outflow tract cushions, the aorticopulmonary septum originates as a protrusion from the dorsal wall of the aortic sac and is primarily derived from NCCs residing in PA4 and PA6 (refs. 6,22–24). This septal structure subsequently fuses with the distal outflow tract cushions to partition the common arterial trunk into the aortic and pulmonary channels. Consistent with this developmental origin, C16 was enriched for the expression of Hox4 and Hox5 paralogs (Figure 3t, u), indicating that NCCs in this population retain their Hox code, in contrast to other intracardiac NCCs, in which most Hox genes were downregulated (see details below).

      (P9, L24 – P10, L6)

      * In addition, the authors state that what De Bono et al. identified as "outflow smooth muscle" corresponds in their dataset to early intracardiac mesenchymal clusters C2, C10, and C22, present as early as E10.5, when mesenchymal NCC derivatives express immature SMC markers, and that these cells later differentiate into coronary artery SMCs around E14.5 (C23) as well as other non-muscle components. This claim is not verified, whether these cells are indeed the ones differentiating into coronary artery SMCs is based solely on computational inference from C2, C10, C22 to C23. *

      To validate that C23 corresponds to coronary artery SMCs, we performed additional immunostaining analyses for Myh11, Sost, and Reln, which distinguish great artery SMCs, ductus arteriosus SMCs, and coronary artery SMCs, respectively, consistent with their transcriptomic identities. Regarding differentiation of NCC-derived mesenchymal cells into coronary artery SMCs, we previously demonstrated using chick–quail chimera experiments and specific Cre-reporter mouse lines that proximal coronary artery SMCs originate from preotic NCCs rather than postotic NCCs (Arima Y et al. Nature Communications 3:1267, 2012). In addition, coronary artery SMCs have been reported to differentiate through pericyte intermediates (Volz KS et al., eLife 4:1–22, 2015). Consistent with these findings, our present lineage-tracing analyses using Sox9-CreERT2; Ai14 and Scx-CreERT2; Ai14 mice demonstrated that progenies of Sox9high and Scxhigh intracardiac cushion mesenchymal cells contributed to the coronary artery SMCs as well as the surrounding mesenchyme.

       We agree with the reviewer that the relationship between the early intracardiac mesenchymal clusters (C2, C10, and C22) and C23 is primarily inferred from computational trajectory analyses and is not demonstrated by direct clonal lineage tracing. Accordingly, we revised the Discussion to avoid overstatement and to clarify that these lineage relationships are inferred based on computational analyses together with prior experimental findings and the additional validation data described above. The revised text is as follows:
      
      (original) Our integrated map incorporates previously published lineage analyses of cardiac NCCs at early and late stages13,14, providing continuity through complementary single-cell and spatial transcriptomic data, although our interpretation of certain clusters differs from those of prior studies. For example, the cell population identified by De Bono et al. as outflow smooth muscle13 corresponds in our dataset to early intracardiac mesenchymal clusters C2, C10, and C22, present as early as E10.5, when mesenchymal NCC derivatives express immature SMC markers. These cells later differentiate into coronary artery SMCs around E14.5 (C23) as well as other non-muscle components. Despite such differences in interpretation, the integrated map robustly captures lineage relationships, supported by accumulated developmental and anatomical evidence regarding cardiac outflow tract formation, particularly in relation to the outflow tract cushion.

      (revised) Our integrated map incorporates previously published lineage analyses of cardiac NCCs at early and late stages13,14, providing continuity through complementary single-cell and spatial transcriptomic data. The present study further extends these datasets by resolving the heterogeneity of intracardiac mesenchymal populations and their lineage relationships. For example, the cell population identified by De Bono et al. as outflow smooth muscle13 corresponds in our dataset to early intracardiac mesenchymal clusters expressing immature SMC markers, which subsequently diverged into multiple derivatives including coronary artery SMCs. In addition, we identified distinct SMC populations corresponding to great artery SMCs, ductus arteriosus SMCs, and coronary artery SMCs, each characterized by unique molecular signatures such as Sost, Tfap2b/Ptger4, and Reln/Gja4, respectively. The continuity between intracardiac mesenchyme and coronary artery SMCs through a pericyte-like intermediate state is consistent with previous developmental studies showing that proximal coronary artery SMCs originate from preotic NCCs and may arise through pericyte intermediates8,21. Together, these findings provide a refined framework for understanding the diversification of cardiac NCC derivatives during outflow tract remodeling.

      (P18, L14–29)

      * Claims regarding marker expression in specific compartments (for example Hapln1 and Postn in cushions) require additional spatial validation at higher resolution than what is currently provided by the Visium data. Moreover, it is unclear whether these data are single-cell resolution; the authors need to clarify this. HCR staining would be ideal to confirm these expression patterns. Currently, all conclusions are based solely on gene expression without orthogonal spatial confirmation. At minimum, the authors should provide references from the literature supporting these expression patterns. *

      As the reviewer suggested, confirming the spatial context of the gene expression patterns observed in the scRNA-seq data is important. We therefore further investigated these spatial expression patterns using the Xenium in situ hybridization system. In the scRNA-seq data, Postn was predominantly expressed across the cardiac mesenchyme subpopulations, whereas Tcf24 was specifically expressed in the AP septum (C16) but not in the cushion clusters (C2, 10, 18, 21 and 22). Conversely, Vegfc was expressed in these clusters except the AP septum. To confirm these opposing expression patterns, we added a new Figure S5 showing Tcf24 and Vegfc expression detected by Xenium.

      Following the reviewer’s suggestion, we added the following sentences to the revised manuscript:

      C16 was distinguished by high expression of Penk and Sfrp2 (Figure S3i and Table S6), corresponding to the cluster annotated as the aorticopulmonary septum in the previous study by Chen et al14. This cluster also exhibited robust expression of mesenchymal markers, including Postn, similar to other intracardiac clusters (Figure S3i). In addition, C16 showed relatively high expression of Tcf24 and low expression of Vegfc compared with the other intracardiac clusters (Figure S5a-e). These gene expression features of the aorticopulmonary septum were further validated by Xenium in situ hybridization (Figure S5f-j).

      (P9, L24–31)

      * Could the authors confirm the absence of the Sox9high/Scxhigh population in AP septum descendants? *

      Sox9high/Scxhigh NCCs are enriched not only in intracardiac NCC clusters C2, C10, and C14, but also in the AP septum-associated cluster C16, as stated in the manuscript as follows:

      In the integrated UMAP, Sox9high/Scxhigh NCCs were enriched in C2, C10, C14 and C16.

      (P17, L7–8)

      However, based on our previous finding that proximal coronary artery SMCs originate from preotic rather than postotic NCCs (Arima Y et al. Nature Communications 3:1267, 2012), we infer that the intermediate population contributing to coronary artery SMCs is more likely derived from intracardiac NCC clusters C2 and C10 than from the pharyngeal arch 4/6-derived AP septum-associated cluster C16. To clarify this interpretation, we have added the following statement to the final paragraph of the Results section:

      Together with our previous report that proximal coronary artery SMCs originate from preotic rather than postotic NCCs8, these results suggest that the intermediate population contributing to coronary artery SMCs likely represents a subset of Hox-downregulated intracardiac NCCs corresponding to clusters C2 and C10.

      (P17, L29–33)

      Minor Comments

      *Could the authors better justify their choice of stages (E11.5 to E17.5) for the single-cell multiomic assay? Given that OFT cushions are already populated by NCCs by E10.5 and that AP septum formation is already underway at this stage (see Development (2007) 134(8): 1593-1604), the rationale for beginning at E11.5 should be explicitly stated. *

      We agree that NCCs have already populated the OFT cushions and that AP septum formation is underway by E10.5. Our selection of stages from E11.5 to E17.5 was intended primarily to enable synchronous comparisons between pharyngeal and intracardiac NCC populations across developmental stages, rather than to capture the earliest spatiotemporal events of cardiopharyngeal NCC lineage establishment. We have clarified this rationale in the revised manuscript by revising the statement as follows:

      (original) To elucidate the spatiotemporal dynamics of cardiopharyngeal NCC lineages, we performed single-cell multiome analysis on pharyngeal and cardiac tissues from E11.5 and E12.5 and ...

      (revised) To compare pharyngeal and intracardiac NCCs across developmental stages and characterize their temporal changes, we performed single-cell multiome analysis on pharyngeal and cardiac tissues from E11.5 and E12.5 and ...

      (P6, L3–5)

      *Spelling errors in Figure 2e: "ambious" should be "ambiguous"; "atrium venticle" should be "atrium/ventricle"; "ventricle" is misspelled in several locations. The clustering resolution is very high, yielding many clusters that are difficult to distinguish based on the colour code alone. What does "ventricle_CL" refer to? What is the "marginal layer"? A clearer legend or supplementary table defining each cluster would be helpful. *

      To improve the clarity of the high-resolution clustering, we added Xenium cluster numbers to Figures 2c and 2d, enabling clusters to be distinguished not only by color but also by their anatomical locations. We also revised the cluster annotations in Figure 2e and corrected all spelling errors, including “ambiguous.” In addition, we replaced “ventricle_CL” with “Cardiac muscle 1” and the formerly misspelled “atrium ventricle CM” with “Cardiac muscle 2.” The annotation “marginal layer” was also revised to “Spinal cord, marginal layer” for clarity.

      *Figure 2j: The image is very dim. *

      We have improved the image quality and added regional annotations to enhance visibility in the revised figure.

      * Figure 3b: What do the numbers correspond to? Additionally, "mesenchyme" is misspelled. *

      We have revised Figure 3b to clarify different characteristics of mesenchymal subpopulations. We have also corrected the misspelling of “mesenchyme.”

      **Referees cross-commenting**

      *I also agree with the other reviewers' comments. Many thanks. *

      Reviewer #2 (Significance (Required)):

      *The overall approach is sound and the datasets generated are valuable resources for the field. The manuscript presents interesting findings regarding NCC heterogeneity in cardiac development.

      As I understand the authors' model: OFT cushion-associated NCCs enter and associate with the OFT cushions, undergoing a Hox-positive to Hox-negative transition, accompanied by a shift in Meis binding and GRN rewiring. These cells pass through a Meis2-Sox9-Scx "skeletogenic progenitor-like" intermediate state and contribute to semilunar valves and coronary artery smooth muscle. Aorticopulmonary septum NCCs, by contrast, retain distinct Hox codes (Hox4/5 enrichment) rather than becoming Hox-negative, and contribute to the AP septum and great vessel smooth muscle.

      Strengths: The integration of multiple omics modalities with lineage tracing is a powerful approach, and the identification of a Hox-dependent bifurcation in NCC fate is a novel conceptual advance.

      Limitations: The reliance on computational trajectory inference without orthogonal lineage validation, combined with the integration of datasets spanning very different developmental stages, limits the strength of some conclusions. The analysis also required more precise anatomical annotations to facilitate accessibility to the readers - to visualise better the distinguishable contribution of the cardiac NCCs to the OFT.

      Advance: The study extends knowledge in the field by providing novel mechanistic insight into neural crest diversification in the context of cardiovascular development. The nature of the advance is primarily mechanistic, identifying a Hox-Meis regulatory switch that distinguishes cushion-associated from cushion-independent NCC lineages.

      Audience: This work will be of interest to a specialised audience interested in neural crest cells and developmental biologists using omics approaches to address cell fate diversification in complex tissues.

      Reviewer Expertise: Developmental biology, lineage analysis, mouse genetics. I do not have the expertise to assess the computational methods used in this paper.*

      Again, thank you for giving us the opportunity to strengthen our manuscript with your valuable comments and queries. We have worked hard to incorporate your feedback and hope that these revisions persuade and satisfy you.

      To Reviewer #3

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: *Iwase et al. present a comprehensive multiomics analysis of cardiac neural crest cell (NCC) differentiation during cardiopharyngeal development. Using Wnt1Cre;R26R-EYFP mice, the authors isolated NCCs and non-NCCs at the E11.5, E12.5, E14.5 and E17.5 stages and performed single-cell RNA-seq, ATAC-seq, and spatial transcriptomics analyses. Spatial resolution of NCC-derived populations was achieved using Xenium (E11.5-E12.5) and Visium (E14.5-E17.5) platforms. Integration of single cell and spatial datasets identified distinct NCC-derived populations with defined spatial organization within the pharyngeal and intracardiac regions. The study concludes that Hox gene patterning underlies NCC subpopulation identity during cardiopharyngeal development and reveals a developmental transition from Hox-dependent to Hox-independent transcriptional regulation. Furthermore, the authors identify a Meis2-Sox9-Scx gene regulatory network underlying a skeletogenic progenitor-like intermediate that contributes to coronary smooth muscle and semilunar valve formation.

      While the dataset is comprehensive and technically strong, several key conclusions are not always convincingly supported by enough data. As a result, some claims appear speculative and would benefit from additional experimental validation to strengthen the proposed developmental models.

      I would strongly encourage authors to consider the following points to provide additional details that will strengthen their study: *

      Thank you for your encouraging feedback and comments. We have responded to your comments below.

      Major comments:

      *- The authors should provide detailed FACS gating strategies and sorting conditions used to selectively isolate EYFP-positive NCC and EYFP-negative non-NCC populations, including representative plots and information on exclusion criteria (e.g., doublets, dead cells). *

      We added the sorting gates to the revised Figure S1 and described the detailed FACS gating strategy and sorting conditions in the revised Methods section as follows:

      (original) EYFP-positive and -negative single cells were sorted using a FACSAria II or FACSMelody (BD Biosciences), freshly processed or cryopreserved.

      (revised) Single-cell suspensions were stained with 7-AAD (BD Pharmingen) for 3 min at 4℃, and EYFP-positive and -negative single cells were sorted using a FACSAria II or FACSMelody (BD Biosciences). The sorting strategy was as follows:

      *Step 1. All events were gated by forward scatter (FSC) and side scatter (SSC) including area (A), height (H), and width (W) to obtain FSC singlets and remove doublets. *

      *Step 2. FSC singlets were gated for the 7-AAD negative fraction to isolate viable cells. *

      *Step 3. Viable cells were gated to isolate EYFP-positive NCCs or EYFP-negative non-NCCs. *

      Sorted cells were freshly processed or cryopreserved for subsequent procedures.

      (P28, L21–30)
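      The three gating steps above can be illustrated as sequential filters over per-event measurements. This is a hypothetical sketch, not the authors' sorter settings: the `Event` fields, the FSC-H/FSC-A singlet window, and the 7-AAD and EYFP cutoffs are invented placeholders, and Step 1 is simplified to a single forward-scatter height/area ratio, whereas the revised Methods also use SSC and width parameters.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One cytometer event (all fields hypothetical, arbitrary units)."""
    fsc_a: float  # forward scatter area
    fsc_h: float  # forward scatter height
    aad7: float   # 7-AAD fluorescence (dead-cell stain)
    eyfp: float   # EYFP fluorescence

# Hypothetical gate limits; real gates are drawn interactively on the sorter.
SINGLET_RATIO = (0.8, 1.2)  # accepted FSC-H / FSC-A window
AAD7_MAX = 1_000.0          # below this, the event is treated as viable
EYFP_MIN = 5_000.0          # at or above this, the event is treated as EYFP+

def is_singlet(e: Event) -> bool:
    # Doublets give a longer pulse: larger area relative to height, so their
    # FSC-H / FSC-A ratio falls below the singlet window.
    if e.fsc_a <= 0:
        return False
    ratio = e.fsc_h / e.fsc_a
    return SINGLET_RATIO[0] <= ratio <= SINGLET_RATIO[1]

def gate(events):
    """Apply Steps 1-3 in order; return (EYFP+ NCCs, EYFP- non-NCCs)."""
    singlets = [e for e in events if is_singlet(e)]       # Step 1: singlets
    viable = [e for e in singlets if e.aad7 < AAD7_MAX]   # Step 2: 7-AAD negative
    eyfp_pos = [e for e in viable if e.eyfp >= EYFP_MIN]  # Step 3: EYFP+
    eyfp_neg = [e for e in viable if e.eyfp < EYFP_MIN]   # Step 3: EYFP-
    return eyfp_pos, eyfp_neg
```

      The order matters: viability and EYFP gates are applied only to events that have already passed the singlet gate, so a doublet of one EYFP-positive and one EYFP-negative cell cannot be sorted into either fraction.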

      * - Although the authors isolated nuclei for both scRNA-seq and ATAC-seq, the number of cardiomyocytes within the EYFP-negative population is unexpectedly low. The authors should clarify potential technical or biological reasons for this underrepresentation (e.g., nuclei isolation efficiency, sorting strategy, filtering criteria, or developmental stage-specific composition). *

      The unexpectedly low proportion of cardiomyocytes within the EYFP-negative population likely reflects the restricted sampling region used in this study. Specifically, we dissected the cardiopharyngeal tissue and outflow tract region rather than the whole heart (Figure 1a), which likely introduced a sampling bias that reduced the representation of cardiomyocytes in the dataset.

      Although the FACS gating strategy could potentially influence the recovery of specific cell types, we consider this possibility unlikely because cardiomyocyte populations were successfully detected in our dataset. In addition, during the quality-control process for scRNA-seq data, we applied a mitochondrial gene threshold of 25% to exclude low-quality cells. While mature cardiomyocytes typically exhibit high mitochondrial gene expression, embryonic cardiomyocytes at the analyzed developmental stages are immature and therefore were unlikely to be disproportionately excluded by this criterion. Consistent with this interpretation, distinct cardiomyocyte clusters remained detectable after filtering, indicating that cardiomyocytes were retained through the quality-control process.

      Based on these considerations, we conclude that the low abundance of EYFP-negative cardiomyocytes primarily reflects the limited anatomical region sampled. We have clarified this point in the revised Results section as follows:

      The overall proportion of cardiomyocytes was low, likely reflecting the restricted sampling of the cardiac outflow tract region (Figure 1a).

      (P6, L23–25)
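      For readers less familiar with the 25% mitochondrial-gene filter mentioned in this response, a minimal NumPy sketch follows. It assumes a cells × genes raw count matrix and the mouse convention that mitochondrial gene symbols start with `mt-`; the helper name `mito_filter` and the choice to drop cells at or above the threshold are assumptions, and this is not the authors' pipeline (in Seurat the same per-cell percentage is typically computed with `PercentageFeatureSet`).

```python
import numpy as np

def mito_filter(counts, gene_names, max_pct=25.0, prefix="mt-"):
    """Return a boolean mask of cells that pass a mitochondrial-read filter.

    counts:     cells x genes raw count matrix.
    gene_names: gene symbols matching the columns of `counts`.
    Cells whose mitochondrial fraction is >= max_pct percent are removed.
    """
    counts = np.asarray(counts, dtype=float)
    names = np.asarray(gene_names, dtype=str)
    is_mito = np.char.startswith(names, prefix)
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with no counts at all are assigned 100% and therefore dropped.
    pct = np.where(total > 0, 100.0 * mito / np.maximum(total, 1e-12), 100.0)
    return pct < max_pct
```

      Because the filter acts on a fraction rather than on absolute counts, immature cardiomyocytes with moderate mitochondrial content pass it, which is the point made in the response above.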

      * - In figure 1, the authors present results from unsupervised clustering of 9,420 cells into 21 distinct clusters, many of which are broadly labeled as "mesenchymal cells". The authors should refine this nomenclature by providing more specific annotations or defining criteria, as this broad classification limits interpretability of the identified subpopulations. *

      In the revised manuscript, we further refined the clustering analysis by subdividing the original clusters C5 and C13 into two subclusters each, resulting in a total of 23 clusters in the UMAP shown in Figure 1d. This refinement improved the resolution and interpretability of the identified cell populations. In addition, we replaced the broad “mesenchymal cells” annotation with more specific classifications, including pharyngeal mesenchymal cells, intracardiac mesenchymal cells, and smooth muscle–like cells. To further clarify lineage relationships, we also provided UMAPs separately displaying NCC and non-NCC populations in Figure 1e, as suggested by the reviewer.

      *- To integrate spatially annotated Xenium datasets with scRNA-seq data, the authors used Tangram, enabling estimation of the spatial distribution of EYFP-positive NCCs within the pharyngeal region of E11.5 and E12.5 embryos. However, the E11.5 section shows a relatively low number of EYFP-positive cells (Figure 2f). The authors should clarify whether this reflects technical limitations (e.g., probe design, segmentation efficiency, or integration parameters) or biological factors and explain how this affects interpretation of the spatial analyses. *

      *- The authors used the RCTD tool to decompose the scRNA-seq dataset into NCC and non-NCC components and mapped these onto the Xenium dataset. However, panels j and l in Figure 2 show low signal in the E11.5 sections. The authors should clarify whether this reflects technical limitations of the RCTD deconvolution, differences in sampling, or biological factors, and discuss how this result impacts the interpretation of the spatial mapping results at this stage. *

      In the original manuscript, estimated EYFP expression levels were visualized using a minimum cutoff of 0.1, with the remaining values mapped onto a 100-step color scale. However, this approach resulted in apparently weak signal intensity in the E11.5 sections because high-level noisy signals, including signals detected in the atrial lumen and outside the embryo, broadened the dynamic range of the visualization. In the revised manuscript, we applied an upper cutoff at the 90th percentile to reduce the influence of these noisy signals, resulting in improved visualization of EYFP-positive regions in Figure 2f, with signal intensity now comparable to that observed in Figure 2g. In addition, we further improved the spatial visualization of putative EYFP-positive cells by replacing centroid-based signal display with segmentation-based rendering in the revised Figure 2f, g. Importantly, these revisions affected only the visualization method and did not alter the underlying analyses or conclusions, as the integration and downstream analyses were performed using the original quantitative data.
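      The change in color scaling described above can be illustrated with a short sketch. It uses the lower cutoff of 0.1 and the 100-step scale stated in the response; computing the 90th percentile over the displayed (≥ 0.1) values only is an assumption, since the response does not specify the reference set, and `color_bins` is a hypothetical name. The sketch returns color-bin indices rather than colors and, as in the response, leaves the underlying estimates untouched.

```python
import numpy as np

def color_bins(values, lower=0.1, upper_pct=90.0, n_steps=100):
    """Map estimated EYFP levels to indices on an n_steps color scale.

    Values below `lower` are not drawn (index -1). Values above the
    `upper_pct` percentile are clipped to the top color, so a few noisy,
    very bright spots no longer compress everything else into dim colors.
    """
    v = np.asarray(values, dtype=float)
    shown = v >= lower
    bins = np.full(v.shape, -1, dtype=int)
    if not shown.any():
        return bins
    # Assumption: the percentile is taken over the displayed values only.
    upper = np.percentile(v[shown], upper_pct)
    span = upper - lower if upper > lower else 1.0
    scaled = (np.clip(v[shown], lower, upper) - lower) / span
    bins[shown] = np.floor(scaled * (n_steps - 1)).astype(int)
    return bins
```

      Without the upper cutoff, a single outlier (for example a spot in the atrial lumen) would set the top of the scale, and genuine EYFP-positive regions would fall into the lowest bins.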

      We also clarified the procedure used for NCC estimation and spatial mapping in the Methods section as follows:

      Putative NCCs were identified through the following procedure:

      Step 1. EYFP expression was estimated across 39 cell types in the Xenium dataset by integration with the single-cell multiome dataset (including both NCCs and non-NCCs) using Tangram.

      Step 2. Xenium clusters that were enriched for EYFP expression (defined as clusters whose mean estimated EYFP level exceeded the threshold corresponding to the 65th percentile across all spots) and consistent with known neural crest derivatives were extracted as putative NCC populations.

      Step 3. For each spot within these EYFP-enriched Xenium clusters, RCTD was used to estimate the corresponding multiome cluster identity.

      (P33, L7–16).
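      The numeric criterion in Step 2 can be sketched as follows, assuming per-spot EYFP estimates and Xenium cluster labels are available as arrays; `eyfp_enriched_clusters` is a hypothetical helper, not part of Tangram or RCTD. The second criterion, consistency with known neural crest derivatives, is an expert judgment and is not modeled here.

```python
import numpy as np

def eyfp_enriched_clusters(eyfp, clusters, pct=65.0):
    """Return (enriched cluster labels, threshold).

    eyfp:     estimated EYFP level per Xenium spot (e.g. from Tangram).
    clusters: Xenium cluster label per spot.
    A cluster is kept when its mean EYFP exceeds the `pct`-th percentile
    of EYFP across all spots.
    """
    eyfp = np.asarray(eyfp, dtype=float)
    clusters = np.asarray(clusters)
    threshold = np.percentile(eyfp, pct)
    enriched = [c for c in np.unique(clusters).tolist()
                if eyfp[clusters == c].mean() > threshold]
    return enriched, threshold
```

      Note that the threshold is a percentile over spots, while the comparison uses a cluster mean, so a cluster with a mixture of bright and dim spots can fall below the cutoff even if some of its spots exceed it.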

      *- The authors integrated their data with publicly available scRNA-seq datasets of NCCs from E8.5 to P7 hearts and present results from unsupervised clustering of 67,208 cells into 28 distinct clusters. Figures 3a and 3b show that a cardiomyocyte cluster (C26) is included among NCC derivatives. The authors should clarify whether this reflects a technical issue during FACS. *

      In our original datasets in Figure 1d, only a small number of NCCs were detected within the cardiomyocyte cluster (corresponding to C26 in Figure 3b), which was otherwise predominantly composed of non-NCCs. Cardiomyocytes assigned to C26 were also present in both publicly available scRNA-seq datasets included in the integrated analysis. Previous studies have reported that a limited subset of NCCs can differentiate into cardiomyocytes (Tomita, Y. et al., J. Cell Biol. 170:1135–1146, 2005; Tamura, Y. et al., Arterioscler. Thromb. Vasc. Biol. 31:582–589, 2011). Therefore, we consider that C26 likely represents a small population of NCC-derived cardiomyocytes rather than contamination caused by technical issues during FACS isolation, although the low cell number precluded further characterization.

      To clarify this point, we added the following statement to the Results section:

      *Only a few NCCs were detected within cardiomyocyte clusters, which were predominantly composed of non-NCCs, consistent with previous reports demonstrating NCC differentiation into cardiomyocytes15,16. *

      (P6, L20–23)

      * - The authors used RNA velocity to infer relationships among the identified clusters. However, this analysis requires particular caution given that data were generated from multiple datasets obtained under different conditions. Several conclusions drawn from the RNA velocity analysis are not convincing, as illustrated in Figures 3c and 3d, where the inferred velocity directions appear inconsistent with the proposed developmental model (e.g., trajectory from cluster 23 toward 18 or from 4 toward 6). The authors should clarify these discrepancies, justify the integration of heterogeneous datasets and reassess the interpretation of the inferred lineage relationships. *

      By default, the integration workflow in Seurat, which is widely used for scRNA-seq analysis, employs canonical correlation analysis (CCA). CCA effectively corrects batch effects across datasets generated on different experimental platforms. However, it can overcorrect by forcibly aligning cell populations that are not shared among datasets (Andreatta, Bioinformatics, 2021). To minimize overcorrection across datasets obtained under different experimental conditions, we applied the reciprocal principal component analysis (RPCA) method, which is recommended for integrating heterogeneous scRNA-seq datasets (Luecken et al., Nature Methods, 2021). This choice is suited to integrating datasets from independent studies, as in the present study.

      To infer relationships among the identified clusters, we then applied RNA velocity analysis, which is fundamentally based on splicing dynamics in scRNA-seq data. The original framework assumes that transcriptional induction and repression persist long enough for cells to reach active (transcribing) or inactive (silenced) steady-state equilibrium. However, this assumption is often violated during cell differentiation, where transient cell populations frequently exhibit rapidly changing mRNA levels that do not reach steady-state equilibrium. To address this limitation, the scVelo method was developed (Bergen et al., 2020, *Nature Biotechnology*), and we applied this approach to the integrated NCC datasets in the present study. This analysis successfully inferred directional flows from the pharyngeal mesenchyme toward SMCs and intracardiac mesenchyme through transitional states (Figure 3c,d). However, as the reviewer correctly pointed out, the analysis predicted a directional flow from C23 to C18, apparently opposite to the biological directionality supported by previous findings (see P9, L15–19).
      
       We consider this discrepancy to reflect intrinsic limitations of RNA velocity analysis. As discussed by Bergen et al., in systems containing multiple lineages and cellular processes, differences in gene regulatory networks among heterogeneous cell states can generate multiple trajectories in phase space owing to distinct splicing kinetics. In addition, incompletely captured splicing kinetics may represent only a limited portion of the overall dynamics, particularly near terminal differentiation states. In such cases, phase portraits of unspliced versus spliced transcripts may appear nearly linear rather than curved, potentially leading to erroneous assignment of positive or negative RNA velocities. Consistent with this limitation, we obtained opposite directionalities between C18 and C23 depending on whether steady-state or dynamical models were applied and according to different parameter settings. Through these repeated computational re-evaluations of lineage directionality, we concluded that RNA velocity analysis is suitable for capturing the global landscape of differentiation flow, but that accurate inference of local lineage directionality may require careful model selection and parameter optimization to ensure consistency with established biological evidence.
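      To make the steady-state assumption concrete, the sketch below re-implements the basic steady-state (deterministic) velocity estimate in NumPy: a per-gene degradation ratio γ is fitted on cells at the extremes of spliced abundance, and velocity is v = u − γs. This is a simplified illustration of the model family discussed above, not scVelo's code or the authors' analysis; in scVelo the model is selected through the `mode` argument of `scv.tl.velocity`. It also shows why a nearly linear phase portrait is fragile: velocities are residuals from the fitted line, so when cells lie close to that line their signs are dominated by small deviations.

```python
import numpy as np

def steady_state_velocity(u, s, perc=5.0):
    """Steady-state RNA velocity, v = u - gamma * s, fitted per gene.

    u, s: cells x genes arrays of unspliced and spliced abundances.
    gamma is the least-squares slope through the origin of u on s, fitted
    only on cells at or below the `perc` percentile or at or above the
    (100 - perc) percentile of s, i.e. cells assumed to sit near the
    repressed or induced steady states.
    """
    u = np.asarray(u, dtype=float)
    s = np.asarray(s, dtype=float)
    gamma = np.zeros(s.shape[1])
    for g in range(s.shape[1]):
        sg, ug = s[:, g], u[:, g]
        lo, hi = np.percentile(sg, [perc, 100.0 - perc])
        mask = (sg <= lo) | (sg >= hi)
        denom = np.dot(sg[mask], sg[mask])
        gamma[g] = np.dot(sg[mask], ug[mask]) / denom if denom > 0 else 0.0
    # Positive velocity: more unspliced RNA than the steady state predicts
    # (gene being induced); negative: gene being repressed.
    return u - gamma * s, gamma
```

      For example, if every cell satisfies u = 0.5s except one cell with one extra unit of unspliced RNA, the fitted γ is 0.5 and only that cell receives a non-zero (positive) velocity.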
      
      To avoid arbitrariness and potential confusion, we removed the arrow between C18 and C23 from the revised Figure 3d. Instead, we now describe the observed continuity between these populations in the Results and Discussion sections as follows:

      Notably, the UMAP revealed a continuum between C23 and C18 within the intracardiac mesenchyme population. Given previous findings that the proximal coronary artery SMCs originate from preotic NCCs8 and that pericytes give rise to coronary artery SMCs23, this connection likely represents a differentiation trajectory from intracardiac mesenchyme to coronary artery SMCs via a pericyte-like intermediate stage.

      (P9, L15–19)

      The continuity between intracardiac mesenchyme and coronary artery SMCs through a pericyte-like intermediate state is consistent with previous developmental studies showing that proximal coronary artery SMCs originate from preotic NCCs and may arise through pericyte intermediates8,21.

      (P18, L24–27)

      *- The authors should provide more detail on how they identified bifurcation points and more clearly explain the transition from intracardiac mesenchyme to smooth muscle cells (SMC). Additionally, they should clarify what distinguishes the three clusters (C4, C23, C27) in terms of transcription programs, marker expression, or functional states, to better support their proposed differentiation trajectories. *

      To clarify the distinctions among the three SMC clusters (C4, C23, C27), we added a heatmap showing differentially expressed genes, violin plots for the mature SMC marker Myh11, and feature plots with immunostaining images for Myh11, Sost, and Reln expression (Figure S4). These additional analyses further validate the molecular and spatial characteristics of the three SMC clusters. Based on these data, we revised the relevant section of the Results as follows:

      The SMC clusters, which were continuous with the pharyngeal mesenchyme via transitional populations in the UMAP, were identified by high expression of the mature SMC marker Myh11 (Figure 3i). Differential gene expression analysis further distinguished individual clusters (Figure S4a-g). Among these, C27 displayed a transcriptomic profile characteristic of the great artery SMCs, including high expression of Sost (Figure S3j). C4 was enriched for Tfap2b and Ptger4 (Figure S3j), markers of the ductus arteriosus SMCs21,22, supporting its annotation. C0 and C7 likely represent transitional states between pharyngeal mesenchyme and differentiated lineages, potentially bifurcating toward great artery SMCs or cardiac cushion mesenchyme (Figure S3a and Table S6). C23 was characterized by high expression of Gja4, a marker of coronary artery SMCs, along with pericyte markers Kcnj8 and Rgs5 (Figure 3j and Figure S3k), corresponding to the cluster similarly annotated by Chen et al14. In addition, C23 was also distinguished from C4 and C27 by its expression of Reln (Figure S4d).

      Immunostaining supported these cluster annotations. Sost expression was observed in great artery SMCs but not in ductus arteriosus or coronary artery SMCs, whereas Myh11 expression was higher in ductus arteriosus and coronary artery SMCs than in aortic SMCs (Figure S4f-m). Furthermore, Reln expression was restricted to coronary artery SMCs (Figure S4n-s).

      (P8, L30 – P9, L14)

      To further clarify the identification of bifurcation points and the transition from intracardiac mesenchyme to SMCs, we additionally stratified the integrated UMAP according to distinct Hox expression profiles and inferred lineage trajectories corresponding to different neural crest and pharyngeal arch origins (Figure S9). Based on the inferred differentiation trajectory from C18 to C23 (P9, L15–19), together with the identification of C2 and C10 as Sox9high and Scxhigh intracardiac cushion mesenchymal populations contributing to coronary artery SMCs (P17, L27–33), we incorporated these lineage relationships into the schematic model presented in Figure 9.

      Minor comments:

      *- The authors convincingly demonstrate a switch in Meis-binding motifs across NCC populations, supporting a model in which cardiac cushion-associated NCCs transition from Hox-dependent to Hox-independent transcriptional regulation via alternative cofactor interactions and DNA-binding preferences. However, the authors should provide evidence on whether GATA motifs are enriched within Meis peaks, as this could further clarify potential cooperative interactions during this transition. *

      Although GATA-binding motifs were enriched within Meis-associated open chromatin regions in intracardiac NCCs compared with many other motifs, a substantial proportion of GATA motifs were located in peaks distinct from those containing Meis motifs. This observation raises the possibility that GATA and Meis transcription factors may cooperate through interactions across separate regulatory elements to modulate enhancer activity. However, we did not directly investigate this possibility in the present study. Instead, we found that the Meis-associated peaks identified in intracardiac NCCs, including the distal Sox9 enhancer containing a Meis2-binding site (chr11:112850240–112851186), more prominently contained several other consensus motifs, including predicted Hand2- and Nfatc1-binding sites. To our knowledge, however, these transcription factors have not previously been described as non-Hox cofactors of Meis proteins. Their potential recruitment to this enhancer, as well as possible cooperative interactions with Meis transcription factors during intracardiac NCC differentiation, was not examined in the current study and remains an important subject for future investigation.

      * - In Figure 5, panels g, j and k are difficult to interpret. The authors should provide clearer annotations, labeling, or additional explanations to improve readability and facilitate understanding of the data. *

      We added annotations corresponding to Figure 4a to the UMAPs in Figure 6h and 6i, and included color bars in Figure 6i and 6k. To further improve readability and facilitate understanding of the data, we added an explanation of the perturbation scores to the legends for Figure 6i and 6k.

      (original)

      *(g) Pseudotime trajectory analysis of integrated NCC clusters inferred using CellOracle. *

      *(h-k) Sox9 (h, i) and Osr1 (j, k) knockout simulation presented as altered differentiation vector flows (h, j) and perturbation scores which was inner product of perturb simulation (i, k). *

      (revised)

      *(g) Pseudotime developmental flow of integrated NCC clusters from the neural tube, inferred using CellOracle and projected onto the UMAP space shown in Figure 4a. *

      *(h-k) Sox9 (h, i) and Osr1 (j, k) knockout simulations presented as altered differentiation vector flows (h, j) and perturbation scores, defined as the inner product between the simulated perturbation vectors and the original developmental flow (i, k). Green and magenta color bars indicate normal developmental flow and reverse flow induced by perturbation of the indicated genes, respectively. *

      (P41, L3–L9)
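      The perturbation score defined in the revised legend reduces to a row-wise inner product. The sketch below assumes both flows are given as 2-D vectors at the same UMAP points or grid positions; `perturbation_scores` and `flow_label` are hypothetical helpers, not CellOracle's API. Positive scores correspond to the green normal-flow direction and negative scores to the magenta reverse flow.

```python
import numpy as np

def perturbation_scores(dev_flow, perturb_flow):
    """Inner product of developmental and simulated perturbation vectors.

    dev_flow:     (n, 2) developmental flow vectors on the embedding.
    perturb_flow: (n, 2) simulated shift vectors after in silico knockout.
    Returns one score per point: > 0 means the knockout pushes cells along
    normal development, < 0 means it pushes them against it (reverse flow).
    """
    d = np.asarray(dev_flow, dtype=float)
    p = np.asarray(perturb_flow, dtype=float)
    if d.shape != p.shape:
        raise ValueError("flow arrays must have the same shape")
    return np.einsum("ij,ij->i", d, p)

def flow_label(scores):
    # Hypothetical labels mirroring the legend's color convention.
    return np.where(scores > 0, "normal (green)",
                    np.where(scores < 0, "reverse (magenta)", "neutral"))
```

      Because the score is an inner product rather than a cosine, it grows with the magnitude of both vectors, so strongly perturbed regions along steep pseudotime gradients stand out most.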

      *- In Figure 6, results support the role of a hexameric Meis-binding motif-containing region as a distal enhancer of Sox9. The authors should provide additional results from a ChIP-qPCR experiment to further validate this model. *

      We attempted ChIP-seq experiments on the O7-1 neural crest cell line using two different anti-Meis antibodies. However, we were unable to detect specific binding of Meis proteins to this enhancer region, although luciferase assays clearly demonstrated enhancer activity that was significantly attenuated by deletion of the Meis-binding motif. This discrepancy may reflect differences between the endogenous chromatin context and the plasmid-based reporter context, including epigenetic modifications and chromatin accessibility. We are now investigating experimental conditions that would allow direct verification of endogenous Meis binding to this region.

      * - Panel l in Figure S3 requires better annotation. *

      We added annotations including the aorta, pulmonary valve, left coronary artery, and its septal branch.

      * - Correct the typo errors in Figure 5a. *

      The typographical errors “consercvation” and “Visuzalization” were corrected to “conservation” and “Visualization”, respectively.

      * - The authors should refer to previous studies showing the role of Hoxa1 and Hoxb1 in the development of great arteries or semilunar valves. *

      We have added the following discussion on the roles of anterior Hox genes in cardiovascular development, together with appropriate references:

      The regional identities of pharyngeal NCCs that contribute to cardiac development are established by Hox genes and their associated regulatory networks. Genetic studies have demonstrated essential roles for the anterior Hox genes in patterning the pharyngeal arch artery system and semilunar valve structures. Loss of Hoxa1 and Hoxb1 results in severe defects in pharyngeal arch artery development42, whereas ectopic or sustained expression of Hoxb1 in NCCs disrupts cardiovascular morphogenesis and causes malformations of the great arteries and semilunar valves43. Likewise, Hoxa3 contributes to proper patterning of the pharyngeal arch region and its NCC derivatives44,45. These findings underscore the importance of precise spatial and temporal regulation of Hox genes during cardiovascular development. Among pharyngeal NCCs contributing to cardiac development, cushion-independent NCC derivatives (great artery SMCs and the aorticopulmonary septum) retain their origin-specific Hox codes. In contrast, cushion-associated NCC derivatives (coronary artery SMCs and valvular/subvalvular interstitial cells) downregulate Hox expression and transition toward region-specific GRNs involving TFs such as Tbx20 and Gata4, whose expression is known to be induced by BMP signaling in cardiomyocytes46,47. Bmp2 and Bmp4 are expressed in the regions of the pericardial reflection traversed by NCCs en route to the cardiac cushion48. Together, these observations suggest that appropriate repression of Hox programs, coupled with activation of cardiac-specific regulatory networks, is required for normal differentiation of cushion-associated NCC derivatives.

      (P18, L30 – P19, L16)

      • Roux, M. et al. Hoxa1 and Hoxb1 are required for pharyngeal arch artery development. Mech. Dev. 143, 1–8 (2017).
      • Zaffran, S., Odelin, G., Stefanovic, S., Lescroart, F. & Etchevers, H. C. Ectopic expression of Hoxb1 induces cardiac and craniofacial malformations. genesis 56, (2018).
      • Chisaka, O. & Capecchi, M. R. Regionally restricted developmental defects resulting from targeted disruption of the mouse homeobox gene hox-1.5. Nature 350, 473–479 (1991).
      • Kameda, Y., Watari-Goshima, N., Nishimaki, T. & Chisaka, O. Disruption of the Hoxa3 homeobox gene results in anomalies of the carotid artery system and the arterial baroreceptors. Cell Tissue Res. 311, 343–352 (2003).

      **Referees cross-commenting**

      *Having read the comments of the other reviewers, I totally agree with them. All our comments converge and should allow the authors to improve their manuscript. *

      Reviewer #3 (Significance (Required)):

      *The study provides high-resolution spatial and temporal mapping of NCC-derived populations and proposes mechanistic insights into Hox-dependent versus Hox-independent transcriptional regulation, as well as a Meis2-Sox9-Scx gene regulatory network contributing to smooth muscle and semilunar valve formation.

      Strengths and limitations: The datasets are rich and well-integrated, offering valuable resources for the field. However, several key conclusions rely on correlative analyses and heterogeneous datasets, making some claims speculative. Technical details, such as FACS gating, low representation of cardiomyocytes, and interpretation of RNA velocity, require further clarification, which currently limits the strength of the mechanistic inferences.

      Advance: This work advances the understanding of NCC lineage diversification and gene regulatory dynamics in cardiopharyngeal development, particularly highlighting potential transcriptional switches and intermediate progenitor states that guide structural formation in the heart.

      Audience: The study will be of interest to researchers in developmental biology, cardiovascular biology, and single-cell multi-omics, particularly those studying neural crest cell differentiation and cardiac morphogenesis.*

      Again, thank you for giving us the opportunity to strengthen our manuscript with your valuable comments and queries. We have worked hard to incorporate your feedback and hope that these revisions persuade and satisfy you.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      In this manuscript, Iwase et al. cleverly make use of different modalities (spatial transcriptomics and single-cell omics datasets) in conjunction with a well-established Wnt1-Cre;R26R-EYFP line to trace neural crest cells (NCCs) contributing to the cardiovascular system during embryonic development in the mouse. By doing so, the authors identified a bifurcation between cardiac NCCs contributing to the OFT cushions and those forming the aorticopulmonary (AP) septation complex. Thus, the authors split "intracardiac NCCs" into two different NCC programs/compartments, even though both reside in the broad OFT region. The NCCs that enter and associate with the OFT cushions undergo a Hox-off transition (Hox-positive to Hox-negative once intracardiac), with a corresponding shift in Meis binding and GRN wiring. The authors propose that these cells pass through a Meis2-Sox9-Scx "skeletogenic progenitor-like" intermediate and contribute to semilunar valves and coronary artery smooth muscle. By contrast, the NCCs assigned to aorticopulmonary septum (APS) formation and great vessel smooth muscle retain distinct Hox codes.

      Major Comments

      1. The manuscript would benefit from clearer delineation between the different NCC contributions, particularly for non-specialist readers. The distinction between (i) CNCCs in OFT cushions and (ii) CNCCs forming the aorticopulmonary (AP) septation complex is not adequately explained. While both populations contribute to OFT septation, according to the authors, they represent distinct compartments with different developmental trajectories. The authors could clarify this using anatomically labelled hearts at the stages they conduct their analysis, along with additional text and schematics explaining what is meant by each compartment. This would greatly enhance the accessibility of the manuscript.
      2. The Methods state that "Decomposition of cell clusters of scRNA-seq was performed by RCTD to map them onto the Xenium dataset," but this description is insufficient. The authors should clarify whether RCTD was applied separately for each developmental stage (i.e., E11.5 scRNA-seq reference for E11.5 Xenium, E12.5 reference for E12.5 Xenium), or whether a pooled reference was used across stages. This clarification is important because RCTD performance depends critically on the correspondence between reference and target datasets. Using a multi-stage integrated reference to deconvolve stage-specific spatial data could introduce artifacts, as cell states and cluster compositions vary considerably across developmental timepoints.
      3. Figure 2j-m needs annotations and schematics. It is currently very difficult to identify the different compartments. See Figure C in Chen et al. for an example of this approach. In addition, what are the fine clusters from 1 to 20? Which ones are NCC-derived?
      4. The panels show pharyngeal markers, OFT/intracardiac markers, and SMC markers in Figures 2n-y, but could the authors show the proportion of NCC-derived (YFP+) cells for each cluster? Could the authors also map only the YFP+ cells on the Xenium data? It would be useful to see the proportion of YFP-positive (NCC-derived) cells for each delineated compartment. YFP-positive cells appear to exist at the boundary between LV and RV in the septum; this observation would benefit from proper quantification.
      5. Figure 3 is confusing because it integrates data from multiple overlapping stages (E8.5 to P7). While the authors identify distinct compartments (pharyngeal mesenchyme, intracardiac mesenchyme, and SMCs), it is unclear why stages beyond E14.5 and E17.5 (corresponding to the initial single-cell omics and Visium analysis) were included. Although leveraging additional datasets is a clever approach, integrating data from such disparate developmental timepoints confounds interpretation. For example, Cluster 6 appears to include cells from both P7 and E12.5. Given that the Visium data represent hearts at E14.5 and E17.5, it is problematic to map clusters derived from other stages onto these spatial datasets.
      6. Throughout the manuscript, the authors describe "lineage relationships" between cell populations, but these are in fact developmental trajectories inferred computationally (via UMAP connectivity and RNA velocity), not true lineage relationships. This distinction is critical and should be explicitly stated.

      For example, the statement "we propose that intracardiac NCCs within C10 and C22 differentiate via C2 into valvular (C21) and subvalvular (C18) interstitial cells" should be framed as a computational inference, not an established lineage relationship. Without clonal lineage tracing data, these claims cannot be verified.

      Similarly, the claim that "NCCs contributing to the AP septum are distinct from other intracardiac NCCs in that they do not populate the cardiac cushions but remain continuous with NCCs populating the distal outflow tract cushion" lacks direct lineage evidence. What experimental data support this assertion?

      In addition, the authors state that what De Bono et al. identified as "outflow smooth muscle" corresponds in their dataset to early intracardiac mesenchymal clusters C2, C10, and C22, present as early as E10.5, when mesenchymal NCC derivatives express immature SMC markers, and that these cells later differentiate into coronary artery SMCs around E14.5 (C23) as well as other non-muscle components. This claim is not verified: the conclusion that these cells give rise to coronary artery SMCs rests solely on a computationally inferred trajectory from C2, C10, and C22 to C23.

      7. Claims regarding marker expression in specific compartments (for example, Hapln1 and Postn in cushions) require additional spatial validation at higher resolution than the Visium data currently provide. Moreover, it is unclear whether these data have single-cell resolution; the authors need to clarify this. HCR staining would be ideal to confirm these expression patterns. Currently, all conclusions are based solely on gene expression without orthogonal spatial confirmation. At minimum, the authors should cite literature supporting these expression patterns.
      8. Could the authors confirm the absence of the Sox9high/Scxhigh population in AP septum descendants?

      Minor Comments

      1. Could the authors better justify their choice of stages (E11.5 to E17.5) for the single-cell multiomic assay? Given that OFT cushions are already populated by NCCs by E10.5 and that AP septum formation is already underway at this stage (see Development (2007) 134(8): 1593-1604), the rationale for beginning at E11.5 should be explicitly stated.
      2. Spelling errors in Figure 2e: "ambious" should be "ambiguous"; "atrium venticle" should be "atrium/ventricle"; "ventricle" is misspelled in several locations.
      3. The clustering resolution is very high, yielding many clusters that are difficult to distinguish based on the colour code alone. What does "ventricle_CL" refer to? What is the "marginal layer"? A clearer legend or supplementary table defining each cluster would be helpful.
      4. Figure 2j: The image is very dim.
      5. Figure 3b: What do the numbers correspond to? Additionally, "mesenchyme" is misspelled.

      Referees cross-commenting

      I also agree with the other reviewers' comments. Many thanks.

      Significance

      The overall approach is sound and the datasets generated are valuable resources for the field. The manuscript presents interesting findings regarding NCC heterogeneity in cardiac development.

      As I understand the authors' model: OFT cushion-associated NCCs enter and associate with the OFT cushions, undergoing a Hox-positive to Hox-negative transition, accompanied by a shift in Meis binding and GRN rewiring. These cells pass through a Meis2-Sox9-Scx "skeletogenic progenitor-like" intermediate state and contribute to semilunar valves and coronary artery smooth muscle. Aorticopulmonary septum NCCs, by contrast, retain distinct Hox codes (Hox4/5 enrichment) rather than becoming Hox-negative, and contribute to the AP septum and great vessel smooth muscle.

      Strengths: The integration of multiple omics modalities with lineage tracing is a powerful approach, and the identification of a Hox-dependent bifurcation in NCC fate is a novel conceptual advance.

      Limitations: The reliance on computational trajectory inference without orthogonal lineage validation, combined with the integration of datasets spanning very different developmental stages, limits the strength of some conclusions. The analysis would also benefit from more precise anatomical annotations, both to make it more accessible to readers and to better visualise the distinct contributions of cardiac NCCs to the OFT.

      Advance

      The study extends knowledge in the field by providing novel mechanistic insight into neural crest diversification in the context of cardiovascular development. The nature of the advance is primarily mechanistic, identifying a Hox-Meis regulatory switch that distinguishes cushion-associated from cushion-independent NCC lineages.

      Audience

      This work will be of interest to a specialised audience interested in neural crest cells and developmental biologists using omics approaches to address cell fate diversification in complex tissues.

      Reviewer Expertise

      Developmental biology, lineage analysis, mouse genetics. I do not have the expertise to assess the computational methods used in this paper.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Iwase et al have used multiomics and spatial transcriptomics to comprehensively map neural crest cell contributions to the mouse heart and great arteries. This careful and detailed analysis reveals changes in the transcriptional profile of neural crest cells as they give rise to different regions and cell types in the heart and great vessels. The study significantly builds on a number of recent scRNA-seq analyses of neural crest cell development and includes development of a new informatic tool for regulatory network investigation. Among the new findings documented are downregulation of Hox gene expression in intracardiac crest cells and regulation of Sox9 by Meis transcription factors. Addressing the following points would improve clarity and accessibility.

      1. In Figure 1C it is difficult to visualize all the colors given the mixed contribution of NCC and non-NCC cells to mesenchyme. Please also show YFP transcript distribution in NCC versus non-NCC plots. In addition, it would be helpful to show plots for both NCC and non-NCC for Gata4 and Tbx20.
      2. The authors identify a cardiomyocyte cell cluster in their integrated NCC scRNA-seq plots. Are these cells labelled by Wnt1-Cre in the authors' own dataset? Is the trajectory analysis informative as to the steps preceding acquisition of cardiomyocyte fate?
      3. Linked with this point, is it possible that there are non-NCC cells in the integrated plots? Of note, many of the NCC genes overlap with genes that have also been shown to be expressed in mesodermal cardiac progenitors (including Osr1, Pparg, Dlk1, Tcf21, Ebf2, Tbx20, Sox9). For example, is it possible to distinguish NCC-derived smooth muscle within the heart from cells originating from the second heart field that may express smooth muscle genes? Cluster 27, for example, appears broadly expressed in the region of the ventricular outlets in Figure 3. Comparison with YFP transcript distribution may be helpful here.
      4. Can the authors add any validation of key expression patterns, for example using fluorescent in situ hybridization?
      5. Please elaborate on the decoded Hox code patterns that appear to be indicative of arch origins. Do the results allow determination of whether the trajectories to different cardiac fates inferred in Figure 3D differ in different arches?
      6. The authors need to explain why they place an arrow from mesenchymal cluster 18 to 23 in Figure 3D when the trajectory analysis in 3C predicts the opposite direction.
      7. The authors nicely show downregulation of Hox gene expression in NCC cells entering the heart. Can they add discussion of any insights into this from prior studies of loss or gain of Hox gene function?
      8. Figure 3Y could be simplified to more clearly distinguish the two types of Meis binding sites. For example, it may be helpful to reorder the mesenchymal cell types based on Hox expression.
      9. The authors provide nice in vitro and in vivo evidence for an upstream role of Meis transcription factors in regulating Sox9 expression. Can the authors identify from the enhancer sequence (or their transcriptomic dataset) any of the non-Hox transcription factors that Meis may be working with here? Please discuss the significance of Sox9 expression in epicardium driven by the same enhancer. Might this regulation also operate in second heart field progenitor cells where both genes are expressed? It is not evident in Figure 7 that Sox9-EGFP is also expressed in epicardium.
      10. Could this approach yield similar data for Osr1? Please clarify if there is any experimental evidence supporting the predicted negative regulation of Sox9 by Osr1 in the heart illustrated in Figure 8.
      11. Concerning the links between valve mesenchyme and skeletogenic programs, it would be relevant to cite the earlier work of Lincoln and Yutzey (reviewed in PMID: 16643886).
      12. To increase the accessibility of the dataset, the authors are encouraged to include a browser link.

      Minor points:

      1. The authors could rephrase the title since the term topographical genetic switch is unclear.
      2. In the introduction, with reference to the De Bono study, please note that Tbx1 was shown to regulate pharyngeal NCC differentiation stage transitions non-cell autonomously.

      Significance

      Iwase et al have used multiomics and spatial transcriptomics to comprehensively map neural crest cell contributions to the mouse heart and great arteries. This careful and detailed analysis reveals changes in the transcriptional profile of neural crest cells as they give rise to different regions and cell types in the heart and great vessels. The study significantly builds on a number of recent scRNA-seq analyses of neural crest cell development and includes development of a new informatic tool for regulatory network investigation. Among the new findings documented are downregulation of Hox gene expression in intracardiac crest cells and regulation of Sox9 by Meis transcription factors.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This manuscript presents a useful computational framework for systematically characterising how heterogeneity in initial conditions or biophysical parameters shapes the dynamic behaviour of protein signalling networks, with potential relevance to understanding adaptive drug resistance. While the approach represents a significant methodological contribution, the extent to which its conclusions are biologically informative remains debated, as the model is not qualitatively or quantitatively validated against experimental data. As a result, the strength of evidence supporting the mechanistic claims is viewed as incomplete.

      We thank the editors and reviewers for their further assessment of the manuscript. The revised public review raises several issues that overlap with points addressed in our previous response, particularly around the intended scope of MDN modelling, the interpretation of parameter sampling, and the qualitative nature of the experimental comparison. In this final revision, we have made targeted clarifications in the main text, Methods, figure legends, and Supplementary Information to make these points more explicit for readers. We emphasise that the present work is intended as a theoretical and exploratory framework for mapping the qualitative dynamic behaviours accessible to a fixed network topology, rather than as a quantitatively calibrated model of a specific tumour or cell line.

      Joint Public Review:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors study the Early Cell Cycle (ECC) network as a proof of concept, focusing on pathways involving PI3K, EGFR, and CDK4/6 with the aim of identifying mechanisms that may underlie resistance to CDK4/6 inhibition in cancer. The biochemical reaction model comprises 50 state variables and 94 kinetic parameters, implemented in SBML and simulated in Matlab. A central component of the study is the generation of large ensembles of model instances, including 100,000 randomly sampled parameter sets intended to represent intra-tumour heterogeneity. On the basis of these simulations, the authors conclude that heterogeneity in kinetic rate parameters plays a stronger role in driving adaptive resistance than variation in baseline protein expression levels, and that resistance emerges as a network-level property rather than from individual components alone. The revised manuscript provides additional clarification regarding aspects of the simulation and filtering procedures and frames the comparison with experimental data as qualitative. Nonetheless, the study is best interpreted as a theoretical and exploratory analysis of the model's behaviour under heterogeneous conditions. Consequently, questions remain regarding the biological grounding of the sampled parameter regimes and the extent to which the reported frequencies of resistance-associated behaviours can be directly interpreted in physiological terms.

      While the authors propose a potentially useful computational framework to explore how heterogeneity shapes dynamic responses to drug perturbation, a number of important conceptual and methodological concerns remain to be addressed:

      (1) The sampling of kinetic parameters constitutes the backbone of the manuscript, yet important concerns remain regarding its biological grounding and transparency. Although the revised version provides additional clarification on the exploration of "model instances", it is still not sufficiently clear how parameter values and initial conditions are generated, nor how the chosen ranges relate to biological measurements. The kinetic rates are sampled over broad intervals without explicit justification in terms of experimentally measured bounds or inferred distributions. As a consequence, it remains uncertain whether the ensemble of simulated behaviours reflects physiologically plausible cellular regimes or primarily the properties of the assumed parameter space. In this context, the large-scale sampling (100,000 parameter sets) resembles a Monte Carlo exploration of the model rather than a biologically calibrated representation of tumour heterogeneity.

      Parameters were sampled from a uniform distribution spanning values 10⁻⁵ to 10⁴. Conserved totals were sampled from the range 10⁰ to 10⁴. Each of these is roughly in line with measured spans of orders of magnitude for parameter values and protein expression (REF). Again, we would like to point out that we intentionally kept our ranges broad, and sampled from uniform distributions, to assess upper bounds of heterogeneity, not biologically informed heterogeneity. We also comment on the likely effects of expanding these ranges in our response to (26) in our original rebuttal.

      Main text has been updated to include this information. LINES: 175-179

      Furthermore, the adequacy of the sampling strategy in such a high-dimensional space (94 free parameters) remains open to question. In the absence of biologically informed constraints, the combinatorial space of possible parameter configurations is vast, and it is unclear to what extent the sampled ensembles can be considered representative. This issue is particularly relevant because the manuscript interprets the frequency of resistance-associated behaviours as indicative of their likelihood.

      This was addressed extensively in our original rebuttal, response to point (3). A new section was added to the supplementary text, along with new figures demonstrating the validity of the claims.

      The validation presented in Figure 7 does not fully resolve these concerns. The comparison with experimental data is qualitative, and the simulations are performed in arbitrary time units, which complicates direct interpretation alongside time-resolved experimental measurements. Moreover, certain qualitative discrepancies between simulated and experimental trends (e.g., persistent versus decreasing CDK4/6 activity) are not thoroughly discussed. As this figure represents the primary empirical reference point in the manuscript, the extent to which the model captures experimentally observed dynamics remains uncertain.

      This was addressed in the original rebuttal, response to point (12). The actual time units are arbitrary in the sense that they are determined by the units of the parameters in our model. It is important to understand that the meta-dynamic analysis is not calibrated to data and so the meaning of time units is far less important than the distribution of behaviours. We have updated the figure to reflect the arbitrary units of time in our simulations.

      Finally, aspects of presentation continue to limit transparency. Parameter ranges are described at different points in the manuscript but are not consolidated clearly in the Methods, and the definition of initial conditions remains ambiguous - particularly whether these correspond to conserved quantities or to the dynamic variables used to initialise simulations. In addition, the exact number of model instances underlying specific analyses and figures is not always explicit. Greater clarity on these issues is essential for assessing reproducibility and for interpreting the quantitative claims of the study.

      (2) A central conclusion of the manuscript is that heterogeneity in protein-protein interaction kinetics is a stronger driver of adaptive resistance than heterogeneity in protein expression levels. To assess the latter, the authors fix a nominal set of kinetic parameters and generate 100,000 random initial concentrations for the 50 model species. However, according to the simulation protocol described in the manuscript, each trajectory includes three phases: (i) simulation under starvation conditions to equilibrium, (ii) mitogenic stimulation to a second ("fed") equilibrium, and (iii) application of drug treatment. The equilibrium concentrations reached in phases (i) and (ii) are determined by the kinetic parameters of the model and are independent of the initial concentrations, provided the system converges to a stable steady state. In dynamical systems terms, stable equilibria are defined by the parameter set and attract all initial conditions within their basin of attraction. Since the kinetic parameters are fixed in this experiment, the pre-treatment equilibrium that serves as the starting point for drug application should likewise be fixed. Under these conditions, it is therefore not unexpected that sampling a large number of initial concentrations has limited influence on the treated dynamics.

      This raises conceptual questions about the interpretation of the comparison between kinetic and expression heterogeneity. If the system converges to a unique stable steady state prior to treatment, then variability in initial concentrations does not propagate into variability in drug response, and the observed dominance of kinetic heterogeneity may partly reflect this structural property of the model rather than a biological principle. Clarification is needed regarding whether multiple steady states exist under the nominal parameter set, and if so, how basins of attraction are explored.

      More broadly, it remains unclear why initial protein concentrations can be sampled independently of the kinetic parameters. In biological systems, steady-state expression levels are typically determined by the underlying kinetic rates. A more consistent approach might require constraining initial concentrations to correspond to equilibrium states of the chosen parameter set, thereby introducing relationships between at least some of the 50 initial conditions and the 94 kinetic parameters. Finally, the manuscript employs a non-standard terminology regarding "initial conditions," which may further obscure interpretation of these results and would benefit from clarification.

      This was addressed in the original rebuttal, response to point (4). The text was modified to clarify that "initial conditions" refers to the conserved total of each protein species. A supplementary figure (Supp. Fig. 4) was added to demonstrate that changes to the conserved totals of protein species do, in fact, shift the dynamics and steady-state equilibria of protein species. Text was updated throughout the paper to ensure that our definition of ‘initial conditions’ is used consistently.
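      The distinction at issue here, between redistributing a protein's initial amount among its forms and changing its conserved total, can be illustrated with a hypothetical two-state toy model (A ⇌ B, mass action, rate constants kf and kr). This is not the ECC network. For a fixed parameter set, the steady state B* = T·kf/(kf + kr) depends on the conserved total T = A + B but not on how T is split at t = 0. That is consistent with both the reviewers' convergence argument and the authors' point that sampling conserved totals shifts the equilibria. The explicit Euler integrator, step size and tolerance are illustrative assumptions.

```python
import math


def relax_to_steady_state(a0, b0, kf, kr, dt=1e-3, tol=1e-9, max_steps=10**6):
    """Integrate dA/dt = -(kf*A - kr*B), dB/dt = +(kf*A - kr*B) with explicit
    Euler until the Euclidean norm of the time derivative falls below tol."""
    a, b = a0, b0
    for _ in range(max_steps):
        flux = kf * a - kr * b                # net A -> B conversion rate
        if math.sqrt(2.0) * abs(flux) < tol:  # ||(dA/dt, dB/dt)|| = sqrt(2)*|flux|
            return a, b
        a -= flux * dt
        b += flux * dt                        # A + B is conserved at every step
    raise RuntimeError("steady state not reached within max_steps")


kf, kr = 2.0, 1.0  # hypothetical rate constants
# Same conserved total (T = 10), two different initial splits.
_, b_all_a = relax_to_steady_state(10.0, 0.0, kf, kr)
_, b_all_b = relax_to_steady_state(0.0, 10.0, kf, kr)
# Half the conserved total (T = 5).
_, b_half = relax_to_steady_state(5.0, 0.0, kf, kr)
```

      Here b_all_a and b_all_b agree (about 6.667), whereas b_half is half that value: only the conserved total, not the initial split, moves the equilibrium in this toy case.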

      (3) The technical implementation of the modelling and simulation framework remains difficult to evaluate due to insufficient methodological detail. Although the authors state that kinetic parameters are randomly sampled, the manuscript does not specify the distributions from which parameters are drawn, nor whether potential correlations between parameters are considered or explicitly ignored. Without this information, it is not possible to assess how implicit modelling assumptions shape the ensemble of simulated behaviours. Given that the conclusions rely on frequency-based interpretations across sampled parameter sets, greater transparency regarding the sampling procedure is essential.

      The main text has been updated to clarify that parameters were randomly sampled from a log-transformed uniform distribution. LINES: 175-179
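      For concreteness, here is a minimal Python sketch of log-uniform sampling over the ranges stated in the response to point (1): 10⁻⁵ to 10⁴ for kinetic parameters and 10⁰ to 10⁴ for conserved totals. It assumes that "log-transformed uniform" means uniform in log10 space. The function names, the seed, and the number of conserved totals (20) are illustrative assumptions, not the authors' Matlab code.

```python
import math
import random


def sample_log_uniform(low, high, n, rng):
    """Draw n values whose base-10 logarithm is uniform on [log10(low), log10(high)]."""
    lo, hi = math.log10(low), math.log10(high)
    return [10.0 ** rng.uniform(lo, hi) for _ in range(n)]


def sample_model_instance(n_params, n_totals, rng):
    """Return one hypothetical model instance: kinetic parameters plus conserved totals."""
    params = sample_log_uniform(1e-5, 1e4, n_params, rng)  # kinetic rate parameters
    totals = sample_log_uniform(1e0, 1e4, n_totals, rng)   # conserved protein totals
    return params, totals


rng = random.Random(42)  # fixed seed for reproducibility (illustrative)
# 94 kinetic parameters as in the ECC model; the number of conserved totals
# (20) is a placeholder, not a value taken from the manuscript.
params, totals = sample_model_instance(94, 20, rng)
```

      Sampling in log space gives each order of magnitude equal weight. A plain uniform draw on [10⁻⁵, 10⁴] would place roughly 99.99% of values above 1, so the slow end of the range would almost never be explored.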

      A further concern relates to the parameter filtering step. The authors report that the "vast majority" of sampled parameter sets produced systems that were "too stiff," and that these were excluded on the grounds that stiff dynamics are not biologically plausible. However, the manuscript does not clearly define how stiffness is assessed, nor why stiffness is interpreted as biologically unrealistic rather than as a numerical property of the formulation. In standard practice, stiff systems are typically handled using appropriate implicit solvers rather than being discarded. Similarly, parameter sets that produce negative state values are excluded, yet such behaviour may arise from numerical artefacts rather than from intrinsic model inconsistency. The rationale for excluding these parameter sets, rather than adapting the numerical scheme, is not sufficiently justified.

      The reported rejection rate - approximately 90% of sampled parameter sets - is substantial and raises questions regarding the interplay between model structure, parameter ranges, and numerical methods. As currently described, the filtering step appears to select parameter sets based primarily on computational tractability rather than on experimentally motivated biological criteria. The manuscript would be strengthened by clarifying whether the retained parameter sets are representative of biologically meaningful regimes, and by distinguishing clearly between exclusions based on biological plausibility and those arising from numerical considerations.

      This was extensively addressed in the original rebuttal, response to points (6) and (7). The main text was updated to clarify that a solver specific for stiff systems was used. Furthermore, once we had addressed this issue, subsequent analysis revealed that lack of drug response and failure to reach steady state within the simulated time period now accounted for the majority of filtering. This had no effect on the distributions of behaviours identified in our analyses. The main text was updated to reflect these changes, and the rejection rate is now discussed explicitly in the main text.

      Finally, important aspects of the simulation protocol require clarification. The model is simulated under "fasted" and "fed" conditions until equilibrium is reached, yet the criterion used to determine convergence is not specified. It would be important to describe how equilibrium is assessed (e.g., based on the norm of the time derivatives). Additionally, it remains unclear whether the mitogenic stimulus applied in the "fed" phase is assumed to be constant over time and, if so, how this assumption relates to biological experimental conditions. Greater detail on these implementation choices is necessary to ensure interpretability and reproducibility.

      This was addressed in the original rebuttal, response to point (8). Clarifications about the simulations were added to the main text, including an explicit statement that mitogenic and drug inputs were continuous stepwise functions and a description of how steady-state equilibrium was defined and calculated.
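      The protocol discussed in this exchange has three elements: constant stepwise inputs held within each phase, each phase run until the norm of the time derivatives drops below a tolerance, and drug action as a fixed fractional scaling of a rate law. The following Python sketch applies it to a hypothetical one-protein activation toy model (R ⇌ Ra) as an illustration. The model, rate constants, tolerance and 50% inhibition are all assumptions chosen for illustration. They are not the ECC model, and explicit Euler stands in for the stiff solver the authors report using.

```python
import math


def hold_until_steady(state, k_act, k_deact, dt=1e-3, tol=1e-9, max_steps=10**6):
    """Hold the inputs constant and integrate R <-> Ra (explicit Euler) until
    ||d(R, Ra)/dt|| < tol; returns the (R, Ra) pair at convergence."""
    r, ra = state
    for _ in range(max_steps):
        flux = k_act * r - k_deact * ra       # net activation rate
        if math.sqrt(2.0) * abs(flux) < tol:  # derivative-norm criterion
            return r, ra
        r -= flux * dt
        ra += flux * dt
    raise RuntimeError("no steady state within max_steps")


def run_protocol(total=1.0, k_basal=0.1, k_stim=2.0, k_deact=1.0, inhibition=0.5):
    """Hypothetical three-phase protocol: starved -> fed -> fed + drug.
    Each phase applies a constant (stepwise) input held until steady state, and
    the drug scales the activation rate law by a fixed factor (1 - inhibition)."""
    phases = [
        ("starved", 0.0, 0.0),           # (name, stimulus, inhibition)
        ("fed", 1.0, 0.0),
        ("fed+drug", 1.0, inhibition),
    ]
    state = (total, 0.0)                 # start with all protein inactive
    results = {}
    for name, stimulus, inhib in phases:
        k_act = (k_basal + stimulus * k_stim) * (1.0 - inhib)
        state = hold_until_steady(state, k_act, k_deact)  # next phase starts here
        results[name] = state[1]         # steady-state active amount
    return results


levels = run_protocol()
```

      With these placeholder values, the steady-state active amounts are approximately 0.091 (starved), 0.677 (fed) and 0.512 (fed + drug). Each phase begins from the equilibrium of the previous one, which is why the pre-treatment state is fixed by the parameters once they are chosen.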

      (4) The manuscript states that the modelling conclusions are strongly supported by existing literature; however, the validation presented does not fully substantiate this claim. As noted above, the comparison with CDK2 and CDK4/6 experimental data remains qualitative, and the use of arbitrary simulation time units complicates interpretation of temporal agreement. The extent to which the model quantitatively or mechanistically recapitulates experimentally observed dynamics therefore remains uncertain.

      This was addressed in the original rebuttal, response to points (13) and (14). Wording was changed to remove the suggestion of strong evidence and the tone was shifted to reflect reasonable qualitative support for our analysis, not strong evidence.

      The claim that the model reproduces known resistance mechanisms is also difficult to assess in light of Figure S10, where a large fraction of network nodes (~80%) appear implicated in resistance under some conditions. If most components of the network can, in at least some parameter regimes, be associated with resistance phenotypes, the resulting lack of selectivity weakens the strength of model-based validation. It becomes challenging to distinguish specific mechanistic insights from generic consequences of network connectivity.

      In addition, the Supplementary Information notes that certain components of the mitogenic and cell-cycle pathways were abstracted or excluded in order to maintain computational tractability. While such abstraction is understandable in a large ODE framework, it raises interpretative questions. Proteins identified as potential resistance drivers within the model may, in some cases, represent aggregated or simplified pathway effects. Clarifying in the main text how such abstractions may influence the attribution of resistance mechanisms would strengthen the biological interpretation of the results.

      This was addressed in the original rebuttal, response to point (15). The discussion was significantly revised to reflect our reasoning with respect to our conclusions. We fully understand that more work could be done to verify our claims; however, our intention is to demonstrate the generalised relationship between network heterogeneity and drug resistance, not to predict patient-specific resistance mechanisms.

      Drug inhibition is central to the manuscript's conclusions. The revised version clarifies that inhibition is implemented as a fixed fractional modification of specific kinetic rate laws. This abstraction is appropriate for exploring network-level responses, but it represents a stylised perturbation rather than a pharmacologically calibrated model of drug action. For full interpretability and reproducibility, the mathematical form of the modified rate laws, as well as the timing of inhibition relative to network equilibration, should be specified unambiguously. The biological implications of the findings depend critically on understanding this modelling choice.

      All equations were included in the supplementary model files, including typeset ODEs, as requested by the reviewers. R15 and R27 contain the relevant equations, which specify the exact implementation of the drug inhibition. The number of time units per simulation phase is now included in the main text. LINES: 166 – 168

      The one-at-a-time perturbation analysis presented in Figure 5 provides an interpretable ranking of first-order control points across the ensemble and offers mechanistic insight into primary sensitivities of the network. However, many targeted therapies act on multiple components, and resistance frequently arises through combinatorial mechanisms. The reported rankings should therefore be interpreted as identifying primary influences under isolated perturbations, rather than as a comprehensive account of multi-target drug behaviour.

      Overall, the manuscript succeeds in presenting a conceptual and exploratory framework for analysing how signalling network topology can shape the qualitative landscape of adaptive responses under heterogeneous kinetic conditions. Its principal contribution lies in establishing a systematic platform for large-scale in silico exploration. At the same time, the current limitations in biological calibration, parameter grounding, and validation constrain the extent to which the conclusions can be interpreted as predictive or quantitatively representative of specific tumour contexts. Addressing these issues would further strengthen the connection between the theoretical landscape described here and experimentally observed resistance dynamics.

      Joint Recommendations for the authors:

      (1) Supplementary Figure S4 is not sufficiently explained in its current form. The structure of the figure, the meaning of its colour coding, and the intended interpretation are not clearly described, making it difficult for readers to extract the key message without substantial inference. Given that the manuscript relies heavily on large-scale ensemble analyses, clear visual communication is essential. A more detailed legend, explicit definition of axes and colour scales, and improved visual labelling would substantially enhance clarity, accessibility, and reproducibility.

      Supp. Fig. 4 legend updated with additional detail. LINES: Supp. Text. 256 - 263

      (2) The approximately 90% rejection rate of sampled parameter sets should be reported explicitly in the main text of the manuscript rather than only in the Supplementary Information. Given the central role of large-scale parameter sampling in the study, this level of exclusion is a critical aspect of the modelling workflow and directly affects the interpretation of robustness and representativeness. Clear disclosure in the main text would allow readers to properly evaluate the effective size of the analysed ensemble and the implications of the filtering procedure for the generality of the conclusions.

      This was explicitly addressed in the original rebuttal.

      (3) The model would benefit from quantitative validation against experimental data. In Figure 7C, the authors note in the response letter that the simulations are performed in arbitrary time units. However, the figure itself labels the time axis in hours, which may lead readers to infer a direct quantitative correspondence between simulated and experimental time courses. If the simulations are not calibrated to real time, this labelling is potentially misleading and should be corrected. Either the model should be explicitly time-calibrated and quantitatively compared to experimental measurements, or the figure should clearly indicate that the time axis is dimensionless. Clarifying this point is essential to avoid overinterpretation of the agreement between model and data.

      Label updated.


      The following is the authors’ response to the original reviews.

      Joint Public Reviews:

      In this manuscript, the authors proposed an approach to systematically characterise how heterogeneity in a protein signalling network affects its emergent dynamics, with particular emphasis on drug-response signalling dynamics in cancer treatments. They named this approach Meta Dynamic Network (MDN) modelling, as it aims to consider the potential dynamic responses globally, varying both initial conditions (i.e., expression levels) and biophysical parameters (i.e., protein interaction parameters). By characterising the "meta" response of the network, the authors propose that the method can provide insights not only into the possible dynamic behaviours of the system of interest but also into the likelihood and frequency of observing these dynamic behaviours in the natural system.

      The authors studied the Early Cell Cycle (ECC) network as a proof of concept, specifically focusing on PI3K, EGFR, and CDK4/6, with particular interest in identifying the mechanisms that cancer could potentially exploit to display drug resistance. The biochemical reaction model consists of 50 equations (state variables) with 94 kinetic parameters, described using SBML and computed in Matlab. Based on the simulations, the authors concluded the following main points: a large number of network states can facilitate resistance, the individual biophysical parameters alone are insufficient to predict resistance, and adaptive resistance is an emergent property of the network. Finally, the authors attempt to validate the model's prediction that differential core sub-networks can drive drug resistance by comparing their observations with the knock-out information available in the literature. The authors identified subnetworks potentially responsible for drug resistance through the inhibition of individual pathways. Importantly, some concerns regarding the methodology are discussed below, putting in doubt the validity of the main claims of this work.

      While the authors proposed a potentially useful computational approach to better understand the effect of heterogeneity in a system's dynamic response to a drug treatment (i.e., a perturbation), there are important weaknesses in the manuscript in its current form:

      (1) It is unclear how the random parameter sets (i.e., model instances) and initial conditions are generated, and how this choice biases or limits the general conclusions for the case studied. In particular, it is not evident how the kinetic rates relate to any biological data, nor whether the parameter distributions used in this study have any biological relevance.

      (2) Related to this problem, it is not clear whether the 100,000 random parameter samples sufficiently explore parameter space, given the combinatorial explosion that arises from having 94 free parameters, nor whether 100,000 random initial conditions suffice for a system with 50 species (variables).

      (3) Moreover, the authors filter out all the cases with stiff behaviour. This filtering step appears to select model parameters based on computational convenience rather than biological plausibility.

      (4) Also, it is not clear how exactly the drug effect is incorporated into the model (e.g., molecular inhibition?), nor how it is applied in the dynamic simulations (e.g., at the beginning of the simulation?). Moreover, in a complex network, the results may differ depending on whether the inhibition is applied from the start or after the network has reached a stable state.

      (5) Along the same lines, the conclusions need to be discussed in the context of stability, particularly when evaluating the role of initial conditions. As stable steady states are determined by the model parameters, once again, the details of how the perturbation effect is evaluated in the simulated dynamics are critical for interpreting the results.

      (6) The presented validation of the model results (Fig. 7) is only qualitative, and its interpretation is not carefully discussed in the manuscript, particularly regarding the comparison between fold-change responses without specifying the baseline states.

      We thank the reviewers for their thoughtful and constructive comments. In response, we have undertaken a substantial revision to address all the comments and to improve clarity, transparency, and robustness, while preserving the paper’s core contribution: a principled, scalable framework (MDN) for mapping how molecular heterogeneity and network architecture shape adaptive drug-response dynamics. At a high level, we clarified the study design and analysis goals, tightened definitions, and added methodological detail where it most advances interpretability. Importantly, these updates leave the analytical pipelines and major conclusions unchanged.

      Conceptually, we now make explicit that our objective is coverage of the output space of qualitative dynamics supported by the network topology, not exhaustive enumeration of parameter space. To support this, we added a convergence analysis and clarified that “triplicates” refers to independent ensembles used to demonstrate reproducibility. We also refined how we describe and implement initial conditions (as conserved total abundances that encode expression heterogeneity) and reframed filtering as minimal numerical/feasibility checks, using rejection sampling to obtain the prespecified ensemble size. Solver choices and input modelling (constant step mitogen/drug) are now spelled out succinctly.

      We expanded the model specification and rationale (complete reaction list with rate laws and brief biological justifications in the Supplement) and unified terminology throughout. Figures and legends have been overhauled for readability and accuracy, with missing labels added and ordering corrected. For validation, we clarified the nature of the single-cell reporter readout, improved Figure 7’s presentation, and emphasised, consistent with our aims, that comparisons are qualitative.

      Finally, we have rewritten the Discussion to centre on interpretation and implications and to connect our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert a stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe these revisions materially strengthen the manuscript and fully address all the reviewers’ comments. A detailed, point-by-point response follows.

      Joint Recommendations for the Authors:

      (1) It is confusing exactly which sets are evaluated in each case, e.g. "generated 100,000 model instances, each with the same set of ICs but a unique set of randomly generated parameter values" (lines 299-300), "generated 100,000 model instances (in triplicate), each with the same set of 'nominal' parameter values (see supplementary Table S1), and a unique set of ICs, and repeated the analysis as performed previously" (lines 366-368), "combined the 1000 IC sets with each parameter set to create 1000 model instances" (lines 382-383), "repeated for 1000 parameter sets, allowing us to observe how frequently IC variation induced adaptive resistance independent of the chosen parameter set" (lines 386-387). A small table or simply a clearer explanation is needed.

      In response to these comments, we have revised the main text to clarify the process of model instance generation. Specifically, we have made changes at page 7: line 297 - page 8: line 302, page 8: lines 305 - 310, page 9: lines 372-378, and page 9: line 384 – page 10: line 399 in the revised main text.

      We have also added a new Figure (Figure S1) to the supplementary file to allow readers to visualise the model generation process for each relevant set of experiments. Supplementary figures are referenced in the main text where appropriate.

      (2) The authors mentioned performing each simulation in triplicate, which is puzzling as the model is based on deterministic ODEs with fixed parameters for each simulation. Under such conditions, one would anticipate identical results from multiple simulations with the same initial conditions and fixed parameters. Perhaps the authors expect the model to exhibit chaos or aim to assess the precision of the parameter estimates through triplicate simulations. Further clarification from the authors would be valuable to comprehend the rationale behind conducting triplicate simulations in a deterministic setting.

      We agree that repeating deterministic ODE simulations with identical inputs would be redundant. In our study, “triplicate” referred instead to generating three independent ensembles of 100,000 unique model instances each, where model parameters (or initial conditions) were randomly resampled. These ensembles were analysed separately to assess whether the inferred meta-dynamic distributions converged robustly. Indeed, the distributions from the three replicates were nearly indistinguishable, confirming that the results are reproducible and not artefacts of a particular random draw.
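The two meanings of "triplicate" can be contrasted in a toy sketch; this is not the authors' pipeline. `classify_instance` is a hypothetical stand-in for sampling one model instance, simulating it, and labelling its qualitative dynamic, and the class names and probabilities are invented. Each replicate ensemble uses its own random seed, so every instance is redrawn; rerunning a single seed reproduces its result exactly, which is the redundancy the reviewers describe for deterministic ODEs.

```python
import random
import statistics
from collections import Counter

CLASSES = ("decreasing", "rebound", "constant")

def classify_instance(rng: random.Random) -> str:
    """Hypothetical stand-in for 'sample a model instance, simulate, classify'."""
    a, b = rng.random(), rng.random()   # toy 'kinetic parameters'
    if a > 0.6:
        return "decreasing"
    return "rebound" if b > 0.5 else "constant"

def class_frequencies(n_instances: int, seed: int) -> dict:
    """One independent ensemble: every instance gets freshly drawn parameters."""
    rng = random.Random(seed)
    counts = Counter(classify_instance(rng) for _ in range(n_instances))
    return {c: counts[c] / n_instances for c in CLASSES}

def replicate_spread(n_instances: int, seeds=(1, 2, 3)) -> dict:
    """Standard deviation of each class frequency across replicate ensembles."""
    reps = [class_frequencies(n_instances, s) for s in seeds]
    return {c: statistics.stdev([r[c] for r in reps]) for c in CLASSES}

print(replicate_spread(20_000))
```

A small spread across seeds is the toy analogue of the "nearly indistinguishable" replicate distributions reported above.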

      We have revised the main text to clarify this distinction (page 8: lines 305 - 310) and added an extended explanation for meta-dynamic behaviour convergence in the new section Error Convergence in the supplementary text (page 6: lines 184 - 210).

      (3) While the lack of a connection between model parameters and biological data (mentioned in the public review) may not be a fatal flaw in the manuscript, the concern about the 100,000 random samples being insufficient to explore the parameter space is valid. In a thought experiment, considering the high and low rate for each parameter and the combinatorial explosion of possibilities (2^94), the number of simulations performed (100,000) represents only an extremely small fraction of the entire parameter space (~1/10^(23)). This limitation might not accurately capture the true heterogeneity present inside a solid tumour. One potential solution is to determine biological bounds on model parameters through data fitting, which can provide more meaningful constraints for the simulations. Alternatively, increasing the number of simulations and adopting more efficient sampling techniques can enhance the coverage of possible parameter sets.
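The reviewers' thought experiment, for both the 94 kinetic parameters and the 50 initial conditions raised in comment (5), reduces to a short calculation. The helper name `corner_coverage` is illustrative. It gives upper bounds of roughly 5 x 10^-24 and 9 x 10^-11, consistent with the ~1/10^(23) and ~1/10^(10) figures quoted in the comments.

```python
def corner_coverage(n_samples: int, n_dims: int) -> float:
    """Upper bound on the fraction of the 2**n_dims high/low 'corners'
    that n_samples random draws could visit."""
    return n_samples / 2 ** n_dims

for label, dims in (("kinetic parameters", 94), ("initial conditions", 50)):
    print(f"{label}: 2^{dims} = {2 ** dims:.2e} corners, "
          f"coverage <= {corner_coverage(100_000, dims):.1e}")
```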

      We thank the reviewer for this insightful comment. We agree that the 94-dimensional parameter space is vast, and that 100,000 simulations represent only a fraction of the total combinatorial possibilities. However, the objective of our study is not to exhaustively sample the entire parameter space, but rather to sufficiently sample the ‘output space’ - that is, the complete spectrum of qualitative dynamic behaviours the network topology can generate. The key question is whether 100,000 model instances are sufficient for the distribution of these output dynamics to converge.

      To formally address this, we have performed a convergence analysis, which is now detailed in the new supplementary section "Error Convergence" (Supplementary text page 6: lines 184 - 210) and illustrated in Supplementary Figure S12. This analysis demonstrates that the mean squared error (MSE) between dynamic distributions from N and 2N simulations exponentially decreases as N increases, and the distribution of protein dynamics changes negligibly well before reaching 100,000 instances. Furthermore, performing the entire analysis in triplicate with independent random seeds yielded nearly identical meta-dynamic maps (average standard deviation < 0.04%), giving us high confidence that we have robustly captured the network's behavioural repertoire.
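The N-versus-2N convergence check can be sketched on a toy model, under stated assumptions. `toy_class` is a hypothetical stand-in for simulating and classifying one model instance. The N and 2N ensembles are drawn independently, since the response does not say whether the 2N set extends the N set, and the MSE is taken over the vector of class frequencies. In this toy the MSE falls roughly in proportion to 1/N; the decay reported above refers to the real model.

```python
import random
from collections import Counter

CLASSES = ("decreasing", "rebound", "constant")

def toy_class(rng: random.Random) -> str:
    """Hypothetical stand-in for simulating and classifying one model instance."""
    a, b = rng.random(), rng.random()
    if a > 0.6:
        return "decreasing"
    return "rebound" if b > 0.5 else "constant"

def distribution(n: int, rng: random.Random) -> list:
    """Class frequencies over an ensemble of n freshly drawn instances."""
    counts = Counter(toy_class(rng) for _ in range(n))
    return [counts[c] / n for c in CLASSES]

def mse_n_vs_2n(n: int, seed: int = 0) -> float:
    """MSE between the class distributions of independent N and 2N ensembles."""
    rng = random.Random(seed)
    p_n, p_2n = distribution(n, rng), distribution(2 * n, rng)
    return sum((x - y) ** 2 for x, y in zip(p_n, p_2n)) / len(CLASSES)

for n in (100, 1_000, 10_000, 100_000):
    print(f"N = {n:>7,}: MSE(N, 2N) = {mse_n_vs_2n(n):.1e}")
```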

      We believe this convergence occurs because the system is degenerate: many distinct parameter sets within the high-dimensional space map to the same qualitative outcome (e.g., 'rebound' or 'decreasing'). Our goal was to capture the set of possible outcomes, not every unique parameter combination that leads to them.

      Regarding the parameter range, we intentionally chose a broad, unbiased range (10^-5 to 10^4) as a proof-of-concept to delineate the theoretical upper limit of heterogeneity the network can support, thereby capturing even rare but potentially critical resistance dynamics. We agree with the reviewer that a future direction is to constrain these ranges using biological data. Such an approach would transition from defining what is possible (the focus of this manuscript) to predicting what is probable in a specific biological context. We have added this important point to the Discussion (page 16: lines 663-679) to highlight this avenue for future work.
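A range spanning nine orders of magnitude is usually sampled on a logarithmic scale. Drawing uniformly on a linear scale would place about 90% of draws in the top decade (10^3 to 10^4). The response does not state which scheme was used, so the sketch below assumes log-uniform sampling. `sample_parameter_set` is a hypothetical helper; only the bounds and the count of 94 parameters come from the text.

```python
import math
import random

LOW, HIGH = 1e-5, 1e4   # bounds quoted in the response
N_PARAMS = 94           # kinetic parameters in the ECC model

def sample_parameter_set(rng: random.Random) -> list:
    """Draw one parameter set, log-uniform on [LOW, HIGH] (assumed scheme)."""
    lo, hi = math.log10(LOW), math.log10(HIGH)
    return [10.0 ** rng.uniform(lo, hi) for _ in range(N_PARAMS)]

rng = random.Random(42)
params = sample_parameter_set(rng)
# Under log-uniform sampling each of the nine decades is equally likely,
# so about 5/9 of the draws should fall below 1.
print(sum(p < 1.0 for p in params) / N_PARAMS)
```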

      (4) One of the manuscript's main results indicates that protein interactions play a more significant role in driving adaptive resistance than protein expression. To explore the impact of protein expression, the authors fixed a nominal parameter set and generated 100,000 initial concentrations of the 50 proteins in the ODE model. However, the simulations' equilibrium concentrations in the "starvation" and "fed" phases, which form the initial condition for the treated phase, are uniquely determined by the nominal model's kinetic parameters and not the initial conditions, which remain identical for each simulation. From a dynamical systems perspective, stable steady states are determined by the model parameters and attract all initial conditions within their basin of attraction. As a result, a random sampling of the initial conditions has a limited impact on the model dynamics. The authors' conclusion that "the ability of expression to induce resistance also seems to be dependent on the master parameter set" can be explained by this dynamical systems perspective, where the resistance state corresponds to a stable steady state determined by the master parameter set. Considering this, the evidence presented in the manuscript may not fully support the authors' conclusion regarding the importance of protein expressions relative to protein dynamics. The discrepancy might be attributed to a possible misunderstanding of this point, and further clarification from the authors could be helpful.

      We thank the reviewer for the thoughtful perspective. We agree that, in a monostable system with fixed kinetic parameters and fixed conserved totals, varying only the initial split among moieties (e.g., X vs pX) will not change the final steady state; trajectories converge to the same attractor. In our analysis, however, “initial conditions” predominantly refer to total protein abundances (e.g., X_tot = X + pX + complexes), used as a proxy for expression heterogeneity. These totals are invariants on the simulated timescale (no synthesis/degradation in the pre-equilibration phases), and therefore alter the value of the steady state under a given parameter set. In other words, our IC sampling mostly varies conserved totals rather than merely redistributing a fixed total; hence the equilibrium reached after the starvation/fed pre-equilibrations depends on the sampled totals and the kinetics. This can be seen in the new Supplementary Figure S4, showing that changing the ICs does shift the eventual steady state even when kinetic parameters are fixed.
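A minimal sketch can separate "initial split" from "conserved total". It assumes a hypothetical two-state species X <-> pX with first-order phosphorylation (k1) and dephosphorylation (k2) and no synthesis or degradation, so X_tot = X + pX is conserved. The steady state is pX* = k1/(k1 + k2) * X_tot. It does not depend on how X_tot is initially split, but it scales with X_tot itself.

```python
def simulate_pX(X0: float, pX0: float, k1: float, k2: float,
                t_end: float = 50.0, dt: float = 0.01) -> float:
    """Integrate dpX/dt = k1*X - k2*pX with X = X_tot - pX (forward Euler)."""
    x_tot = X0 + pX0            # conserved total: no synthesis/degradation
    pX = pX0
    for _ in range(round(t_end / dt)):
        pX += dt * (k1 * (x_tot - pX) - k2 * pX)
    return pX

k1, k2 = 1.0, 0.5
# Same total (10), different initial split: both reach the same steady state.
print(simulate_pX(10.0, 0.0, k1, k2), simulate_pX(2.0, 8.0, k1, k2))
# Doubling the total doubles the steady state.
print(simulate_pX(10.0, 0.0, k1, k2), simulate_pX(20.0, 0.0, k1, k2))
```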

      We have revised the text to: (1) define ICs explicitly as total abundances for multi-state species, (2) distinguish “initial split” from “conserved totals,” and (3) clarify that expression effects are context-dependent rather than universally dominant (page 4: lines 139-141 and page 10: lines 413-416).

      (5) Additionally, it is important to note that the random sampling of 100,000 initial concentrations might not sufficiently explore the vast space of possible initial conditions. In the thought experiment mentioned earlier, where each protein can have high or low expression concentrations, there are approximately 2^(50) = ~10^(15) possible combinations of initial concentrations. Thus, the 100,000 random simulations only represent around ~1/10^(10) of the possible initial conditions in this simplistic scenario. Consequently, this limited sampling of initial conditions may not provide enough information to draw meaningful conclusions, even if the initial conditions were more directly linked to kinetic rates.

      Please see our response to Comment (3). Briefly, our ICs are continuous total abundances (conserved moieties), not binary high/low states; many IC configurations converge to the same qualitative attractors, so we estimate distributional properties rather than enumerate all combinations. Our convergence diagnostics (independent replicates and sample-size doubling) show that the meta-dynamic distributions stabilise well before N=100,000 (see Supplementary Figure S12). We have clarified this in the Supplementary Information (Error Convergence section) with the new convergence results.

      (6) The authors implement a parameter selection step in the manuscript, where they filter out parameter sets that lead to what they term non-biological simulations. However, the rationale for determining if a given parameter set results in a stiff system of ODEs remains unclear. The authors cite references [38,39] to support the claim that stiff equations are not biologically plausible. Still, upon review, it is evident that [38] does not include the term "stiff," and [39] discusses using implicit methods to simulate stiff ODE models without specifically commenting on the biological plausibility of stiff systems. The manuscript lacks direct evidence to justify the conclusion that filtering out parameter sets that result in stiff ODE systems is reasonable. Since the filtering step accounts for the majority of discarded parameter sets, a stronger foundation is required to support the statement that stiff equations are non-biological.

      We thank the reviewer for pointing out the issue in our original justification. The reviewer is correct: stiff systems are a common feature of biological models, and our claim that they are likely ‘biologically implausible’ was not well substantiated. The filtering of these model instances was, in fact, due to a computational limitation rather than a biological principle. The issue was that these parameter sets produced systems of ODEs that were so numerically stiff they were unsolvable within a reasonable timeframe by the SUNDIALS ODE solver suite, which is specifically designed for such systems.

      Following the reviewer's comment, we investigated the source of this prohibitive stiffness. We discovered that it was not an intrinsic property of the parameter sets themselves, but rather an artefact of our simulation setup. The extreme stiffness occurred almost exclusively during the initial integration timesteps, caused by the large initial discrepancy between the concentrations of active and inactive protein forms. This discrepancy made the system so stiff that it could not be solved with the ODE solver settings we had implemented. To overcome this problem, we set a large maximum number of steps in the ODE solver for the first couple of time points, enabling the solver to get through the excessively stiff portion of the solve. We found that the vast majority of the previously 'unsolvable' model instances could now be successfully simulated. Consequently, the number of parameter sets discarded due to solver failure is now negligible (< 1%), and this filtering step no longer accounts for the majority of discarded parameter sets. Most importantly, the distributions of dynamics were not significantly altered by this adaptation.

      We have revised the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part of the Methods section to reflect this more accurate understanding. We have corrected our original claim regarding the biological plausibility of stiff systems and corrected our use of the references. Ref [38] was included to demonstrate that models of biological systems are stiff, which was a major conclusion of that paper, and [39] was originally included to demonstrate that solving such ODEs relies on solvers that can integrate stiff systems. Upon review, ref [39] has been removed.

      Overall, this investigation has made our analysis more robust by allowing us to include a wider, more representative range of parameter sets, and has tangibly improved the quality of our study.

      (7) Additionally, it is important to consider the standard method for accounting for stiff systems, as presented in [39], which involves using implicit numerical methods for ODE simulation. The authors mention using numerical methods from the SUNDIALS suite, which includes implicit methods, but the specific numerical method used remains unclear. Furthermore, it would be valuable for the authors to disclose the number of parameter sets that were filtered to obtain the final set of 100,000 accepted parameter sets. This information would provide insights into the extent of filtering and the proportion of parameter sets that were excluded during the selection process.

      We apologise for the lack of specific detail and have now updated the text. To clarify, all ODE simulations were performed using the CVODE solver from the SUNDIALS suite. This solver employs an implicit, variable-order, variable-step Backward Differentiation Formula (BDF) method, which is robust and specifically designed for handling the stiff systems common in biological network modelling. We have now explicitly stated this in the "ODE model construction, modelling, and simulations (page 4: lines 162 – 164)" section of the Methods.
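The simplest member of the BDF family, backward Euler (BDF of order 1), shows why an implicit method matters here. The example uses the standard stiff test problem y' = -λ(y - cos t) with λ = 1000. This illustrates the numerical principle only and is not the CVODE call the authors used. CVODE additionally varies order and step size and solves nonlinear systems by Newton-type iteration, whereas this test problem is linear, so the implicit update has a closed form. With step h = 0.01, explicit Euler violates its stability limit h < 2/λ and diverges. Backward Euler stays within about 10^-3 of cos(1), which the exact solution tracks closely.

```python
import math

LAM = 1000.0   # stiffness: y relaxes toward cos(t) on a timescale of 1/LAM

def forward_euler(h: float, t_end: float = 1.0, y0: float = 0.0) -> float:
    """Explicit Euler: stable only if h < 2 / LAM."""
    y, t = y0, 0.0
    for _ in range(round(t_end / h)):
        y += h * (-LAM * (y - math.cos(t)))
        t += h
    return y

def backward_euler(h: float, t_end: float = 1.0, y0: float = 0.0) -> float:
    """Implicit Euler (BDF order 1): y_new = y + h * f(t_new, y_new).

    The ODE is linear, so the implicit equation is solved in closed form;
    nonlinear models need a Newton-type iteration at each step instead."""
    y, t = y0, 0.0
    for _ in range(round(t_end / h)):
        t += h
        y = (y + h * LAM * math.cos(t)) / (1.0 + h * LAM)
    return y

h = 0.01   # five times the explicit stability limit 2 / LAM = 0.002
print("explicit:", forward_euler(h))
print("implicit:", backward_euler(h), "cos(1) =", math.cos(1.0))
```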

      Regarding the filtered parameters, we have included a revised and detailed discussion of this in the "Sampling and filtering of model instances (page 5: lines 174 – 189)" part in the Methods section (see our response to comment (6) above). Briefly, after applying the filters, ~40–45% of instances did not reach steady state within the simulation timeframe, and ~50–55% did not meet the minimum drug-response criterion. Approximately 10% satisfied all criteria and were retained for analysis. Importantly, we employed ‘rejection sampling’ and continued drawing until we had N = 100,000 accepted instances that satisfied all the criteria.
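The rejection-sampling loop can be sketched as below. `draw_candidate` and its two criteria are hypothetical placeholders, not the study's filters. Their probabilities (0.5 and 0.2, independent) were chosen only so that about 10% of draws are accepted, mirroring the overall acceptance rate reported above. A draw budget guards against an acceptance rate too low to reach the target.

```python
import random

def draw_candidate(rng: random.Random) -> dict:
    """Hypothetical stand-in for sampling and simulating one model instance."""
    return {"reaches_steady_state": rng.random() < 0.5,
            "responds_to_drug": rng.random() < 0.2}

def accept(instance: dict) -> bool:
    # An instance is kept only if it passes every criterion.
    return instance["reaches_steady_state"] and instance["responds_to_drug"]

def rejection_sample(n_target: int, rng: random.Random, max_draws: int = 10**7):
    """Keep drawing until n_target candidates pass; return them and the draw count."""
    accepted, draws = [], 0
    while len(accepted) < n_target:
        if draws >= max_draws:
            raise RuntimeError("draw budget exhausted before reaching n_target")
        candidate = draw_candidate(rng)
        draws += 1
        if accept(candidate):
            accepted.append(candidate)
    return accepted, draws

kept, total = rejection_sample(1_000, random.Random(0))
print(f"{len(kept)} accepted from {total} draws ({len(kept) / total:.1%})")
```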

      (8) An important step in the simulation process described by the authors is the simulation of the "fasted" and "fed" states until an equilibrium is reached. However, it is not clear how the authors determine if the system has reached an equilibrium. It would be helpful if the authors could provide more information regarding the criteria used to assess equilibrium in the simulations. Regarding the "fed" state, it is not explicitly stated whether the mitogen stimulus is assumed to be constant throughout the "fed" experiment. Considering the dynamic nature of mitogen stimulation in biological systems, it would be beneficial if the authors could clarify this assumption and discuss its biological relevance.

      We apologise for not specifying this in the original text. A simulation was considered to have reached equilibrium when the concentration of every protein species changed by < 1% over the final 100 time steps of the simulation phase. We have now added this criterion to the "Sampling and filtering of model instances (page 5: lines 177 – 179)" part of the Methods section.
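The equilibrium criterion can be encoded as sketched below. The sketch assumes the trajectory is stored as a list of time points, each a list of species concentrations. It reads "changed by < 1%" as the spread (max minus min) within the last 100 steps relative to the final value. `abs_floor` is an added, hypothetical guard for species that decay to zero, where a relative change is undefined.

```python
def at_equilibrium(trajectory, window=100, rel_tol=0.01, abs_floor=1e-12):
    """Return True if every species varied by < rel_tol (relative to its final
    value) over the last `window` time steps of `trajectory`."""
    if len(trajectory) < window + 1:
        return False                      # not enough history to judge
    tail = trajectory[-(window + 1):]     # `window` steps span window + 1 points
    for i in range(len(tail[-1])):
        values = [point[i] for point in tail]
        scale = max(abs(values[-1]), abs_floor)
        if (max(values) - min(values)) / scale >= rel_tol:
            return False
    return True
```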

      Regarding the second part of the comment, in our simulations, both the mitogenic and the drug inputs were modelled as constant, stepwise functions that, once turned on, remained at a fixed concentration for the remainder of the simulation. The biological rationale for this choice was to rigorously test for bona fide adaptive resistance. By maintaining a constant mitogenic and drug pressure, we can ensure that any observed recovery in the activity of downstream proteins is due to the internal rewiring and adaptation of the signalling network itself, rather than an artefact of the removal or decay of the external stimulus/drugs. We have now clarified this rationale in the "ODE model construction, modelling, and simulations (page 4: lines 168 – 171)" part of the Methods section.
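The constant step inputs can be expressed as a small callable. The switch-on times below are hypothetical placeholders in arbitrary units. In the study, the starvation and fed phases run until equilibrium rather than for fixed durations. The key property is that neither input is ever switched off, so any later recovery must come from the network itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StepInput:
    """Constant step input: zero before t_on, fixed `level` from t_on onwards."""
    t_on: float
    level: float

    def __call__(self, t: float) -> float:
        return self.level if t >= self.t_on else 0.0

# Hypothetical protocol times: starvation from t=0, mitogen on at t=100
# (fed phase), drug added at t=200 (treated phase); neither is removed.
mitogen = StepInput(t_on=100.0, level=1.0)
drug = StepInput(t_on=200.0, level=1.0)

for t in (50.0, 150.0, 250.0, 10_000.0):
    print(t, mitogen(t), drug(t))
```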

      (9) The "Description of Model Scope and Construction" section in the Supplementary Information should include explicitly the model reactions and some discussion about their specific form (e.g., why is '(((kc2f1*pIR*PI3K) / (1 + (pS6K/Ki2))) + (kc2f2*pFGFR*PI3K))' representing the phosphorylation rate of PI3K, with pS6K in the denominator?).

      The reviewer is right to ask for model justification. We have expanded the Supplementary “Description of Model Scope and Construction” section (page 2: line 63 – page 5: line 185) to include a complete reaction list with rate laws and a brief rationale for each. We also explain the specific PI3K phosphorylation term: activation by pIR and pFGFR is attenuated by pS6K via a denominator, which captures the well-described S6K-mediated negative feedback that reduces activation (e.g., via IRS1 phosphorylation).
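Writing the rate law quoted in comment (9) as a function makes its structure explicit. The argument names follow the quoted expression; the function itself is a sketch, not the authors' SBML. As quoted, only the pIR-driven term is divided by (1 + pS6K/Ki2), and the pFGFR-driven term has no pS6K dependence. When pS6K equals Ki2, the pIR contribution is halved.

```python
def pi3k_phosphorylation_rate(pIR, pFGFR, PI3K, pS6K, kc2f1, kc2f2, Ki2):
    """PI3K phosphorylation rate from the expression quoted in comment (9)."""
    # Receptor (pIR) activation, attenuated by S6K negative feedback.
    ir_term = (kc2f1 * pIR * PI3K) / (1.0 + pS6K / Ki2)
    # FGFR activation, with no pS6K term in the quoted expression.
    fgfr_term = kc2f2 * pFGFR * PI3K
    return ir_term + fgfr_term

# Feedback off vs. pS6K at its half-inhibition level Ki2 (illustrative values).
args = dict(pIR=2.0, pFGFR=1.0, PI3K=3.0, kc2f1=1.0, kc2f2=0.5, Ki2=1.0)
print(pi3k_phosphorylation_rate(pS6K=0.0, **args),
      pi3k_phosphorylation_rate(pS6K=1.0, **args))
```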

      (10) In line 349, the statement "Given that CDK46cycD is only strongly suppressed in just under 60% of the model instances (Figure 3C)" lacks clarity regarding where to look to interpret the 60% value. If this means that 4 out of the 7 model instances are resistant, and the other 2 proteins also have the same percentage of resistance, then there is no apparent reason to focus solely on CDK46cycD.

      The reviewer is correct; the figure reference was an error, which has been rectified in the main text (page 9: line 355). The intended reference was Supplementary Figure 2A, which shows the heatmap of the frequencies of each dynamic behaviour for all the active protein forms. CDK4/6cycD shows a sustained decreasing dynamic in 59.93% of model instances, which is where this number was derived. We have also now explicitly referenced this number in the Supplementary Figure 2A legend.

      We focus on CDK4/6cycD because it is the direct pharmacological target of CDK4/6 inhibitors. Our point was to suggest that even when the target is suppressed in the majority of instances (~60%), this does not reliably propagate to uniform downstream inhibition across the network, thus highlighting emergent, network-driven adaptive responses.

      (11) We observed that in Fig. 5A, the authors show that multiple pathways are blocked. However, it is unclear whether they reduced the value of one parameter in the experiment or simulated multiple combinations of parameter inhibition. Considering the large number of parameters (94) in the model, if the authors simulated all possible combinations of parameter inhibition, the number of combinations would be significantly more than 94. An actual inhibitor typically has an inhibitory effect on multiple molecules. Therefore, it would be necessary to identify the parameters that lead to drug resistance when multiple molecules are inhibited. However, examining the inhibition patterns for all 94 parameters would be practically impossible. As a potential approach, we suggest using ensemble learning techniques, such as random forests, to handle this problem efficiently. With a dataset of binary outputs indicating the presence or absence of resistance for a sufficient number of inhibition patterns, ensemble learning can be applied to find the parameters that contribute to drug resistance. Popular feature selection algorithms like Boruta could be utilised to identify the most relevant parameters. The results obtained by ensemble learning are similar to the ranking in Fig. 5C, potentially providing a more robust validation of the authors' findings. By incorporating these additional analyses, the authors could strengthen the reliability and significance of their results related to parameter inhibition and drug resistance.

      We appreciate the suggestion and the opportunity to clarify. Although Figure 5A depicts the multiple pathways that were interrogated, in the analysis parameters were inhibited one at a time (OAT), not in combination. We have revised the figure legend and added a section named “Protein knockdown perturbation analyses (page 6: lines 228 – 233)” in the Methods section to make this explicit. We have also slightly modified the main text to make this clearer (page 11: lines 462-463, page 24: lines 856-857).

      We chose the OAT design intentionally to obtain causal, first-order attribution of control points across a broad parameter ensemble without confounding from simultaneous co-inhibition. This provides an interpretable ranking of primary drivers (Figure 5C) that is consistent with the paper’s mechanistic focus. We agree that a multi-target inhibition approach could be a useful next step; however, an exhaustive combinatorial screen is beyond the scope of this proof-of-concept. In such future studies, the ensemble learning, as suggested by the reviewer, could be layered onto our MDN framework to assess robustness of the ranking under co-inhibition.
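The one-at-a-time design can be sketched as follows, together with a simple ranking by how often each single knockdown abolishes resistance across an ensemble. Everything model-specific here is hypothetical: the three parameter names, the toy `is_resistant` rule, and the knockdown factor of 0.1. Counting abolitions is one plausible way to build a ranking like Figure 5C, not necessarily the authors' metric.

```python
import random
from collections import Counter

PARAMS = ("k_bypass", "k_feedback", "k_drug_target")   # hypothetical names

def is_resistant(p: dict) -> bool:
    # Toy rule standing in for "target activity recovers under constant drug".
    return p["k_bypass"] + p["k_feedback"] > p["k_drug_target"]

def oat_ranking(n_sets: int = 1_000, factor: float = 0.1, seed: int = 0):
    """Count, per parameter, how often knocking it down alone abolishes resistance."""
    rng = random.Random(seed)
    abolished = Counter()
    for _ in range(n_sets):
        nominal = {name: rng.uniform(0.0, 1.0) for name in PARAMS}
        if not is_resistant(nominal):
            continue                      # only resistant instances are probed
        for name in PARAMS:               # one parameter at a time
            perturbed = dict(nominal)     # all others stay at their nominal values
            perturbed[name] *= factor
            if not is_resistant(perturbed):
                abolished[name] += 1
    return abolished.most_common()        # parameters never decisive are absent

print(oat_ranking())
```

Because each run changes a single parameter, any change in outcome is attributable to that parameter alone, which is the first-order attribution the response describes.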

      (12) In explaining the parameterization of the model, we find an implication of a quantitative model. However, upon examining the results in Fig. 7D, we observe that they are only qualitatively correct. When comparing Figs. 7A and 7C, we note that many model instances are immediately suppressed, and the time scale remains unknown. We believe it would be essential for the authors to explain how the model of this study maintains its quantitative nature despite the results in Fig. 7. If such an explanation cannot be provided, it raises concerns regarding the biological reliability of several findings within this study.

      While our framework is built on quantitative ODEs, the validation we present in Figure 7 is indeed qualitative. This is an intentional and key feature of our study's design. Our goal was not to build a calibrated, quantitative model of a specific cell line (e.g., MCF10A), but rather to establish a proof-of-concept theoretical framework that systematically explores the full spectrum of dynamic behaviours a given network topology can possibly generate. To achieve this, we intentionally sampled parameters from a very broad, unbiased range to delineate the theoretical upper limit of heterogeneity. This in silico population is therefore designed to be far more heterogeneous than any single isogenic cell line.

      The striking qualitative agreement seen between our meta-dynamic distributions and the single-cell data in Figure 7D is thus not a failure of quantitative prediction, but rather a strong validation of our core premise: that a significant degree of signalling heterogeneity exists in cell populations and that our framework can effectively capture its emergent properties.

      Regarding the specific comment on Figure 7C, we apologise for the lack of clarity. Nominally, we chose to simulate for 24 hours; however, the x-axis in our simulations represents arbitrary time units, as the timescale depends on the meaning and units of the parameter values. The goal is to compare the qualitative shape of the response (e.g., rebound, sustained decrease), not the absolute time in hours. Moreover, the rapid initial suppression seen in many of our model instances (Fig. 7C) directly parallels the rapid suppression seen in the experimental data (Fig. 7A). In both our simulations and the real cells, this initial phase is followed by a wide variety of adaptive behaviours (or a lack thereof), which is the key phenomenon we are studying.
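      The statement that simulation time is in arbitrary units follows from a general property of rate equations: multiplying every rate constant by a common factor yields the same trajectory, only that many times faster. The sketch below checks this numerically for a hypothetical reversible binding reaction (A + B ⇌ C) integrated with fixed-step fourth-order Runge-Kutta; the species, rate values, and step sizes are illustrative assumptions.

```python
from typing import Callable, List

State = List[float]
RHS = Callable[[State], State]


def rk4(f: RHS, y0: State, t_end: float, dt: float) -> State:
    """Integrate dy/dt = f(y) from t = 0 to t_end with fixed-step classical RK4."""
    y = list(y0)
    for _ in range(round(t_end / dt)):
        s1 = f(y)
        s2 = f([yi + 0.5 * dt * si for yi, si in zip(y, s1)])
        s3 = f([yi + 0.5 * dt * si for yi, si in zip(y, s2)])
        s4 = f([yi + dt * si for yi, si in zip(y, s3)])
        y = [yi + dt / 6.0 * (a + 2 * b + 2 * m + d)
             for yi, a, b, m, d in zip(y, s1, s2, s3, s4)]
    return y


def binding(k_on: float, k_off: float) -> RHS:
    """Mass-action A + B <-> C; state is [A, B, C]."""
    def f(y: State) -> State:
        a, b, c = y
        v = k_on * a * b - k_off * c  # net forward flux
        return [-v, -v, v]
    return f


y0 = [1.0, 0.8, 0.0]
scale = 10.0  # common factor applied to every rate constant
slow = rk4(binding(1.0, 0.5), y0, t_end=24.0, dt=0.01)
fast = rk4(binding(1.0 * scale, 0.5 * scale), y0, t_end=24.0 / scale, dt=0.01 / scale)
print(slow, fast)  # same end state: "24 time units" only has meaning relative to the rates
```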

      We have revised the text (page 14: lines 598-601) and Figure 7’s legend to state more explicitly that our validation is qualitative and to clarify the purpose of our broad, uncalibrated approach. We have also added a note in the Discussion (page 18: lines 744-747) that calibrating this framework with cell-line-specific data is a natural next step for generating quantitative, context-specific predictions.

      (13) Related to the previous point, the experimental data is presented as fold-change during CDK4/6 inhibition, and we notice that the initial fold-change at time 0 varies between 1 and 1.8. This difference in initial fold-change is unclear to us, as we understand fold-change to correspond to the change from baseline, typically represented by the protein concentration at time 0.

      Furthermore, while the experimental data exhibits uniformly decreasing CDK4/6 activity, a substantial number of simulations indicate constant CDK4/6cycD, showing a significant qualitative discrepancy between the simulations and experimental findings. This disparity makes it difficult for us to interpret the comparison between the two datasets effectively, given the complexities in comprehending the experimental fold-change figure.

      As Figure 7 serves as the primary validation of model simulations in the manuscript, we believe that the current presentation may not provide a compelling reason to believe that the model accurately captures experimental data. To enhance clarity and validation, we suggest overlaying the experimental data over the simulations or considering the median and 10/90% percentile of the experimental data, which may potentially offer improved readability and facilitate a more robust interpretation of the comparison.

      The experimental data from Yang et al. (ref 55, main text) measures kinase activity using a nucleus-to-cytoplasm translocation reporter system, wherein a bait protein is phosphorylated by the target kinase, causing it to translocate from the nucleus to the cytoplasm. Hence, the y-axis represents the ratio of nuclear to cytoplasmic fluorescence, not a fold-change from a t = 0 baseline. The variation in the starting value (between 1 and 1.8) reflects the inherent heterogeneity in the reporter's localization across individual cells even before the drug is added. We have updated the y-axis label and revised the legend of Fig. 7 to state this explicitly.
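      The distinction between a per-cell reporter ratio and a fold-change can be shown with a few lines of arithmetic. In the sketch below, the fluorescence intensities of two cells are hypothetical numbers chosen only for illustration: a nuclear/cytoplasmic ratio can start at different values in different cells, whereas a fold-change normalised to t = 0 starts at 1 by construction.

```python
from typing import List


def reporter_ratio(nuclear: List[float], cytoplasmic: List[float]) -> List[float]:
    """Per-time-point nuclear/cytoplasmic fluorescence ratio for one cell."""
    return [n / c for n, c in zip(nuclear, cytoplasmic)]


def fold_change(series: List[float]) -> List[float]:
    """Normalise a series to its own t = 0 value, so it always starts at 1."""
    return [v / series[0] for v in series]


# Hypothetical intensities for two cells at three time points.
cell_a = reporter_ratio([100.0, 80.0, 60.0], [100.0, 100.0, 100.0])
cell_b = reporter_ratio([180.0, 120.0, 90.0], [100.0, 100.0, 100.0])
print(cell_a[0], cell_b[0])                            # ratios start at different values
print(fold_change(cell_a)[0], fold_change(cell_b)[0])  # fold-changes both start at 1.0
```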

      The most likely explanation for the discrepancy between the experimental dynamics and our simulated dynamics is that the experimental data comes from an isogenic cell line that is largely sensitive to CDK4/6 inhibition. Our simulations are derived from a very wide parameter sweep intended to represent all possible cell states. It is therefore striking that there is such close qualitative agreement between the experimental data and the simulations, indicating that the heterogeneity of even isogenic cell lines may be significantly greater than might be intuited; we now make this point in the revised Discussion (page 17: lines 716-727).

      It is worth noting again that our analysis is intentionally constructed to be as heterogeneous as possible and is not trained on any biological data that might otherwise constrain the output-behaviour space. The isogenic cell line almost certainly occupies a much more constrained output-behaviour space than our analysis.

      The y-axis label has also been updated accordingly. As mentioned in (12), this result is intended as a qualitative validation, showing that cell lines indeed have highly variable signalling dynamics. Given the range of parameters tested, we find it surprising that the agreement between the experiment and our analysis is as high as it is. Again, we believe this suggests that heterogeneity may be more prevalent than is intuited. We do not believe we have made any strong quantitative claims in the main text, and we certainly aim to work towards biological, quantitative validation in the future. Finally, we altered the wording of the Results heading (page 14: line 562) to make clear that we are making only qualitative claims, and we removed the claim that the evidence was strong.

      With these clarifications and corrections, we believe the validation is now much more compelling. The key point is not a perfect quantitative match, but the strong similarity in the distribution of heterogeneous behaviours.

      (14) The authors mention simulating treatment with 10 nM of CDK4/6i or Ei, but specific details on how this treatment is included in the model simulations are not provided. This lack of information makes it challenging to fully evaluate the comparison between model simulations and experimental evidence in Figure 7. It would be highly appreciated if the authors could clarify how the treatment with CDK4/6i or Ei is incorporated into the simulations to facilitate a better understanding and interpretation of the results.

      To clarify, the effects of the inhibitors were incorporated directly into the kinetic rate laws of their respective target reactions.

      CDK4/6 inhibitor (CDK4/6i): This was modelled as an inhibitor of the formation of the active CDK4/6-cyclin D complex. We have now explicitly detailed this in the description for reaction R27 in the "Description of Model Scope and Construction" section of the Supplementary Information.

      Estrogen Receptor inhibitor (Ei): This was modelled as an inhibitor of the estrogen-dependent activation of the Estrogen Receptor. This is now explicitly detailed in the description for reaction R15 in the same supplementary section.

      It is, however, important to reiterate that our goal in Figure 7 is a qualitative, shape-based comparison; therefore, we used a fixed fractional inhibition (reported in Methods) rather than a calibrated IC50/Hill model.
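      As an illustration of the difference between a fixed fractional inhibition and a calibrated dose-response, the sketch below scales an assumed mass-action formation rate (for example, CDK4/6 + cyclin D forming the active complex) by (1 - f), and separately shows the Hill-type inhibited fraction that a calibrated IC50 model would compute. The function names, the mass-action form, and all numbers are assumptions for illustration; the actual rate laws for reactions R27 and R15 are those given in the Supplementary Information.

```python
def inhibited_rate(k: float, reactant_1: float, reactant_2: float, inhibition: float) -> float:
    """Mass-action rate k*[R1]*[R2] scaled by (1 - f) for a fixed inhibited fraction f."""
    if not 0.0 <= inhibition <= 1.0:
        raise ValueError("inhibition must be a fraction between 0 and 1")
    return (1.0 - inhibition) * k * reactant_1 * reactant_2


def hill_inhibited_fraction(dose: float, ic50: float, n: float = 1.0) -> float:
    """Fraction inhibited at a given dose under a calibrated IC50/Hill model (not used here)."""
    return dose ** n / (ic50 ** n + dose ** n)


# Hypothetical complex-formation reaction with and without 90% fixed inhibition.
untreated = inhibited_rate(k=0.5, reactant_1=2.0, reactant_2=3.0, inhibition=0.0)
treated = inhibited_rate(k=0.5, reactant_1=2.0, reactant_2=3.0, inhibition=0.9)
print(untreated, treated)
# A fixed fraction f corresponds to a single point on a dose-response curve,
# e.g. f = 0.5 where dose == IC50, for any Hill coefficient.
print(hill_inhibited_fraction(dose=10.0, ic50=10.0, n=2.0))
```

The fixed-fraction form needs no dose, IC50, or Hill coefficient, which is consistent with a shape-based comparison but means the simulated drug effect is not tied to the 10 nM concentration.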

      (15) The authors state strong support for their modelling conclusions based on the literature. However, we still have concerns regarding the validation of the model against CDK2 or CDK4/6 data in Figure 7, as it appears less convincing to us. Furthermore, the authors list known resistance mechanisms that are replicated in their modelling. Nevertheless, we find the conclusion somewhat weakened by Figure S10, where approximately 80% of the nodes are implicated in some form of resistance pathway. This raises questions about the model's selectivity, as many proteins included in the model seem to drive resistance in some manner. In the Supplementary Information, the authors mention excluding or abstracting some protein species from the mitogenic and cell cycle pathways to manage computational resources effectively. This abstraction makes it difficult to determine if the proteins identified as potential drivers of resistance genuinely drive resistance or might represent abstractions of other potential drivers. To enhance the manuscript's clarity and address potential concerns about the model's selectivity and abstraction, we suggest providing more details and discussion in the main text.

      The reviewer's observation that a large number of nodes are implicated in resistance pathways in Figure S10 is correct. However, we argue this is not a weakness of the model's selectivity, but rather a key finding that reflects the biological reality of adaptive resistance. The literature is replete with a wide and growing number of distinct mechanisms of resistance even to a single class of drugs (1,2), which supports the idea that cancer can co-opt a wide variety of network nodes to survive.

      Figure S10 is not a binary map in which every implicated node is equal; instead, it is a likelihood map, where the colour and weight of the connections represent how often a particular interaction participates in driving resistance across the theoretical full range of possible network dynamics. The figure shows that while many nodes can contribute to resistance, they do so in a hub-like manner, i.e., small subsets of nodes coordinate to drive resistance. This provides a rationalised, data-driven prioritisation of the most dominant and recurrent resistance strategies. We draw two important conclusions from this work: (1) resistance likely arises from resistance hubs rather than individual proteins, and (2) the frequency of a resistance hub in an MDN analysis is likely proportional to the frequency with which that hub emerges as a resistance mechanism in a population of cells and patients.
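      The frequency-based reading of Figure S10 can be sketched as simple counting over resistant model instances. In the example below, each resistant instance is represented by the set of interactions assumed to participate in its resistance, and both individual interactions and exact co-occurring sets (a crude proxy for a hub) are ranked by frequency. The node labels and instances are toy data, not results from the MDN analysis, and treating a hub as an exact set of interactions is a simplifying assumption.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]


def edge_frequencies(instances: Sequence[Sequence[Edge]]) -> Dict[Edge, float]:
    """Fraction of resistant instances in which each interaction participates."""
    counts: Counter = Counter()
    for edges in instances:
        counts.update(set(edges))  # count each interaction at most once per instance
    n = len(instances)
    return {edge: c / n for edge, c in counts.items()} if n else {}


def hub_frequencies(instances: Sequence[Sequence[Edge]]) -> List[Tuple[FrozenSet[Edge], float]]:
    """Frequency of each exact co-participating set of interactions, most common first."""
    counts = Counter(frozenset(edges) for edges in instances)
    n = len(instances)
    return [(hub, c / n) for hub, c in counts.most_common()]


# Toy data: three resistant instances over hypothetical nodes A-D.
resistant = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("B", "C")],
    [("C", "D")],
]
print(edge_frequencies(resistant))
print(hub_frequencies(resistant)[0])
```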

      Regarding the issue of abstraction, the reviewer is correct that this is an inherent feature of any tractable systems model. In our case, several species in the mitogenic/cell-cycle pathways are module-level proxies to control model size. The highly implicated "hub" nodes in our model likely represent critical cellular processes that are themselves composed of several individual protein interactions.

      To address these concerns, we have significantly revised the Discussion (page 16: lines 681-694) to: (1) frame resistance as a network-level phenomenon; (2) show that our frequency-based ranking is selective, prioritising the most probable, recurrent mechanisms; and (3) clarify that, given the model's abstractions, our findings implicate critical processes (modules), not just single proteins, as the drivers.

      Overall, these changes do not alter our main conclusions: adaptive resistance is an emergent, network-level property; many routes exist, but a smaller set of nodes/modules consistently carry the largest influence across heterogeneous contexts.

      (16) We consider that the figures and legends, including the supplementary information, are inadequately explained. The information provided is insufficient for us to comprehend the figures fully, leading to the need for interpretation on our part as readers. This could potentially introduce biases when trying to understand the claims made by the authors. To improve our understanding, it would be essential for the authors to assign appropriate labels to the figures and provide comprehensive explanations in the legends. For example, in Fig 3, we suggest labelling the tree diagrams in panels A and B, as well as the colour bars. We also recommend applying the same approach to other figures, adding accurate axis labels and descriptions of colour gradients to enhance clarity.

      We thank the reviewer for this critical feedback. To address this comment, the figure legends have been revised where appropriate and greatly expanded to improve their comprehension. Moreover, we have added explicit labels to all previously unlabelled components, such as the cluster dendrograms and colour code bars in Figure 3A, B.

      (17) To enhance readability, we recommend interchanging the order of Figures 1 and 2 in the sequence they appear in the main text. Alternatively, the text can be adjusted to refer to the figures in the correct order. Additionally, attention should be given to the bottom of Fig 1, which appears to be cropped or cut off. Furthermore, the incorrect word spacing in some figure elements, such as Fig. 3A title, Fig. 5B title, and Fig. 6B y-label, should be corrected for improved visual presentation.

      Following the reviewer’s comment, the order of Figures 1 and 2 has been switched to reflect the order in which they are referred to in the main text. These figures have been re-exported to fix unintentional word-spacing errors.

      (18) We recommend that the language used to refer to the initial conditions in the manuscript is clarified and homogenised. Currently, the authors use different terms such as "basal expression," "protein expression," "state variable values," or "initial conditions" to refer to them. This variation in terminology can be confusing for readers. In particular, the use of "basal expression" is problematic, as it typically refers to the leaky value of a reaction in the absence of an inducer, making it another biophysical parameter of the system rather than an initial condition. To enhance clarity and consistency, we suggest the authors decide on a single term to refer to the initial conditions throughout the manuscript and provide a clear explanation of its meaning to avoid any confusion. This will help readers better understand the concept being discussed and prevent any potential misinterpretations.

      We thank the reviewer for this very helpful suggestion. To resolve this and improve clarity, we have homogenised the language throughout the manuscript. We now use the following three terms, each in its specific context:

      We use “protein abundances” exclusively for the conserved total abundances of multi-state species (e.g., Xtot = X + pX + complexes) that are sampled across instances to represent expression heterogeneity.

      We use “initial conditions” to refer to the initial values of the state variables in a model simulation. This term is related to protein abundance because setting the initial conditions of conserved species sets the protein abundance. This is stated explicitly in the text (page 3: lines 87-91).

      We use “state variables” to refer to the time-dependent model species.

      We avoid the term “basal expression” in technical descriptions. Where a biology-facing phrase is helpful, we use “protein expression level”. This is used when referring to the biological concept that the initial conditions are intended to represent, i.e. the heterogeneity in protein amounts across a cell population.

      We have performed a thorough search-and-replace to ensure this new convention is applied consistently and have removed the potentially confusing term "basal expression" from the revised manuscript.
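      The relationship between sampled protein abundances and initial conditions can be sketched for a hypothetical species X that exists as X, pX, and a complex C formed with a partner Y. The sketch assumes that each sampled total starts entirely in the unphosphorylated, unbound form, integrates a toy mass-action network with forward Euler, and checks that the totals X + pX + C and Y + C stay fixed. The species, rate constants, sampling range, and starting assumption are illustrative choices, not the paper's settings.

```python
import random
from typing import Dict

State = Dict[str, float]


def initial_conditions(x_tot: float, y_tot: float) -> State:
    """Assumption: all of X starts unphosphorylated and unbound; all of Y starts free."""
    return {"X": x_tot, "pX": 0.0, "Y": y_tot, "C": 0.0}


def euler_step(s: State, p: Dict[str, float], dt: float) -> State:
    """One forward-Euler step of X <-> pX (kinase/phosphatase) and pX + Y <-> C."""
    v_p = p["k_p"] * s["X"]
    v_dp = p["k_dp"] * s["pX"]
    v_on = p["k_on"] * s["pX"] * s["Y"]
    v_off = p["k_off"] * s["C"]
    return {
        "X": s["X"] + dt * (-v_p + v_dp),
        "pX": s["pX"] + dt * (v_p - v_dp - v_on + v_off),
        "Y": s["Y"] + dt * (-v_on + v_off),
        "C": s["C"] + dt * (v_on - v_off),
    }


def simulate(s: State, p: Dict[str, float], dt: float = 1e-3, n_steps: int = 5000) -> State:
    for _ in range(n_steps):
        s = euler_step(s, p, dt)
    return s


rng = random.Random(0)
params = {"k_p": 1.0, "k_dp": 0.5, "k_on": 1.0, "k_off": 0.2}
for _ in range(3):
    # Each instance samples its own total abundances (log-uniform over 0.1-10, assumed).
    x_tot, y_tot = 10 ** rng.uniform(-1, 1), 10 ** rng.uniform(-1, 1)
    end = simulate(initial_conditions(x_tot, y_tot), params)
    print(x_tot, end["X"] + end["pX"] + end["C"])  # the two totals agree
```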

      (19) Why are saturable functions (e.g., Michaelis-Menten functions) ignored in the model? What are the potential consequences?

      The main objective of this work was to perform a large-scale, systematic exploration of a high-dimensional parameter space (94 parameters) to map the full repertoire of qualitative dynamic behaviours a network topology can support. Using saturable functions like Michaelis-Menten kinetics would have roughly doubled the number of parameters to be explored (from k to Vmax and Km for each enzymatic reaction), making a parameter sweep of this scale computationally intractable. We therefore prioritised the breadth of the parameter search over the depth of kinetic detail, which we believe is the appropriate choice for a proof-of-concept study focused on heterogeneity.

      This simplification has potential consequences. A major one is that our model cannot capture phenomena that arise specifically from enzyme saturation, such as zero-order kinetics or certain forms of ultrasensitivity (switch-like responses). However, we argue that this is an acceptable trade-off for two main reasons: (1) our analysis is based on classifying broad, qualitative response shapes (increasing, decreasing, rebound, etc.), and mass-action kinetics are fully capable of generating this rich spectrum of behaviours; and (2) by varying the mass-action rate constants over nine orders of magnitude (from 10<sup>-5</sup> to 10<sup>4</sup>), our parameter sweep effectively samples a vast range of reaction efficiencies. A very low rate constant can approximate the behaviour of a saturated, low-efficiency enzyme, while a high rate constant can approximate a highly efficient, non-saturated one. In this way, the broad sweep of the rate parameter partially reflects the effects that would be captured by varying Vmax and Km.
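      The trade-off between mass-action and Michaelis-Menten rate laws can be made explicit with a short sketch. It draws rate constants log-uniformly between 1e-5 and 1e4 (log-uniform sampling over that range is an assumption here) and compares a one-parameter mass-action rate with a two-parameter Michaelis-Menten rate. In the unsaturated regime (substrate well below Km), the Michaelis-Menten rate reduces to mass action with an effective constant kcat/Km; at high substrate the two diverge, which is the saturation behaviour a mass-action model cannot reproduce. The kcat, Km, and enzyme values are hypothetical.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float = 1e-5, high: float = 1e4) -> float:
    """Draw a rate constant whose log10 is uniform on [log10(low), log10(high)]."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))


def mass_action_rate(k: float, enzyme: float, substrate: float) -> float:
    """One parameter per reaction: v = k [E][S]."""
    return k * enzyme * substrate


def michaelis_menten_rate(kcat: float, enzyme: float, substrate: float, km: float) -> float:
    """Two parameters per reaction: v = kcat [E][S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


kcat, km, enzyme = 2.0, 10.0, 1.0
k_eff = kcat / km  # effective mass-action constant in the unsaturated limit
for s in (0.01, 1000.0):  # well below and well above Km
    print(s, mass_action_rate(k_eff, enzyme, s), michaelis_menten_rate(kcat, enzyme, s, km))

rng = random.Random(0)
print([sample_log_uniform(rng) for _ in range(3)])
```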

      For transparency, we have added a brief rationale to the “ODE model construction, modelling, and simulations” part of the Methods (revised main text, page 4: lines 153-155) and the "Description of Model Scope and Construction" section in the Supplementary file (Supplementary text page 2: lines 63-73).

      (20) Given the relevance of the concept of "heterogeneity" in this work, a short discussion about biochemical noise and its implications on the analysis (e.g., why it is not included, and if it will be a next step) would be appreciated.

      Our MDN modelling framework represents heterogeneity by creating an ensemble of deterministic models, where each model instance has a unique set of kinetic parameters and/or initial protein abundances. We propose that this is a powerful way to mechanistically represent the functional consequences of all sources of cellular variation. Over time, the effects of genetic mutations, epigenetic states, and even the time-averaged impact of intrinsic biochemical noise will manifest as changes in the effective interaction strengths and protein concentrations within a cell. Our large-scale parameter/IC sweep is designed to systematically explore the full range of dynamic behaviours that can emerge from this underlying biological variation. Therefore, our approach does not compete with stochastic modelling but is complementary to it. While stochastic simulations can capture the dynamic trajectories of single cells, our framework provides a panoramic view of the entire spectrum of possible stable phenotypes that can emerge at the population level. We agree that modelling intrinsic biochemical noise (stochasticity arising from finite copy numbers), e.g., using the chemical Langevin equation or the stochastic simulation algorithm (SSA), is a possible extension for future work, although it is expected to be very computationally expensive. We have added a brief discussion of this as a future direction in the revised Discussion.
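      For contrast with the deterministic ensemble, the sketch below shows a stochastic simulation of intrinsic noise in its simplest form: Gillespie's direct method (SSA) for a hypothetical birth-death process with constant synthesis and first-order degradation. It is not part of the MDN framework; it only illustrates the kind of per-cell, copy-number-level simulation a stochastic extension would need, and why repeating it for every parameter set in a large sweep would be costly. The rates, initial count, and end time are illustrative assumptions.

```python
import random
from typing import List, Tuple


def ssa_birth_death(
    k_syn: float, k_deg: float, x0: int, t_end: float, rng: random.Random
) -> Tuple[List[float], List[int]]:
    """Gillespie direct method for 0 -> X (rate k_syn) and X -> 0 (rate k_deg * X)."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_syn = k_syn
        a_deg = k_deg * x
        a_total = a_syn + a_deg
        if a_total == 0.0:
            break  # no reaction can fire
        tau = rng.expovariate(a_total)  # waiting time to the next reaction
        if t + tau > t_end:
            break
        t += tau
        # Choose which reaction fires, with probability proportional to its propensity.
        x += 1 if rng.random() * a_total < a_syn else -1
        times.append(t)
        counts.append(x)
    return times, counts


# The deterministic counterpart dx/dt = k_syn - k_deg * x settles at k_syn / k_deg = 10.
times, counts = ssa_birth_death(k_syn=10.0, k_deg=1.0, x0=0, t_end=50.0, rng=random.Random(42))
print(len(times), counts[-1])
```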

      (21) We have noticed that the first four paragraphs of the Discussion section overlap with the Introduction, as they mainly reiterate the significance of the study itself rather than focusing on the specific results obtained. To avoid redundancy and provide a more cohesive and informative discussion, we recommend that the authors shift the focus of the Discussion section towards presenting potential interpretations, even if they are not definitive, of the results obtained. By doing so, the Discussion will serve as a valuable platform for deeper analysis and insightful observations, allowing readers to better comprehend the implications and significance of the research findings.

      We thank the reviewer for this structural feedback. Accordingly, we have significantly rewritten and restructured the Discussion section and removed the redundant introductory material.

      The rewritten Discussion centres on interpretation and implications, and connects our findings to the literature. It now: (i) frames MDN as a systems-level framework that links molecular heterogeneity to qualitative signalling “meta-dynamics” and adaptive escape under constant drug pressure; (ii) highlights two key findings: an asymmetry in control (interaction kinetics exert stronger, more consistent influence than protein abundance) and a topology-driven convergence whereby a vast parameter space funnels into a finite set of recurrent behaviours; (iii) shows that resistance is a network-level property, with many possible routes but a small set of recurrent hubs/modules dominating; and (iv) provides a qualitative alignment with single-cell reporter data while clarifying the intent and limits of that comparison. Moreover, we now explicitly discuss limitations (rate-law simplifications, broad priors, determinism, and modular abstractions) and outline next steps for future research, including data-constrained priors and stochastic extensions.

      We believe this substantial revision has transformed the Discussion into a much more insightful and valuable part of the manuscript that directly addresses the reviewer's concerns.

      (22) The supplemental text file containing the model equations can be a bit challenging to read and understand. It would be greatly beneficial if the authors could consider generating a file using a typesetting program.

      We have now included a typeset list of state variable equations and ODEs, along with the original model files.

      (23) The authors mentioned that some model parameterizations result in negative solutions, which is surprising. Access to the model equations would help understand why this happens and is crucial for researchers who may want to use this approach. Clarifying the model equations' presentation would enhance transparency and aid other researchers in applying this method for similar research questions.

      The reviewer is correct to be surprised by the mention of negative solutions, as negative concentrations are physically impossible. We clarify that these are not a result of any structural flaw in our model's equations but are a well-known, although rare, numerical artifact of floating-point arithmetic in computational solvers.

      Our model is constructed using standard mass-action and first-order kinetics, which structurally guarantee non-negativity. However, when a species' concentration approaches the limits of machine precision (i.e., becomes a very small number extremely close to zero), the ODE solver can, in rare instances, numerically undershoot zero, resulting in a small negative value. If this occurs, it can lead to instability in subsequent integration steps.

      This is not a biological phenomenon but a computational one. Therefore, the standard and appropriate procedure, which we follow, is to implement a filter that discards any simulation trajectory where such a numerical instability occurs.
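      The filtering step can be sketched as follows. The check below discards any trajectory containing a negative value (optionally allowing a small tolerance) or a non-finite value, and reports how many trajectories were discarded. The trajectory format, the `tol` argument, and the non-finite check are assumptions for illustration rather than the exact criterion used in the study.

```python
import math
from typing import Dict, List, Sequence, Tuple

Trajectory = Dict[str, List[float]]  # species name -> values at each saved time point


def is_numerically_valid(traj: Trajectory, tol: float = 0.0) -> bool:
    """False if any value is non-finite or below -tol (tol = 0 rejects any negative value)."""
    return all(math.isfinite(v) and v >= -tol for series in traj.values() for v in series)


def filter_trajectories(
    trajs: Sequence[Trajectory], tol: float = 0.0
) -> Tuple[List[Trajectory], int]:
    """Keep valid trajectories and report how many were discarded."""
    kept = [t for t in trajs if is_numerically_valid(t, tol)]
    return kept, len(trajs) - len(kept)


runs = [
    {"X": [1.0, 0.4, 0.0]},      # valid: reaches zero but never goes below it
    {"X": [1.0, 0.4, -1e-18]},   # solver undershoot below zero
    {"X": [1.0, float("nan")]},  # integration failure
]
kept, n_discarded = filter_trajectories(runs)
print(len(kept), n_discarded)
```

Reporting the discard count alongside the filtered ensemble makes it easy to confirm that such numerical failures are indeed rare.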

      (24) The reference listed for the CDK4/6 and CDK2 measurements is Yang et al. [55] in the figure caption, but as Xe et al. in lines 559-561 of the manuscript.

      The text has been updated to match the citation.

      (25) We suggest that the authors revise and cite a previous study conducted by Yamada et al. (Scientific Reports, 2018), which presents an approach to expressing cell heterogeneity as a probability distribution of model parameters.

      Following this suggestion, we have revised the Discussion (see response to comment (21)) to include and discuss Yamada et al. (Scientific Reports, 2018), which models cell heterogeneity as a probability distribution over parameter values.

      (26) In the manuscript, on line 677, the authors state, "This indicates that there is an upper limit to the degree to which parameter sets can influence the qualitative shape of a protein's dynamic within a given network topology." We wish to highlight that this finding may not be particularly surprising. Given that the parameters were randomly determined within a specific range, it is understandable that altering the number of parameter samples would not substantially impact the distribution of model instances.

      We thank the reviewer for this insightful comment, which allows us to clarify the significance of this finding. While it is true that any sampling from a fixed distribution will eventually converge statistically, our conclusion is not about statistics but about the intrinsic, constraining properties of the network's topology. The novelty is not that the distribution converges, but that it converges to a surprisingly limited and finite repertoire of qualitative dynamic behaviours. A complex, non-linear network with nearly 100 free parameters could theoretically generate an almost endless variety of complex dynamics. Our finding is that this specific biological topology acts as a powerful filter, robustly channelling the vast majority of the near-infinite parameter combinations into a small, recurring set of functional outputs (increasing, decreasing, rebound, etc.).

      The reviewer's comment prompted us to investigate further, and the reason for this finite limit is mechanistic. Our parameter sweep already covers an extremely wide range spanning nine orders of magnitude. When we pushed parameter values to even greater extremes in exploratory simulations, they did not generate novel, complex dynamic shapes. Instead, they tended to drive network nodes into saturated states: either permanently "on" (maximally activated) or permanently "off" (minimally activated). In both cases, the node becomes unresponsive to upstream perturbations.

      Therefore, further expanding the parameter range would be unlikely to uncover new behavioural categories; it would simply increase the proportion of model instances classified as "no-response." This demonstrates a fundamental principle: the network topology itself enforces an upper limit on its dynamic complexity. We think this inherent robustness is what allows for reliable cellular signalling in the face of constant biological variation. We believe this is a non-trivial finding, and we have revised the Discussion (page 16: lines 664 - 680) to state this conclusion and its implications more clearly.
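      The idea of a finite repertoire of response shapes presupposes a classifier that maps each trajectory to a category. A minimal sketch is given below; four labels follow categories named in the text, "transient" is an extra hypothetical label for a return to baseline, and the decision rules and 5% relative tolerance are hypothetical, not the study's actual classification scheme. Counting labels across an ensemble with `collections.Counter` is then enough to compare category frequencies between sweeps of different sizes or ranges.

```python
from collections import Counter
from typing import Sequence


def classify_response(series: Sequence[float], rel_tol: float = 0.05) -> str:
    """Assign a trajectory to a qualitative category using simple hypothetical rules."""
    y0, y_end = series[0], series[-1]
    y_min, y_max = min(series), max(series)
    scale = max(abs(y0), 1e-12)  # avoid division by zero for a zero baseline
    if (y_max - y_min) / scale < rel_tol:
        return "no-response"  # e.g. a node saturated fully on or fully off
    i_min = list(series).index(y_min)
    dips = y_min < y0 - rel_tol * scale
    recovers = y_end - y_min > rel_tol * scale
    if dips and recovers and 0 < i_min < len(series) - 1:
        return "rebound"
    if abs(y_end - y0) < rel_tol * scale:
        return "transient"
    return "increasing" if y_end > y0 else "decreasing"


ensemble = [
    [1.0, 1.0, 1.01],  # flat
    [1.0, 0.5, 0.3],   # monotone fall
    [1.0, 0.4, 0.8],   # dip then recovery
    [1.0, 1.5, 2.0],   # monotone rise
]
print(Counter(classify_response(s) for s in ensemble))
```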

    1. Summary Report: Foreign Influence in the Education and Research Sector

      Executive Summary

      This document summarises testimony gathered during a parliamentary hearing on strategies for detecting, characterising, and responding to malicious foreign influence within the French education system.

      The main vectors of influence identified include direct human intervention, incitement to behaviour hostile to republican values, and large-scale manipulation of information via social media.

      The Ministry of National Education and the Ministry of Higher Education are strengthening their defences through three pillars:

      • Tighter control of foreign instructors (transition from ELCO to EILE).

      • Systematic media and information literacy education (EMI) to develop pupils' critical thinking.

      • Stronger protection of scientific and technical assets in universities, now including the humanities and social sciences.

      Despite heightened vigilance, institutions face the complex challenge of reconciling academic freedom and institutional autonomy with the need to counter increasingly subtle and diffuse interference.

      --------------------------------------------------------------------------------

      1. School Education: Channels of Influence and Vigilance Mechanisms

      The Director General of School Education identifies three main channels through which foreign influence is exerted: individuals, behavioural incitement, and the spread of false information.

      Controlling Instructors: From the ELCO Scheme to EILE

      The transition from heritage language and culture teaching (ELCO, enseignements de langue et de culture d'origine) to international foreign language teaching (EILE, enseignements internationaux de langues étrangères) marked a turning point in controlling foreign involvement in schools:

      • Language requirement: A certified B2 level in French is now required.

      • Pedagogical inspection: 100% of new instructors must be inspected in their first year to verify that their teaching content complies with republican values.

      • Good character: The framework has been strengthened so that an instructor's involvement can be ended immediately in the event of misconduct.

      Media and Information Literacy (EMI)

      Faced with the "torrent" of manipulated information on social media, the institution favours a preventive rather than a curative approach:

      • School pathway: Awareness-raising starts in primary school and continues through the final year of secondary school (terminale).

      • Partnerships: Collaboration with CLEMI and professional journalists (the Semaine de la presse, or Press Week, reaches 4.5 million pupils).

      • PIX tool: Nationwide rollout of digital-skills certification, including cybersecurity and source verification, for all pupils in 3ème and Terminale (the final years of lower and upper secondary school).

      Staff Protection and Online Monitoring

      A ministerial monitoring and alert unit identifies, every day, cases in which staff members are targeted on social media:

      • Functional protection (protection fonctionnelle): Now granted presumptively and automatically as soon as a staff member is named or threatened.

      • Rapid response: Requests to remove content, school security measures in coordination with the police, and educational interventions in class to set the record straight.

      --------------------------------------------------------------------------------

      2. Higher Education and Research: Sovereignty and Interference

      The higher education sector faces more targeted threats, aimed at scientific assets or at building ideological narratives.

      Threat Mapping

      Although precise information is often classified (secret défense), the "usual suspects" have been identified:

      • Dominant powers: China and Russia (the most intrusive interventions).

      • Emerging areas: North Africa (notably Morocco), Turkey, Azerbaijan, and the Middle East.

      • Changing methods: Operations are less frontal and more subtle, shifting towards the humanities and social sciences in order to influence national and international narratives.

      Protection of Scientific and Technical Assets (PPST)

      The PPST scheme, traditionally focused on the "hard sciences", is adapting to new forms of interference:

      • Extension to the humanities and social sciences (SHS): Gradual inclusion of SHS research units (ergonomics, cognitics, geography) within the scope of Restricted Regime Zones (ZRR, Zones à Régime Restrictif).

      • FSD network: Consolidation of the network of Security and Defence Officers (FSD, Fonctionnaires de Sécurité et de Défense) in every institution.

      • A delicate balance: It is difficult to restrict international scientific partnerships (notably with China) without harming the quality of French research.

      | Mechanism | Main Objective | Scope |
      | --- | --- | --- |
      | PPST / ZRR | Protection against technological and scientific theft. | Sensitive laboratories, dual-use technologies, and now some SHS units. |
      | FSD / HFDS | Security management and detection of interference. | All universities and research institutions. |
      | Multi-year contracts | Inclusion of strategic vigilance clauses. | Specific institutions (e.g., INALCO). |

      --------------------------------------------------------------------------------

      3. Defence of Republican Values and Secularism (Laïcité)

      Breaches of secularism and cases of radicalization are tracked through rigorous statistical monitoring, which reveals a correlation with social-media activity.

      Analysis of Breaches of Secularism

      • Trends: Reports fell markedly between the first and second terms of the current school year (from 3,306 to 1,731 incidents), following the clarification of the ban on the abaya and the qamis.

      • Primary school: There is concern about rising challenges among younger pupils, often influenced by their family environment.

      • Training: A large-scale plan aims to train 100% of staff (about 300,000 per year) in secularism and the values of the Republic within two years.

      Antisemitism and Academic Freedom

      Since the events of 7 October, vigilance has increased in higher education:

      • Systematic reporting: A single digital channel has been created to centralize reports of antisemitic incidents.

      • Political exploitation: Although student movements or political parties are involved in some blockades (e.g., Sciences Po), no formal evidence of direct steering by a foreign power has been established at this stage. The influence of certain social networks is, however, suspected.

      --------------------------------------------------------------------------------

      4. Points of Vigilance and Senate Recommendations

      The hearing highlights several persistent areas of vulnerability:

      • Data and equipment: Institutions are bound by the public procurement code, which sometimes prevents them from excluding suppliers of physical hardware (servers, terminals) from high-risk countries.

      • Transparency of interests: Whether to introduce declarations of interests for researchers and external speakers is still debated, as a way to improve transparency without infringing academic freedom.

      • Attribution: The main difficulty remains attributing disinformation campaigns to foreign states, as the line between isolated influencers and state strategies is increasingly blurred.

      --------------------------------------------------------------------------------

      Key Quotes

      "We are not in a position to debunk in class every piece of false information circulating on social media every day; on the other hand, we carry out educational work that spans the entire course of schooling." – Director-General of School Education

      "The field of narrative-building and influence, which would rather concern geography or the general social sciences... we are still at the beginning of these things." – Representative of Higher Education and Research

      "What we see instead are student unions and political parties exploiting... I have not been made aware of any movement clearly steered or motivated by a foreign country." – Representative of Higher Education (on recent tensions in universities)

    1. Information Report: Funding of Public Policies by Philanthropy and Private Funds

      This summary document analyzes how private organizations fund public policies, based on the hearings held by the Senate commission of inquiry.

      It explores the sector's growth dynamics, its legal frameworks, transparency issues and the balance between private initiative and the general interest.

      Executive Summary

      The philanthropic landscape in France has undergone a profound structural transformation over the past two decades, growing from about 1,000 organizations in 2000 to more than 6,000 today.

      Annual financial volume is estimated at 9 to 10 billion euros. It is supported by a state tax effort (tax deductions) estimated at nearly 3.5 billion euros.

      Philanthropy has historically pioneered social innovation (social housing, education), but it now faces a "scissor effect": rising social needs combined with shrinking public funding.

      This situation increases associations' dependence on private funds. It raises questions about financial transparency, the risk that large donors influence public policy, and the very definition of the general interest.

      --------------------------------------------------------------------------------

      1. State of Play and Evolution of the Philanthropic Sector

      Exponential growth

      The sector is no longer a "prehistoric" element of the welfare state but a major economic and social actor.

      • Financial volume: About 10 billion euros per year, split between individual donations (approx. 3 billion) and corporate sponsorship.

      • Number of organizations: The number of bodies has exploded, driven in particular by the creation of the "endowment fund" (fonds de dotation) status in 2008.

      • Professionalization: Nonprofit actors are building up their fundraising skills to compensate for dwindling subsidies.

      Sociological trends

      • Concentration of wealth: The emergence of new philanthropists is linked to capital accumulation (notably in technology and finance).

      • Popular philanthropy: Paradoxically, the least well-off citizens (1st and 2nd deciles) give a larger share of their income than the wealthiest groups.

      --------------------------------------------------------------------------------

      2. Legal Framework and Types of Actors

      The sector is organized around several legal vehicles with uneven oversight requirements:

      | Type of Structure | Key Characteristics | Oversight Arrangements |
      | --- | --- | --- |
      | Foundation Recognized as Being of Public Utility (FRUP) | Historic status; requires a substantial endowment (approx. €1.5M). | Strict a priori oversight by the Conseil d'État and presence of a government commissioner. |
      | Endowment Fund (Fonds de Dotation) | Created in 2008 to democratize philanthropy. Can be set up with €15,000. | A posteriori oversight. Suspected "empty shells" (36% of funds remain at the €15k floor). |
      | Sheltered Foundation (Fondation Abritée) | Structure without legal personality, managed by a "sheltering" foundation (e.g., Fondation de France, Fondation de Lille). | Benefits from the expertise and management rigor of the parent structure. |
      | Territorial Foundation | Emerging model (26-27 in France) that brings together public and private actors at the local level. | Flexibility to meet the specific needs of a territory. |

      --------------------------------------------------------------------------------

      3. Transparency and Oversight Mechanisms

      The commission's central concern is the "black box" that private funding can represent for citizens and elected officials.

      Existing safeguards

      • Tax traceability: Every donation that qualifies for a tax reduction generates a tax receipt, which allows oversight by the Directorate General of Public Finances (DGFIP).

      • Self-regulation: The sector has developed labels (e.g., "Don en Confiance") to reassure donors after historical scandals such as the ARC affair.

      • Legislative changes: The 2021 "separatism law" strengthened the obligation for endowment funds to provide an activity report and annual accounts.

      Identified limitations

      • Lack of public data: INSEE and the public statistical apparatus are barely involved, and the data come mainly from private bodies (France Générosité).

      • Invisible donations: A large share of "hand-to-hand generosity" escapes any accounting.

      • Opacity of endowment funds: A report by the General Inspectorate points out that 36% of endowments are "unknown or not reported".

      --------------------------------------------------------------------------------

      4. Interaction with the General Interest and Democracy

      A circular definition of the general interest

      In France, the general interest is largely defined by the tax code.

      This allows for pluralism. It becomes problematic, however, when private funds finance structures outside the public sphere, such as private schools not under contract with the state (écoles hors contrat).

      Risks of influence and substitution

      • Preferences of the wealthiest: Philanthropy lets donors direct their taxes toward personal preferences (culture, particular kinds of education), which may conflict with democratic priorities.

      • State withdrawal: Experts reject the idea of full substitution, since philanthropy accounts for only 5% of the budget of associations with employees. It is nonetheless becoming vital to the survival of certain sectors.

      • Legitimacy of innovation: Philanthropy is often better placed than the State to test innovative solutions and to "scale out" projects that will become tomorrow's public policies.

      --------------------------------------------------------------------------------

      5. Financial and Tax Analysis

      The cost to the State

      Tax deductions are analyzed as a socio-fiscal expenditure.

      • 2023 figures: Tax incentives for individuals and companies cost the State about €3.48 billion.

      • Semantic debate: Some see this as a "shortfall" for public finances. Actors in the sector defend it as a social investment delegated to civil society.

      Recourse to borrowing

      A notable peculiarity was raised: endowment funds can borrow (to acquire equipment or real estate). This brings them closer to conventional economic logic while they still benefit from advantageous tax frameworks.

      --------------------------------------------------------------------------------

      Key Quotes

      "If transparency is a black box for citizens as well as for elected officials, we have a problem of democracy and of building the general interest." – Rapporteur of the Commission

      "The private sector is a key actor of the general interest... the word private does not rhyme only with personal interest, it also rhymes with general interest." – Damien Baldin

      "Philanthropy is the capacity for expression of categories of the population that are not homogeneously situated in social space... it is the preferences of the wealthiest that are expressed." – Nicolas Duvoux

      "Being of general interest... trying to break it or reformulate it means opening a Pandora's box." – Arthur Gautier

      --------------------------------------------------------------------------------

      Emerging Recommendations from the Hearing

      • Strengthening public statistics: Mobilize the state apparatus (INSEE) to produce an independent and comprehensive map of private financial flows.

      • Philanthropy education: Teach generosity and civic engagement in a structured way from primary school onward, so that the sector does not become concentrated among an elite.

      • Oversight of endowment funds: Improve monitoring of "floor" structures at €15,000 to ensure they actually fulfil their general-interest mission.

      • Pluralistic evaluation: Develop social-impact measures shared between public and private funders to ensure that the spending committed is effective.

    1. Reviewer #2 (Public review):

      Summary:

      This paper attempts to examine how rare, extreme events impact decision-making in rats. The paper used an extensive behavioural study with rats to evaluate how the probability and magnitude of outcomes impact preference. The paper, however, provides limited evidence for the conclusions because the design did not allow for the isolation of the rare, extreme events in choice. There are many confounding factors, including the outcome variance and the presence of less-rare, less-extreme outcomes in the same conditions.

      Strengths:

      (1) The major strength of the paper is the significant volume of behavioural data with a reasonable sample size of 20 rats.

      (2) The paper attempts to examine losses with rats (a notoriously tricky problem with non-human animals) by substituting time-outs as a proxy for losses. This allows for mixed gambles that have both gain and loss possible outcomes.

      (3) The paper integrates both a behavioural and a modelling approach to get at the factors that drive decision-making.

      (4) The paper takes seriously the question of what it means for an event to be rare, pushing to less frequent outcomes than usually used with non-human animals.

      Weaknesses:

      (1) The primary issue with this work is that the main experimental manipulation fails to isolate the rare, extreme events in choice. As I understand the task, every condition with a rare, extreme event (e.g., 80 pellets with probability epsilon) also has a less-rare, less-extreme event (e.g., 12 pellets with probability 5). In addition, the variance differs between the two conditions. Any impact attributed to the rare, extreme event could therefore be due to the less-rare event, or to the difference in variance (or in other statistical moments, such as skew or kurtosis). The distributions can be shown to differ under specific assumptions about value-maximizing agents (e.g., with Jensen gaps and Table 2), but that is not really relevant to what rats are sensitive to and what drives their behaviour. The design here does not support the conclusions. Finally, because it deliberately confounds rarity and extremity, the design cannot assess the impact of either aspect on rat behaviour.

      (2) The RL modelling work also fails to show a specific impact of the rare, extreme event. As far as I can understand Eq 2, the model provides a free parameter that adds a bonus to the value of either the two options with high-variance gains (A and V in the paper) or the two options with high-variance losses (F and V in the paper). Equivalently, it adds a bonus to the options with "Jackpots" versus those with "Black Swans" (see Point 1 above on how these different aspects are all confounded in this design). This parameter seems to depend only on whether the option could possibly have yielded the rare, extreme outcome (i.e., on the generative probability), not on whether that outcome actually appeared. [This point is unclear: the text says this, but the rebuttal states otherwise, and some options never received the REE (see Table S11).] That makes it a free parameter that simply bumps up (or down) the probability of selecting a pair of options. Such a shift may be due to the presence of the REE, to the other rare event, or just to the variance difference. Moreover, in the "black swan" (high-variance loss) conditions, this looks very much like a loss-aversion parameter, but an additive rather than a multiplicative one. Is there a theoretical claim here that "extreme losses" need an additive loss-aversion parameter?

      (3) The paper presented the methods and results with many neologisms and fairly obscure jargon (e.g., fragility, total REE sensitivity). This makes it very hard to decipher exactly what was done and what was found. For example, the use of concave and convex on p. 4 was very hard to decipher: the text has to repeat itself three times (i.e., "to repeat" and "in other words") and is still not clear. It would be much clearer (and probably more accurate) to say that the options varied along the variance dimension, separately for gains and losses:
      • Option A had low-variance gains and losses.
      • Option B had low-variance losses and high-variance gains.
      • Option C had high-variance losses and low-variance gains.
      • Option D had high-variance losses and gains.
      That tells the reader much more clearly what the animals experienced. It also avoids requiring the reader to master new terminology around fragility and robustness, which unnecessarily brings theoretical assumptions into the description of the experimental design. If the authors are wary of the term "variance" because other moments of the distribution also differ, they could instead use "high-value gains" or "high-value losses", or some other wording that does not obscure the experimental design with jargon. Again, this goes back to point 1 above. The options differ on so many dimensions (as the rebuttal makes even more apparent) that the design cannot isolate the impact of the variables of interest.

      (4) Were the probabilities shuffled or truly random? They seem to follow fixed sequences, so neither. What were the experienced probabilities? Given the fixed sequences, these experienced ("ex-post") probabilities could differ tremendously from the scheduled ("ex-ante") probabilities. An animal may well never have experienced the rare, extreme event for a specific option. Table S11 shows that this must have happened, since 4 animals only ever experienced the "black swan" outcome once. It is even possible (if they picked a specific option only on the 10th/60th choices, by chance) that those options only ever gave them the rare, extreme event. The information provided cannot settle this, because it does not break down outcomes by option. Table S11 in the Supplement gives only overall numbers and does not show what the rats experienced for each choice/option, which is what matters here. A simple table would make it much clearer how closely experience matched the planned outcomes. For each of the 4 options, it would show how often the option was selected and how often the animals experienced each of the 6-8 possible outcomes. In addition, restricting the rare outcome to the 10th or 60th activations in a session means these events are not random. Did the animals learn this association? The text states that they did not, but provides no evidence.

      (5) The choice data are generally presented in an overprocessed fashion, as a sum and a difference (in both figures and tables). The basic datum, the probability/frequency of selecting each of the 4 options, is not given directly in the main text, even if it can in principle be inferred from the sum and the difference. The new right side of Table S4 is probably the most valuable element for explaining what the rats did and should be highlighted much more. Inspecting that table reveals some interesting (and potentially worrying) results. Most notably, the vast majority of responses go to the "anti-fragile" and "robust" options, often totalling around 90% of all selections, especially among the most common blue rats. Alas, those are exactly the two options that were deliberately assigned to the two most preferred holes in the training phase (see p. 26). Does this reflect a genuine preference for reward distributions, or a spatial hole bias? The assignment strategy makes it impossible to tell the two apart.

      (6) There is insufficient detail on the inferential statistical tests (e.g., no degrees of freedom or effect sizes), and only limited information on exactly which tests were run and how (bootstrapping is mentioned, but with little detail). Without code or data (the supplement provides only summary information), this is difficult to evaluate. In addition, the studies do not seem to have been pre-registered in any way, which leaves many researcher degrees of freedom. Not all studies need to be pre-registered, and discovering new things sometimes requires exploratory work. Still, preregistration provides additional safeguards against overemphasizing patterns detected post hoc, a serious issue in behavioural science. It also promotes transparent reporting of results and analyses, allowing a better assessment of the strength of evidence for a claim. For example, were any alternative analysis pipelines attempted here? The animals were also split into many sub-groups that were then compared, and all of these groupings seemed post hoc. On what grounds were these divisions made? Were other divisions examined as well?

      (7) On p. 12 (Fig 4), the authors attempt to assess the impact of a rare, extreme event by plotting a measure of preference for the 10 trials before/after it. In the human literature, the main impact of experiencing a rare, extreme event is the so-called wavy recency effect (see, for example, Plonsky et al. 2015 in Psych Review, now cited). There tends to be some immediate negative recency (e.g., avoiding a rare gain) followed by positive recency (e.g., chasing the rare gain). This typically refers to the specific option that yielded the outcome. First, like the other analyses, the current analysis pools choices of the option that yielded the rare outcome with choices of other options, so it cannot directly assess the impact of the rare, extreme event on choice. Second, a 10-trial window would obscure any such impact. The very next trial is mentioned, but a trial-by-trial analysis of the 10-trial time course could reveal any impact predicted by the human literature.

      (8) As I understood the method (p. 31), the assignment of options to physical locations was neither random nor counterbalanced. Instead, it was deliberately biased to place one of the options in the preferred location. This would seem to create a bias toward a particular option and away from the others, which confounds the preference data in subsequent analyses. Table S4 reinforces this concern: the vast majority of responses are clustered on the two options assigned to the most preferred locations from training.

      (9) Are delays really losses? This is a big assumption. Magnitude and delay are different aspects of experience; they are not necessarily commensurable and can be manipulated independently. For the model, how were these delays transformed into outcomes? Eq 1 skips over that. Is there an assumption of linearity? In addition, it was not wholly clear to me whether the delays meant fewer trials in a session, or merely extended the session and lengthened the wait until the next choice period.
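      To make point (2) concrete, here is a minimal sketch of a reinforcement-learning choice rule with a single additive REE bonus. It is not the authors' Eq 2. It assumes a delta-rule value update and a softmax choice rule, and the names `choice_probs`, `update`, `bonus`, `ree_options` and `observed` are hypothetical. The `observed` argument switches between the two readings discussed in point (2). Without it, the bonus is tied to an option's generative structure. With it, the bonus applies only once the REE has actually been experienced on that option.

```python
import math

def softmax(values, beta):
    """Choice probabilities from option values with inverse temperature beta."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def choice_probs(q, ree_options, bonus, beta, observed=None):
    """Softmax over learned values plus a constant, additive REE bonus (hypothetical model).

    q: learned value of each option.
    ree_options: indices of options whose generative distribution includes the REE.
    bonus: free parameter (positive for Jackpot options, negative for Black Swan options).
    observed: optional set of options on which the REE was actually experienced;
        if given, only those options receive the bonus (the alternative reading).
    """
    eligible = set(ree_options) if observed is None else set(ree_options) & set(observed)
    adjusted = [v + (bonus if i in eligible else 0.0) for i, v in enumerate(q)]
    return softmax(adjusted, beta)

def update(q, option, outcome, alpha):
    """Delta-rule (Rescorla-Wagner) update of the chosen option's value only (assumed form)."""
    q = list(q)
    q[option] += alpha * (outcome - q[option])
    return q
```

      Take four options with equal learned value (1.0), `bonus=0.5` on options 0 and 3, and `beta=2.0`. The flagged options are each chosen with probability about 0.37 and the others about 0.13, even though nothing separates them in learned value. A negative `bonus` on the Black Swan options gives the additive analogue of loss aversion that the reviewer questions.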

      Other points:

      (1) I think the authors still misunderstand the concept of "hot-stove effects". The idea is that experiencing a very bad outcome can lead to avoiding the situation again (i.e., not sampling that option), which can give the appearance of oversensitivity to that bad outcome. Here, that might be thought of as "black-swan avoidance". Suppose that, to the rat, all options are equal in value. Some initial bad luck in encountering the black swan might then make the animal avoid that option, even though with enough experience it would have proven equal in value.

      (2) I am still not convinced that the Jensen inequalities add to this paper in terms of understanding the rat behaviour. That may be more suited for a different paper about the statistical and mathematical properties of certain generative distributions, but not here given what rats actually choose and experience.

      (3) Providing the data open access is very good. The code, however, should be equally available and not just upon request. Code needs to be available for assessment during peer review and for reproducibility checks. There are substantial enough problems with reproducibility in the field that code availability should be a minimum criterion for publication (see Miske et al., 2026 in Nature for the most recent large-scale evaluation of this problem).

      (4) The paper still somewhat mischaracterizes the literature on rare events, posing it as a series of "exceptions", rather than recognizing that a huge chunk of the literature uses rare events rarer than 10%. Also, there is even existing terminology in that literature for exactly the situation that is being created here-rare treasures (aka jackpots here) and rare disasters (aka Black Swans here).

      (5) Describing the observed behaviour in terms of convexity, instead of stating the choices more plainly, obscures what was done and found. This is especially true here because convex and concave mean different things for gains and for losses, in terms of whether the option can lead to the REE. The terms obscure rather than clarify. They are probably best left to the discussion (and maybe the introduction), when mapping from theoretical distributions to the experiment at hand. Even the bottom of p. 5 seems to define "Total Sensitivity" incorrectly, as the combined proportion of selecting convex options in either domain, which does not match how convex is defined in Fig 1B or elsewhere in the text.

      (6) Fig 1C is baffling. Why are probabilities drawn moving away from the origin? The standard scientific plotting convention is for numbers to grow when moving away from the origin, which would be vastly clearer. The color coding is also confusing. Green-red maps onto convex-concave, but that would naturally seem to indicate gains vs losses, not convex vs concave. And why do probabilities grow larger in both directions from the origin? A standard plot of magnitude vs probability would likely communicate the procedure much more sensibly.

      (7) Discussion: I think the main difference between the human situations discussed and this experiment is that humans have not experienced those rare "black swan" outcomes. Rather, they hear about the disasters that are possible and do not incorporate that information, as discussed in the description-experience literature already cited in this paper (though not in that context).

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors investigate the impact of rare and extreme events on rodents' decision-making under risk, in gain and loss contexts. They describe the behavior of 20 rats performing a four-armed bandit task. Probabilistic gains (sugar pellets) and losses (time-out punishments) can, in some arms, include extremely large but rare outcomes. They report that most rats are sensitive to rare and extreme outcomes despite their infrequent occurrence. This sensitivity is primarily driven by extreme loss events, which the rats try to avoid, rather than by extreme gains that they seek to obtain.

      They finally propose a modification of standard reinforcement-learning, which features a specific sensitivity to rare and extreme outcomes and can account for the observed behavior.

      Strengths:

      The manuscript really taps into a surprisingly neglected but very relevant aspect of decision-making: the effect of rare and extreme events (REE). The authors have developed an experimental setup that seemingly allows investigation of this aspect, which is not trivial given the idiosyncratic properties of rare and extreme events.

      The parameters of the experimental setup also seem well thought out. In the absence of REE, some options are objectively better than others, because in expectation they deliver more food overall or minimize time-out punishments. This ordering reverses once REE are taken into account, which allows a clean test of whether REE are integrated into the rodent's decision-making model.

      The data are presented and analyzed in a very descriptive but exhaustive and transparent way, down to the description of individual rodents' behavior.

      Weaknesses:

      The behavioral patterns are described and analyzed rigorously under the economic lens of risky decision-making. However, the authors' interpretation relies heavily on the assumption that the rodents built the correct model of the task during training. Extensive details are provided about the training procedure and the behavior observed at the end of training. Even so, it remains virtually impossible to disambiguate choices due to imperfect learning from choices due to intrinsic preferences for risk or REE.

      As detailed in Material and Methods, the animals were progressively overtrained following standard behavioral procedures. During this process, they experienced all available options, including both positive and negative REE. We assume that repeated exposure to these REE supported learning, as would be expected for any event occurring throughout such an extended training phase. The rats ultimately displayed an asymmetric pattern of choices. They consistently avoided the Black Swan, indicating that they had learned its negative consequences, yet they did not systematically seek the Jackpot. If their behavior were driven solely by incomplete learning or by an inherent preference for risk or REE, we would expect the opposite pattern: systematic Jackpot seeking or inconsistent avoidance of the Black Swan.

      By nature, gains (food pellets) and losses (time-out punishments) are somewhat incommensurable, so the asymmetry attributed to outcome valence is also open to interpretation. There might be additional subtleties due to, e.g., satiety caused by REE gains (i.e., the delivery of 80 pellets from the Jackpot).

      As described in Material and Methods, we used mouse pellets (20 mg) instead of rat pellets (45 mg) to prevent satiety during Jackpot delivery (80 pellets). We also selected gains (sweet pellets) and losses (delays) that we have used successfully in previous rat decision-making paradigms, such as the rat gambling task (Adams et al., 2017; doi: 10.1523/ENEURO.0094-17) and the loss-chasing task (Breysse et al., 2021; doi: 10.1111/ejn.14895). Notably, if the Jackpot induced satiety, one would expect animals to stop seeking it, yet this was not systematically observed. Nonetheless, we added a sentence to the Discussion on page 18 of the manuscript. It acknowledges that we cannot fully exclude the possibility that satiety contributed to the lack of systematic Jackpot seeking.

      In its current form, the paper is quite hard to digest. This is naturally the case with interdisciplinary work (here mixing economists and neurobiologists). But I am afraid that with the current frame, the paper is going to miss its target, in terms of audience.

      We have rewritten the manuscript entirely, and the English was corrected using ChatGPT. We hope that the paper is now easier to digest.

      The proposed model seems somewhat disconnected from the behavioral patterns. The model suggests an effect of REE at the decision stage (i.e., through specific decision weights for those rare events). This formalism seems at odds with the observation that REE (notably in the loss domain) have an impact on subsequent behavior (Black Swans tend to reinforce Total Sensitivity to REE), which rather suggests an effect at the learning stage.

      We agree with the referee that this may appear surprising at first glance. However, we would first like to emphasize that the general model allows REE to influence learning—that is, to contribute to the updating of the Q-subvalues. Moreover, even when REE are incorporated only as decision weights, as is the case for most rats, this does not imply that REE are unimportant during learning. In fact, the model assumes that REE are learned once and for all when they first occur during a trial of the corresponding option. Unreported simulation exercises indicate that a more gradual learning of maximal and minimal values would likely yield similar results.

      Second, the Before/After analysis shows that the behavioral response to Black Swans is locally small in terms of both total and one-sided sensitivities. This suggests that such effects are likely too subtle to be captured by this class of models for most rats. We have added this clarification to the revised version (page 17).
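      A per-lag version of such a Before/After comparison, along the lines of Reviewer #2's point (7), could be computed as below. This is a minimal sketch that assumes a hypothetical per-session list of chosen options and the trial indices at which the REE occurred; `recency_curve` and its arguments are illustrative names. It tracks choices of the specific option that produced the REE at each lag from -10 to +10. A wavy recency pattern (negative, then positive recency) would therefore stay visible rather than being averaged over a 10-trial window.

```python
def recency_curve(choices, event_trials, event_option, window=10):
    """Per-lag proportion of choices of `event_option` around each REE (hypothetical analysis).

    choices: chosen option label on each trial of one session.
    event_trials: indices in `choices` at which the REE was delivered.
    event_option: the option that produced the REE.
    Returns {lag: proportion} for lags -window..-1 and 1..window; a lag with
    no available trials (e.g., at a session boundary) maps to None.
    """
    lags = [k for k in range(-window, window + 1) if k != 0]
    hits = {k: 0 for k in lags}
    totals = {k: 0 for k in lags}
    for t in event_trials:
        for k in lags:
            idx = t + k
            if 0 <= idx < len(choices):
                totals[k] += 1
                hits[k] += choices[idx] == event_option  # True counts as 1
    return {k: (hits[k] / totals[k] if totals[k] else None) for k in lags}
```

      Overlapping windows from closely spaced REEs are counted once per event in this sketch. Computing the curve per animal before any pooling is likewise an assumed design choice, not a detail taken from the paper.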

      Discussion:

      This study convincingly demonstrates that REEs are processed rather uniquely, which makes sense given their evolutionary relevance. REE has indeed been somewhat neglected in previous research, and this study therefore opens an interesting new front on the fundamental aspects of decision under risk. The authors have devised an original theoretical and empirical framework that will be useful for the community, and the combination of economic analysis and rodent behavior constitutes thought-provoking ground for thinking about the nature of risk preferences. The interpretation and mechanistic account of these aspects, as well as their generalizability outside the specific context of this study, remain to be strengthened.

      We have modified the Discussion to further emphasize the translational aspect of the study and its interest for various populations (page 22). We hope that the case for generalizability is now stronger.

      Reviewer #2 (Public Review):

      Summary:

      This paper attempts to examine how rare, extreme events impact decision-making in rats. The paper used an extensive behavioural study with rats to evaluate how the probability and magnitude of outcomes impact preference. The paper, however, provides limited evidence for the conclusions because the design did not allow for the isolation of the rare, extreme events in choice. There are many confounding factors, including the outcome variance and the presence of less-rare and less-extreme outcomes in the same conditions.

      Strengths:

      (1) The major strength of the paper is the significant volume of behavioural data with a reasonable sample size of 20 rats.

      (2) The paper attempts to examine losses with rats (a notoriously tricky problem with non-human animals) by substituting time-outs as a proxy for losses. This allows for mixed gambles that have both gain and loss possible outcomes.

      (3) The paper integrates both a behavioural and a modelling approach to get at the factors that drive decision-making.

      (4) The paper takes seriously the question of what it means for an event to be rare, pushing to less frequent outcomes than usually used with non-human animals.

      Weaknesses:

      (1) The main issue with this work is that the primary experimental manipulation fails to isolate the rare, extreme events in choice. As I understand the task, in all the conditions with a rare extreme event (e.g., 80 pellets with probability epsilon), there is also a less-rare, less-extreme event (e.g., 12 pellets with probability 5). In addition, the variance differs between the two conditions. So, any impact attributable to the rare, extreme event could be due to the less rare event or to the difference in variance. The design does not support the conclusions. Finally, by deliberately confounding rarity and extremity, the design does not allow for assessing the impact of either aspect.

      We agree with the referee that both the REE and the rare (≈10% frequency) but non-extreme outcomes are present in the relevant options. However, the rare but non-extreme reward is not large enough to make the convex option attractive and to shift choice away from the concave option. In other words, unlike REE, these outcomes do not reverse stochastic dominance in our design (as noted in Material and Methods). We have explored modified designs for human subjects in which the rare but non-extreme outcomes are removed. Preliminary results indicate that the behavioral phenotypes observed in rats also emerge in humans under these modified conditions, suggesting that REE are the primary drivers. We have added a statement to the Discussion (page 22) to clarify this point.

      We elaborate further in our response to point (3) below on why analyses based solely on variance are insufficient when dealing with REE. To clarify the role of rare and extreme outcomes in distinguishing convex from concave options, our reply to point (3) also provides two new tables to be compared with Table 2 in Materials and Methods.

      Finally, although a detailed analysis of rare but non-extreme outcomes lies outside the scope of this paper, the symmetric treatment of extreme and frequent outcomes can be addressed straightforwardly using strong First-Order Stochastic Dominance. Classical decision-theoretic approaches indeed satisfy this property.

      (2) The RL-modelling work also fails to show a specific impact of the rare extreme event. As far as I can understand Eq 2, the model provides a free parameter that adds a bonus to the value of either the two options with high-variance gains (A and V in the paper) or the two options with high-variance losses (F and V in the paper). This parameter only depends on whether this option could have possibly yielded the rare, extreme outcome (i.e., based on the generative probability) and was not connected to its actual appearance. That makes it a free parameter that just bumps up (or down) the probability of selecting a pair of options. In the case of the "black swan" or high-variance loss conditions, this seems very much like a loss aversion parameter, but an additive one instead of a multiplicative one.

      We agree with the referee that the additional parameters, compared to more standard Q-learning models, specifically capture the fact that some options deliver REE while others do not. In our estimation procedure, these parameters become nonzero as soon as REE are observed for the first time for a given option. Therefore, the first step is to estimate a baseline nested model in which REEs contribute only at the learning stage (i.e., they affect the updating of Q-subvalues), while the additional parameters are constrained to zero. The next step is to compare alternative models against this baseline, allowing REEs to enter through the additional parameters. In this respect, our specification is parsimonious, especially given that very little is known about REEs in computational neuroscience. More structural modeling is certainly a promising direction for future research, and this paper constitutes a first step toward that goal.
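
      A minimal Python sketch of the nested structure described above may help. It assumes a softmax choice rule in which each option's utility sums a gain Q-value and a loss Q-value, each with its own decision weight, plus an option-specific REE decision weight that only applies once that option has delivered an REE. All names (`choice_probabilities`, `w_gain`, `ree_bonus`, and so on), the delta-rule update, and the numbers are illustrative assumptions rather than the paper's Eq 2; setting every REE weight to zero gives the baseline model.

```python
import math
from typing import List, Sequence


def softmax(utilities: Sequence[float]) -> List[float]:
    """Numerically stable softmax over option utilities."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def choice_probabilities(q_gain, q_loss, w_gain, w_loss, ree_bonus, ree_seen):
    """Hypothetical augmented rule: weighted gain and loss Q-values are summed,
    and an option's REE weight is added only once that option has shown an REE."""
    utilities = [
        w_gain * g + w_loss * l + (b if seen else 0.0)
        for g, l, b, seen in zip(q_gain, q_loss, ree_bonus, ree_seen)
    ]
    return softmax(utilities)


def update(q: List[float], option: int, outcome: float, alpha: float) -> None:
    """Delta-rule update of one Q-subvalue (an assumed learning rule)."""
    q[option] += alpha * (outcome - q[option])


# Four options; losses coded as negative delays (hypothetical units).
q_gain = [2.0, 3.0, 2.0, 3.0]
q_loss = [-1.0, -1.0, -2.0, -2.0]
seen = [True, False, False, False]  # only option 0 has delivered an REE so far

baseline = choice_probabilities(q_gain, q_loss, 1.0, 1.0, [0.0] * 4, seen)
augmented = choice_probabilities(q_gain, q_loss, 1.0, 1.0, [0.5, 0.5, -0.5, -0.5], seen)
update(q_gain, option=1, outcome=4.0, alpha=0.1)
```

      In this toy run, the positive weight raises the probability of option 0, which has already delivered an REE, while the weights attached to options 2 and 3 have no effect until those options deliver an REE themselves.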

      We provide the BIC, in addition to the AIC, to account for the presence of additional parameters in model selection and to ensure that the observed improvement in fit is not merely driven by their inclusion.
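
      For reference, the two criteria differ only in their penalty per parameter: 2 for the AIC and ln n for the BIC, where n is the number of choices, so the BIC penalizes extra parameters more heavily whenever n exceeds e² ≈ 7.4. The Python sketch below uses hypothetical log-likelihoods, parameter counts, and choice counts, not values from the paper; lower values of either criterion indicate the preferred model.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fits for one rat: baseline vs. a model with 2 extra REE weights.
n_choices = 4800
base = {"log_lik": -5200.0, "k": 4}
augm = {"log_lik": -5180.0, "k": 6}

delta_aic = aic(augm["log_lik"], augm["k"]) - aic(base["log_lik"], base["k"])
delta_bic = (bic(augm["log_lik"], augm["k"], n_choices)
             - bic(base["log_lik"], base["k"], n_choices))
print(delta_aic, delta_bic)  # negative values favor the augmented model
```

      With these made-up numbers both differences are negative, so both criteria would favor the augmented model; a smaller gain in log-likelihood could leave the AIC favoring it while the BIC does not.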

      Unlike most of the existing literature, our results extend the notion of loss aversion to extreme losses. The negative decision weight on options yielding the Black Swan can be interpreted as a differential treatment of negative REE, an issue we discuss extensively in the Discussion (page 20).

      (3) The paper presented the methods and results with lots of neologisms and fairly obscure jargon (e.g., fragility, total REE sensitivity). That made it very hard to decipher exactly what was done and what was found. For example, on p. 4, the use of concave and convex was very hard to decipher; the text even has to repeat itself 3 times (i.e., "to repeat" and "in other words") and is still not clear. It would be much clearer (and probably more accurate) to say that the options varied along the variance dimension, separately for gains and losses. Option A was low-variance gains and losses. Option B was low-variance losses and high-variance gains. Option C was high-variance losses and low-variance gains, and Option D was high-variance losses and gains. That tells much more clearly what the animals experienced without the reader having to master a set of new terminologies around fragility and robustness, which brings a set of theoretical assumptions unnecessarily into the description of the experimental design. In terms of results, "Black Swan" avoidance is more simply known as risk aversion for losses.

      Because our experimental design focuses on REE, outcomes cannot be summarized by their variance alone. This is well known from the large literature on so-called fat-tailed statistical distributions. Unlike the Normal distribution, which is entirely characterized by its expected value and variance, fat-tailed distributions have positive excess kurtosis. This implies that a fat-tailed distribution (e.g. exponential) with the same expected value and variance as a Normal distribution differs importantly in that extreme values occur much more frequently. To illustrate, if the distribution of pellets were assumed to be Normal with expected value 3.89 and standard deviation 9.37, as for the convex option, the probability of getting at least 80 pellets would be about 2 × 10<sup>-16</sup>, practically zero. In contrast, this probability is smaller than, but close to, 1% in our design.
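
      The Normal benchmark is easy to check. This minimal Python sketch computes P(X ≥ 80) for a Normal distribution with mean 3.89. The figure of about 2 × 10<sup>-16</sup> is recovered when 9.37 is taken as the standard deviation; taking 9.37 as a variance instead gives a value smaller by more than a hundred orders of magnitude. Either way, the Normal benchmark makes an 80-pellet outcome essentially impossible.

```python
import math


def normal_upper_tail(x: float, mean: float, sd: float) -> float:
    """P(X >= x) for X ~ Normal(mean, sd), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))


# 9.37 read as a standard deviation: matches the order of magnitude quoted above.
p_if_sd = normal_upper_tail(80.0, mean=3.89, sd=9.37)
# 9.37 read as a variance (sd = sqrt(9.37)): vastly smaller still.
p_if_var = normal_upper_tail(80.0, mean=3.89, sd=math.sqrt(9.37))
print(f"{p_if_sd:.1e} {p_if_var:.1e}")
```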

      In Material and Methods, we clearly explain how our novel approach in terms of convexity relates to the moments of the reward distributions, including but not limited to the variance. To clarify further, we provide two new tables (Author response table 2 and Author response table 3) to be compared with Table 2 of the manuscript, which reports the first four moments (mean, standard deviation, skewness and kurtosis) of the full concave and convex gain distributions and is reproduced below for convenience.

      Author response table 1.

      In Author response table 2 we report the first four moments when REE are truncated. Comparing convex and concave gains shows that the convex option has a mean slightly smaller than, but close to, that of the concave option, whereas it has larger variance, skewness and kurtosis. Therefore, interpreting the choice of the convex option as reflecting a “preference” for variance is at best incomplete.

      Author response table 2.

      First four moments of concave and convex gains when REE are removed

      Author response table 2 further shows that REE alone go a long way towards explaining the differences between convex and concave options in terms of the first four moments: removing the rare and extreme value results in the concave option now having a larger mean, while the convex option still has larger variance, skewness, and kurtosis, but by a smaller margin.

      In Author response table 3 we report the first four moments when both RE and REE are truncated, which shows that the convex and concave options then differ only with respect to their mean (which is again larger for the concave option).

      Author response table 3.

      First four moments of concave and convex gains when both RE and REE are removed
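
      The type of computation behind these tables can be sketched in Python on a toy discrete gain distribution. The outcomes and probabilities below are hypothetical and are not the paper's Table 2. In addition, "removing" an outcome is assumed here to mean dropping it and renormalizing the remaining probabilities, which may differ from the truncation used in the paper. Kurtosis is reported in its non-excess form (3 for a Normal).

```python
import math
from typing import Dict, Set, Tuple


def moments(dist: Dict[float, float]) -> Tuple[float, float, float, float]:
    """Mean, standard deviation, skewness and (non-excess) kurtosis
    of a discrete distribution given as {outcome: probability}."""
    total = sum(dist.values())
    probs = {x: p / total for x, p in dist.items()}  # renormalize to sum to 1
    mean = sum(x * p for x, p in probs.items())
    m2 = sum((x - mean) ** 2 * p for x, p in probs.items())
    m3 = sum((x - mean) ** 3 * p for x, p in probs.items())
    m4 = sum((x - mean) ** 4 * p for x, p in probs.items())
    sd = math.sqrt(m2)
    return mean, sd, m3 / sd ** 3, m4 / m2 ** 2


def truncate(dist: Dict[float, float], removed: Set[float]) -> Dict[float, float]:
    """Drop the listed outcomes (renormalization happens inside `moments`)."""
    return {x: p for x, p in dist.items() if x not in removed}


# Hypothetical convex-like gain distribution: one rare, extreme outcome (60).
toy_convex = {1: 0.6, 3: 0.3, 10: 0.09, 60: 0.01}
full = moments(toy_convex)
no_ree = moments(truncate(toy_convex, {60}))
print("full:  ", [round(v, 2) for v in full])
print("no REE:", [round(v, 2) for v in no_ree])
```

      In this toy example, dropping the single rare and extreme outcome lowers the mean and shrinks the variance, skewness and kurtosis markedly, which illustrates how much one rare, extreme outcome can drive the higher moments.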

      In addition, our focus on REE implies that we go beyond mean-variance preferences that apply mostly to Gaussian distributions. It is not clear theoretically what type of utility functions would reflect preferences that combine a taste for variance, skewness and kurtosis, even though all those moments affect expected utility. See for example Phelps, C.E. “A user’s guide to economic utility functions”. J Risk Uncertain 69, 235–280 (2024) for a recent overview (on page 242, Phelps states that “In situations where risk is not normally distributed, it is ill-advised to ignore statistical parameters beyond variance, unless the deviations from normality are relatively small”).

      More importantly, our proposed measure of the convexity of the reward distributions, the Jensen gap, further reveals that even restricting the analysis to the first four moments is incomplete, in the sense that it fails to characterize the difference between options: the fifth moment of the concave option contributes more to the Jensen gap than even kurtosis does, while one needs to look at much higher moments to find significant contributions to the Jensen gap for the convex option. In that sense, there is no reason to restrict the comparison of options to variance, or even to skewness and kurtosis, either in general or in our particular setup. Note that introducing REE would result in convex distributions even in simplified designs, e.g. with a 3-value support. Studying REE implies the need to look beyond variance, and our proposal is to use the Jensen gap as a measure of convexity. We did not develop an in-depth analysis of the Jensen gap in the Material and Methods section of the paper, so as to spare readers of an already rather technical paper.
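
      How individual moments can contribute to a Jensen gap is made explicit by a Taylor expansion around the mean. This is standard calculus, not a reproduction of the paper's computation: it assumes a function f that is analytic around μ = E[X], with a series that converges over the support of X, and the paper's own definition of the Jensen gap in Material and Methods remains authoritative.

```latex
\mathbb{E}[f(X)] - f(\mu)
  = \sum_{k \ge 2} \frac{f^{(k)}(\mu)}{k!}\,\mathbb{E}\!\left[(X-\mu)^{k}\right],
  \qquad \mu = \mathbb{E}[X]
```

      The first-order term vanishes because E[X − μ] = 0. The k = 2, 3 and 4 terms carry the (unstandardized) variance, skewness and kurtosis, and each higher k adds a further central-moment contribution, so whether a fifth or much higher moment outweighs the fourth depends on both f and the distribution.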

      We thank the referee for raising the issue of whether variance offers a simpler explanation of our results. To keep the main text as short as possible, we chose to refrain from adding technical complexity. We hope we have made clear in our reply that the analysis cannot be restricted to variance when studying REE. We believe that the Jensen gap is a useful notion in this regard. As our replies will be made publicly available, we chose not to integrate the above discussion into the main text.

      (4) Were the probabilities shuffled or truly random (they seem to be fixed sequences, so neither)? What were the experienced probabilities? Given the fixed sequences, these experienced ("ex-post") probabilities could differ tremendously from the scheduled ("ex-ante") probabilities. It's quite possible that an animal never experienced the rare, extreme event for a specific option. It's even possible (if they only picked it on the 10th/60th choices by chance) that they only ever experienced that rare extreme event. This cannot be known given the information provided. The Supplemental info on p. 55 only gives gross overall numbers but does not indicate what the rats experienced for each choice/option, which is what matters here. A simple table that indicates, for each of the 4 options, how often it was selected and how often the animals experienced each of the 6-8 possible outcomes would make it much clearer how closely the experience matched the planned outcomes. In addition, by restricting the rare outcome to either the 10th or 60th activations in a session, these are not random. Did the animals learn this association?

      Probabilities are not random, and a limited number of fixed sequences has been used, as stated in Material and Methods. We have chosen sequences that satisfy our assumptions about ex-post stochastic dominance reversal of convex over concave options when REE are added. We have added in Table S4 the choice frequencies for all four options. If the animals had learned the 10th and 60th activations, their choices would reflect a more optimized strategy than what is observed. For example, the options offering the possibility to obtain the Jackpot are not optimal in terms of gains for the frequent events; therefore, the animals should tend to select these options only around the 10th and 60th choices. Most of their other choices should favor the options delivering the larger gains in the frequent domain. This is not what is observed. We have added this important point in the Discussion (page 18).
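
      The table the referee requests amounts to tallying, per option, how often it was chosen and how often each outcome was actually experienced. A minimal Python sketch, assuming a per-rat log of (option, outcome) pairs; the option labels, outcome codes, and log below are placeholders, not data from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def expost_frequencies(
    trials: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[str, Dict[str, float]]]:
    """Return (times each option was chosen, experienced outcome frequencies
    per option) from a sequence of (option, outcome) pairs, one per choice."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for option, outcome in trials:
        counts[option][outcome] += 1
    chosen = {option: sum(c.values()) for option, c in counts.items()}
    freqs = {
        option: {outcome: n / chosen[option] for outcome, n in c.items()}
        for option, c in counts.items()
    }
    return chosen, freqs


# Hypothetical log: labels and outcome codes are placeholders.
log = [("A", "2 pellets")] * 9 + [("A", "Jackpot")] + [("R", "2 pellets")] * 5
chosen, freqs = expost_frequencies(log)
print(chosen)
print(freqs)
```

      Comparing these ex-post frequencies with the scheduled ex-ante probabilities, option by option, shows directly how closely each animal's experience matched the design.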

      (5) The choice data are only presented in an overprocessed fashion with a sum and a difference (in both figures and tables). The basic datum (probability/frequency of selecting each of the 4 options) is not provided directly, even if it can theoretically be inferred from the sum and the difference. To understand what the rats actually do, we first need to see how often they select each option, without these transformations.

      As described in Material and Methods, the 4 options are combinations of 2 convex and concave sub-options for gains and losses, which is why our analysis of the behavioral data focuses on convexity-related total and one-sided sensitivities to REE. The third dimension needed to fully characterize rats’ behavior is simply 1−ff<sub>FF</sub>, the fraction of non-Fragile choices. In addition, we also provide in Table S4 of the Supplementary Material an alternative interpretation in terms of Black Swan Avoidance and Jackpot Seeking. We have added in Table S4 the choice frequencies for all four options. Finally, all the raw data will be made available with open access and no access codes.

      (6) There is insufficient detail provided on the inferential statistical tests (e.g., no degrees of freedom or effect sizes), and only limited information on exactly what tests were run and how (bootstrapping, but little detail). Without code or data (only summary information is provided in the supplement), this is difficult to evaluate. In addition, the studies seem not to be pre-registered in any way, leaving many researcher degrees of freedom. Were any alternative analysis pipelines attempted? Similarly, there were many sub-groupings of the animals, and then comparisons between them; were these post-hoc?

      We understand the referee's concern regarding pre-registration as an epistemic safeguard that makes empirical claims more falsifiable, more transparent, and less dependent on post hoc rationalization. However, while the contemporary push for preregistration is often presented as an “epistemic improvement,” in practice it functions largely as a norm of moral regulation, not a scientific necessity. The rhetoric is moralistic: preregistered research is “clean,” “transparent,” “credible,” while non-preregistered work is viewed with suspicion, even when the methodology is sound. This language is not epistemologically neutral; it enforces a particular view of what ought to be done, irrespective of the diversity of legitimate scientific practices.

      From a philosophy of science perspective, this is historically and conceptually problematic. Scientific progress has never followed a uniform, rule-based method. As Feyerabend, for example, has argued, major discoveries have emerged precisely because researchers were not bound by predetermined plans: they followed anomalies, improvised, reinterpreted data, and revised methods and hypotheses in light of new evidence. A rigid preregistration ethos can suppress such practices and is not aligned with how genuine discovery often occurs.

      Even from a statistical standpoint, preregistration is far from a panacea. It reduces some degrees of freedom (mainly in confirmatory statistics), but it does not eliminate flexibility; researchers can still choose models, transformations, exclusion rules, stopping rules, etc. And more importantly: reducing flexibility is not inherently epistemically virtuous. Flexibility is often necessary to understand data properly—especially in new paradigms or first-of-their-kind experiments, which is the case for this study. Science needs exploration, opportunism, and theoretical plasticity. Preregistration is compatible with these only if it is treated as one optional tool among many—not as a universal evaluative standard.

      As the referee pointed out, this study “taps into a surprisingly neglected but very relevant aspect of decision-making.” Our work is therefore mainly exploratory: the experimental paradigm reveals new behavioral patterns in how rats cope with rare and extreme events, and much of our analysis is necessarily descriptive. We conduct formal inference only where it is methodologically appropriate: for the short-term behavioral response to rare events (for which we now provide more details in Material and Methods, page 35) and for the estimation of augmented Q-learning models, which follows a standard econometric approach (documented in Material and Methods; see also our response to recommendation 4). These inferential results support the descriptive patterns that motivate this new line of research.

      (7) On p. 17, there is an attempt to look at the impact of a rare, extreme event by plotting a measure of preference for the 10 trials before/after the rare, extreme event. In the human literature, the main impact of experiencing a rare, extreme event is what is known as the wavy recency effect (See Plonsky et al. 2015 in Psych Review for example). What this means is that there tends to be some immediate negative recency (e.g., avoiding a rare gain) followed by positive recency (e.g., chasing the rare gain). Using a 10-trial window would thus obscure any impact of this rare, extreme event. An analysis that looks at a time course trial-by-trial could reveal any impact.

      We thank the referee for drawing our attention to the wavy recency effect documented in human experiments. We have added the corresponding reference in the Discussion (page 20). Regarding rats, the Before/After analysis reported in the paper suggests that there is no sizeable immediate recency effect for Jackpots. Even for Black Swans, the immediate recency effect we report remains modest when using a 10-trial window, and the analysis of the choice immediately following a REE does not show evidence of immediate negative recency. This casts doubt on the presence of such an effect in rats.
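
      A trial-by-trial alternative to the 10-trial Before/After windows can be sketched as a peri-event curve: for each lag around an REE, the fraction of events at which the REE-delivering option was chosen. Such a curve could expose a wavy pattern (negative then positive recency) that averaging over a window might hide. The Python below is a minimal sketch with a hypothetical single-session choice sequence; pooling over events, sessions, and rats is left out.

```python
from typing import List, Sequence


def peri_event_curve(choices: Sequence[str], event_trials: Sequence[int],
                     option: str, window: int) -> List[float]:
    """Fraction of events at which `option` was chosen, for each lag
    from -window to +window (lag 0 = the trial on which the REE occurred).
    Lags that fall outside the session are skipped for that event."""
    curve = []
    for lag in range(-window, window + 1):
        hits, n = 0, 0
        for t in event_trials:
            i = t + lag
            if 0 <= i < len(choices):
                n += 1
                hits += choices[i] == option  # bool counts as 0 or 1
        curve.append(hits / n if n else float("nan"))
    return curve


# Hypothetical session: option "V" delivered an REE on trial index 4.
choices = ["A", "V", "A", "A", "V", "A", "A", "V", "V", "A"]
curve = peri_event_curve(choices, event_trials=[4], option="V", window=3)
print(curve)
```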

      (8) As I understood the method (p. 31), the assignment of options to physical locations was not random or counterbalanced, but deliberately biased to have one of the options in the preferred location. This would seem to create a bias towards a particular option and a bias away from the other options, which confounds the preference data in subsequent analyses.

      We agree that the design incorporated an intentional bias toward the anti-fragile option as a proof of concept. Nevertheless, Figure 8 demonstrates that animals substantially altered their choices between training and final testing, with a median change of approximately 35% across sessions. This indicates that behavior was driven by the structure of possible outcomes rather than by a stereotyped location-based preference.

      (9) Are delays really losses? This is a big assumption. Magnitude and delay are different aspects of experience, which are not necessarily commensurable and can be manipulated independently. And, for the model, how were these delays transformed into outcomes? Eq 1 skips over that. Is there an assumption of linearity? In addition, it was not wholly clear to me whether the delays meant fewer trials in a session or merely extended the session, resulting in longer delays until the next choice period.

      Consistent with established rodent decision-making paradigms (Adams et al., 2017 doi: 10.1523/ENEURO.0094-17; Breysse et al., 2021 doi: 10.1111/ejn.14895), we employed sweet pellets as gains and imposed delays as losses. Delays are operationalized as losses because they preclude the animal from engaging in reward-generating behavior; thus, increasing the delay duration proportionally increases the magnitude of the opportunity cost.
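
      The opportunity-cost reading can be made concrete with a deliberately simple sketch, assuming a constant trial rate (the linearity the referee asks about). The default rate uses the approximate session figures given in the reply to recommendation (7) (about 120 trials in a 20-minute session); because those trials already include time spent in delays, this average rate is only a crude proxy, and the delay values are hypothetical.

```python
def trials_forgone(delay_s: float, trials_per_session: float = 120.0,
                   session_s: float = 20 * 60) -> float:
    """Choices forgone during a delay, assuming a constant average trial rate.

    Linearity in the delay is an assumption of this sketch, not a result."""
    rate = trials_per_session / session_s  # average trials per second
    return delay_s * rate


# Hypothetical delays in seconds: doubling the delay doubles the cost.
for d in (10.0, 30.0, 60.0):
    print(d, round(trials_forgone(d), 2))
```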

      (10) The paper does not sufficiently accurately represent the existing literature on human risky decision-making (with and without rare events). Here are a few examples of misrepresented and/or missing literature:

      It is not the case that most studies on decision-making rely only on p > 10% (as stated on p. 2). Maybe that is true with animals, but it is not a fair statement generally. Some do, and some don't. There is substantial literature looking at rarer events both in description (most famously with Kahneman & Tversky's work) and in experience (which is alluded to in reference 19). That reference is not only about the situation when choices are not repeated (e.g. the sampling paradigm), but also about partial-feedback and full-feedback situations.

      We have corrected that statement in the main text (page 3) and we thank the referee for pointing this out.

      The literature on learning from rewarding experiences in humans is obliquely referenced but not really incorporated. In short, there are two main findings: first, people underweight rare events in experience; second, people overweight extreme outcomes in experience (both contrary to description). Some related papers are cited, but their content is not used or incorporated into the logic of the manuscript.

      One recent study systematically examined rarity and extremity in human risky decision-making, which seems very relevant here: Mason et al. (2024). Rare and extreme outcomes in risky choice. Psychonomic Bulletin & Review, 31, 1301-1308.

      There is a fair bit of research on the human perception of the risk of rare events (including from experience) and important events like climate. One notable paper is Newell et al (2015) in Nature Climate Change.

      We agree with the referee that the related literature on REE in animal Decision Making is scant and that it is more developed in humans. We thank the referee for pointing at Mason et al. (2024), who clarify where the literature on humans stands and why combining rarity and extremity, as we also do, is important and highly relevant. We have added a new statement and references in the Introduction and Discussion (pages 3, 20, 22).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As said above, I think the manuscript would really benefit from a rewriting, to replace some technical terms with more readable ones, and maybe rebalance the focus from the current focus on the framework (heavily loaded with economics concepts, which will be hard to digest for the eLife readership) to a higher weight on information that is critical to understand and interpret the behavior (e.g. information about training & training behavior, etc.).

      We have revised the entire manuscript to improve readability and have clarified in the main text: (1) why convexity of exposures to REE could, beyond variance, be useful for experiments in settings other than our own; (2) why the associated notion of antifragility may be applicable to other settings and therefore of broader interest; (3) what was done in the training sessions compared to the final sessions.

      (2) From Figure 8, it seems that rodent behavior is more clustered after the training (i.e. before the sessions) than after the sessions. Could that be a sign of imperfect learning?

      Figure 8 mostly suggests that there is some flexibility in the choices made and that the intended initial bias towards the antifragile choice in the design of the task could be overridden by the rats.

      (3) The modelling section seems incomplete. I think the authors want to tease apart where REE enters the model and should propose an alternative where REE affects the learning rather than the decision.

      In fact, the general model allows REE to have an effect at the learning stage only (i.e. to contribute to the updating of the Q-subvalues) when the specific decision weights attached to options delivering REE are both zero. However, our analysis shows that such a model is rejected by the behavioral data for all rats. We have clarified this point in the revised version.

      (4) Also, parameter and model recovery exercises seem mandatory (Wilson & Collins, 2019).

      We thank the referee for highlighting this valuable reference in computational modeling, particularly in the context of model identification and estimation in computational biology. In the present research, we adopted an econometric perspective on model identification—especially with regard to the integration of Q-values for gains and losses. The softmax choice function is formally equivalent to a multinomial logit model, and as is well known in econometrics, identification in such models presents non-trivial challenges. The standard approach in classical Q-learning is to multiply the Q-value by an inverse temperature parameter (also known as a precision parameter in random utility models). When extending the model to include separate Q-values for gains and losses, specifying the model in an identifiable way becomes more complex.

      To address this issue, we considered several alternative model specifications and conducted grid-based estimation of starting parameter values. This approach allowed us to examine the shape of the log-likelihood function and assess whether the parameters are globally identified, rather than only identifiable up to a linear combination. We found that the most parsimonious and empirically identified specification in our experimental paradigm is one in which Q-values for gains and losses are summed, each weighted by distinct decision weights (see our Equation 2 in the paper).
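
      A small numerical illustration of this identification issue, under assumed notation: if the utility were written as β(w_G Q_G + w_L Q_L), with a separate inverse temperature β, then scaling β up and both weights down by the same factor leaves every choice probability, and hence the log-likelihood, unchanged. Only the products β·w_G and β·w_L are then identified, which is why collapsing them into two free weights removes the flat ridge. The Python below holds Q-values fixed for simplicity; names, values, and choices are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(u: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(u)
    e = [math.exp(x - m) for x in u]
    s = sum(e)
    return [x / s for x in e]


def log_likelihood(choices, q_gain, q_loss, beta, w_gain, w_loss) -> float:
    """Log-likelihood of observed choices under utility
    beta * (w_gain * Q_gain + w_loss * Q_loss), with Q-values held fixed."""
    u = [beta * (w_gain * g + w_loss * l) for g, l in zip(q_gain, q_loss)]
    p = softmax(u)
    return sum(math.log(p[c]) for c in choices)


q_gain = [2.0, 3.0, 2.0, 3.0]
q_loss = [-1.0, -1.0, -2.0, -2.0]
choices = [1, 1, 0, 3, 1, 2]  # hypothetical option indices

# Two different parameter vectors with the same products beta*w: same fit.
ll_a = log_likelihood(choices, q_gain, q_loss, beta=2.0, w_gain=1.0, w_loss=0.5)
ll_b = log_likelihood(choices, q_gain, q_loss, beta=1.0, w_gain=2.0, w_loss=1.0)
print(ll_a, ll_b)
```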

      The inclusion of decision weights for REE for each option (Equation 2) is then structurally equivalent to introducing constant terms in a logit model. The identification of these parameters follows standard econometric results on discrete choice models (e.g., Davidson & MacKinnon, 2003): since we model choices among four options, three free parameters can be estimated, leaving one degree of freedom in the specification. As mentioned in the "Modelling and Statistical Analysis" section, we further guarded against the presence of local maxima by applying a two-step estimation procedure, combining two optimization algorithms with multiple sets of starting values for the baseline model (i.e., the model without decision weights for REE). We also tested the addition of a global optimization method, simulated annealing, but found that it did not significantly improve upon our two-step procedure. This is not surprising, as our preliminary investigation of model identification, based on grid searches over starting parameter values, confirmed that all parameters were identified in our simple specification. Our intuition is that simulated annealing may yield different estimates than gradient-based methods primarily in cases where the model is not theoretically identified, suggesting that the need for such global optimization techniques can be indicative of underlying identification issues in Q-learning models.
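
      The point about constant terms can be checked numerically: adding the same constant to every option-specific REE weight leaves all softmax probabilities unchanged, so with four options only three such weights (for instance, contrasts against a reference option) can be estimated. The utilities and weights in this Python sketch are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(u: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(u)
    e = [math.exp(x - m) for x in u]
    s = sum(e)
    return [x / s for x in e]


base_utility = [1.0, 2.0, 0.0, 1.0]   # hypothetical Q-based utilities
ree_weights = [0.5, 0.5, -0.5, -0.5]  # hypothetical option-specific constants
shift = 3.0                           # common shift applied to all constants

p1 = softmax([u + d for u, d in zip(base_utility, ree_weights)])
p2 = softmax([u + d + shift for u, d in zip(base_utility, ree_weights)])

# Fixing option 0 as the reference leaves three free contrasts.
contrasts = [d - ree_weights[0] for d in ree_weights[1:]]
print(p1 == p2, contrasts)
```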

      Regarding model comparison, we have used penalized information criteria to account for additional parameters. Although we do not report confusion or inversion matrices for our nested models, we verified that the estimated models replicate observed behaviors across all phenotypes, as shown in the main text (see bottom left panel of Figure 5 for the Total and One-Sided sensitivities). Most importantly, we conducted 100 additional simulations of 40 artificial sessions for each phenotype using the “winning” models and the median fitted parameters. These simulated rats—playing the task 100 times over 40 sessions—offer strong evidence that the selected models are valid: they quantitatively capture the behavior of all phenotypes in terms of our key metrics, Total and One-Sided sensitivities (see bottom right panel of Figure 5).

      Taken together, this methodical econometric approach to model specification and estimation gives us strong confidence in the identification and robustness of our model. Overall, while Wilson & Collins (2019) provide an interesting framework for model estimation in computational biology, we believe that a more formal theoretical analysis of model identification in Q-learning models would be a valuable addition to the field—though it lies beyond the scope of the present work. In our view, computational biologists should complement simulation-based validation and empirical fit with formal methods for assessing theoretical identifiability, particularly when estimating complex choice models.

      Davidson, R. and J.G. MacKinnon (2003) Econometric Theory and Methods. Oxford University Press (New York).

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Reviewer #2 (Recommendations For The Authors):

      (1) The paper confuses risk sensitivity and exploration in the opening lines. These are not the same.

      What we have in mind here is that uncertainty about outcomes is one of the main drivers of exploration, in the sense that there would be no need to explore in a counterfactual world with deterministic gains and losses. We have modified the opening lines of the paper to better reflect this dimension (page 2).

      (2) p. 9. "awfully long" is an unnecessary descriptor. Descriptions of methods should be more factual.

      The manuscript has been entirely rewritten.

      (3) p. 13. Most points lie on the left of the square (not right?).

      We thank the referee for pointing out this typo, which has now been corrected in the text (page 8).

      (4) p. 13. Last line. "obviously" is patronizing to the readers.

      The manuscript has been entirely modified to address related points.

      (5) p. 23. The avoidance of black swans by not choosing that option sounds like a hot-stove effect (see Denrell & March, 2001). Is this evidenced here?

      To the best of our knowledge, the statement that “people tend to avoid activities they have had a negative experience of, resulting in a negativity bias” (from Jerker Denrell’s website) does not explicitly concern REE. Instead, it appears to refer broadly to reinforcement learning mechanisms driven by negative outcomes, irrespective of their magnitude or frequency. In our task, animals encounter both negative rare events (RE) and negative rare and extreme events (REE; Black Swans). Notably, the task design does not allow rats to completely avoid negative RE unless they cease performing the task altogether—a pattern typically seen in paradigms involving aversive stimuli such as electric foot shocks. The fact that all 20 rats maintained stable performance across the 41 sessions provides evidence against a pronounced hot-stove effect. This point has been incorporated into the revised discussion (page 20).

      (6) "menus" is an odd term. Better described as reward schedules?

      “Menu” has been replaced by “option” in the main text.

      (7) Why are they 20-minute sessions? I thought it was 120 trials per session? And 41 sessions? Or was this only in training?

      Each session ended after 20 minutes, which typically yielded approximately 120 trials, although the exact number varied between sessions. The 20-minute limit was chosen to restrict the number of trials and thereby prevent satiety. In total, 41 sessions were run with all 20 animals in the final testing phase; this is an odd number, but there was no justification for removing a session from the analysis. The training phase was much longer and is not included in these 41 sessions.

      (8) Really not clear why these Jensen inequalities were relevant or even calculated for these options? How is it relevant to what animals chose or experienced? They seem to be based on the generative probabilities for different options, which is not what happened in reality.

      We propose the Jensen gap as a general measure of convexity that relates to all moments of the probability distribution, as described in more detail in our answer to point (3) above. As such, we think it is a characterization of options with stochastic outcomes that could prove useful to other experimenters in alternative settings beyond our own.
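      For readers unfamiliar with the quantity, the Jensen gap of an option with random outcome X under a function f is E[f(X)] - f(E[X]). It is zero for a sure outcome and non-negative whenever f is convex. The minimal sketch below computes it for discrete (outcome, probability) options. The function f and both options are hypothetical illustrations; the specific function used in the manuscript is the one described in the answer to point (3) referred to above and is not reproduced here.

```python
from typing import Callable, Sequence, Tuple

# An option is a list of (outcome, probability) pairs (hypothetical encoding).
Option = Sequence[Tuple[float, float]]


def jensen_gap(option: Option, f: Callable[[float], float]) -> float:
    """Return E[f(X)] - f(E[X]) for a discrete outcome distribution."""
    if abs(sum(p for _, p in option) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in option)
    return sum(f(x) * p for x, p in option) - f(mean)


def square(x: float) -> float:
    """Convex example function; its Jensen gap equals the variance."""
    return x * x


safe = [(2.0, 1.0)]                  # sure outcome of 2
risky = [(3.0, 0.9), (-7.0, 0.1)]    # same mean of 2, rare large loss

print(jensen_gap(safe, square))
print(jensen_gap(risky, square))
```

      With f(x) = x<sup>2</sup> the gap reduces to the variance, so the risky option returns approximately 9 while the sure option returns 0, even though both have the same expected value.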

      (9) Only some summary data in supplemental materials. No open data or code for recreating the experiment or analyzing the data.

      The data are available on GitHub (see page 38), and the code will be available upon request.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) The biological and/or mathematical meaning of the Soma and Neurite Density Imaging (SANDI) indices (apparent soma density, apparent soma size, extracellular water signal fraction, extracellular diffusivity, apparent neurite density, fractional anisotropy, and mean diffusivity) should be briefly introduced for those less familiar with this novel technique.

      Further explanations about the biological and mathematical meaning of the SANDI indices were added to the introduction on page 6.

      (2) The study implements a novel biophysical diffusion model that extends up-to-date methodologies and presents a significant potential for quantifying neurodegenerative processes of the grey matter of the human brain in vivo. The authors comment on the usefulness of this technique in other pathologies, but they exemplify it only with multiple sclerosis. Further development of this, building evidence, should be provided.

      Clinical applications of SANDI have primarily focused on MS. However, since preparation of the manuscript, one study has been published reporting reductions in apparent soma density and white and grey matter specific differences in apparent soma size in amyotrophic lateral sclerosis (ALS) (Zeng et al., Eur J Radiol 2025, 10.1016/j.ejrad.2025.111981). These findings accord with the loss of motor neurons and glial responses in ALS. We have added this study to the introduction of SANDI on page 7.

      (3) Why are the basal ganglia compared against thalami? The rationale of this decision is missing.

      The thalami were selected as control regions based on the established trajectory of neurodegeneration in HD, which begins with early loss of medium spiny neurons in the striatum and later extends to surrounding structures, including the putamen and thalamus. Given that most participants in our study were at early disease stages, we assumed the thalami would remain relatively unaffected in this sample. This explanation has been added to the introduction on page 7.

      (4) The use of bullet points is unusual for a scientific paper format.

      Bullet points have been removed throughout the manuscript.

      (5) The authors mention that they eroded the boundaries of the subcortical masks. Providing the details and parameters of this erosion would be beneficial.

      Details of the default parameters of the FSL erode function that was used have been added to the method section on page 13.

      (6) In the conclusion, the authors state that their results will bridge the gap between histopathological findings and in vivo imaging, but it would be helpful if they could briefly explain how they imagine such a bridge (e.g., which kind of comparisons or correlations) and whether there exists any literature in this regard so far.

      We have added the following brief explanation to the conclusion on page 26: “Although conventional MRI lacks the resolution to directly capture histopathology, advanced biophysical models such as SANDI may help bridge this gap by providing biologically interpretable parameters that reflect tissue composition and capture histopathological changes in vivo.”

      (7) The scale is missing in Figure 3.

      The scale has been added to Figure 3.

      (8) In general, the work would benefit from a better organization and potentially a smaller number of figures and tables.

      The manuscript has been re-edited to improve readability and organization throughout, and the number of figures and tables was reduced by moving some of them to the Supplementary Material (old Tables 2 and 5 are now Supplementary Tables 2 and 3; old Figure 3 is now Supplementary Figure 1).

      Reviewer #2:

      Certain aspects of the study would benefit from clarification:

      (1) Scanner and acquisition consistency: While HD data are from the WAND study, it is not clear whether controls were scanned on the same scanner or protocol. Given the use of model-derived metrics (especially SANDI), differences in scanner or acquisition could introduce confounds. From the text, the HD participants are explicitly said to come from the WAND study (a longitudinal HD cohort). On the other hand, while the HC participants are described as age-matched controls, the paper does not clearly state whether they were scanned in the same study (i.e., WAND), on the same scanner, or with the same acquisition protocol. This ambiguity is potentially problematic, especially since they use model-derived diffusion metrics that can be very sensitive to scanner hardware, gradient strengths, and protocol settings. If the WAND HD data were acquired on a specific scanner (e.g., 3T Connectom) and the HCs were not, then differences in SANDI/DTI metrics might reflect scanner bias, not disease pathology. This is particularly critical in SANDI, which is sensitive to high b-values and SNR. It would strengthen the manuscript to explicitly state whether the HD and control data were acquired using the same scanner model, sequence, and protocol, and ideally at the same site. If this were not the case, the authors should include this as a limitation and discuss any harmonization strategies applied (e.g., ComBat, covariate modeling, etc).

      For harmonization and comparison purposes, HD and control data were acquired using the same strong gradient (300mT/m) 3T Connectom MRI system at CUBRIC with the same acquisition protocols and sequences. It should also be noted that the Connectom scanner has not had any software upgrades that could introduce scanner biases in data acquired at different time points. This is now made explicit on page 8 by stating that all MRI data for all participants were acquired on the same MRI system using the same acquisition protocols, and on page 10 by stating that all HD and HC MRI data included in our analyses were acquired on the same 3T Siemens Connectom scanner at CUBRIC using the same acquisition protocols described in this section.

      Also, although it offers novel and biologically informative markers, widespread clinical translation still faces hurdles. For instance, the study used a 3T Connectom scanner (300mT/m gradients), which is not widely available. Reproduction of these results in standard 3T clinical scanners would be a great addition, in scenarios with lower resolution, less precise parameter recovery, and longer scans if SNR needs to be maintained.

      We agree that for clinical adoption it is important to demonstrate that HD-related SANDI differences can also be detected on clinical MRI systems and do not require ultra-strong gradient imaging. While we have not collected such data in people with HD, we have demonstrated the feasibility of modelling SANDI metrics from multi-shell diffusion-weighted imaging acquired on a clinical 3T MRI system (maximum b-value of 6,000 s/mm<sup>2</sup>) in healthy adults and people with MS (Schiavi et al., 2023, https://doi.org/10.1002/hbm.26416). Furthermore, Zeng et al. (2025) reported significant differences in SANDI metrics, acquired on a 3T Prisma MRI system, between individuals with ALS and healthy controls (maximum b-value of 3,000 s/mm<sup>2</sup>).

      Two additional studies demonstrated that SANDI could be implemented and microstructural differences could be detected in MS using 3T scanners with standard gradient strength (Barakovic et al., 2024; Margoni et al., 2023). Collectively, these findings indicate that SANDI can be applied on clinical scanners, particularly as clinical systems move toward stronger gradient capabilities such as Siemens Magnetom Cima.X. These explanations can be found under the clinical implication section in the Discussion on page 25.

      (2) Limitations of HD-ISS staging resolution and group separation:

      The use of HD-ISS staging to anchor progression analyses is conceptually appropriate, but, in practice, the sample is quite limited.

      (a) Only 26-27 out of 56 gene-positive participants could be assigned HD-ISS stages, and none were classified into stages 0 or 4. This restricts the interpretation of progression to a narrow clinical window (mostly stages 1-3) and excludes over 50% of the cohort.

      (b) Furthermore, visual inspection of the scatter plots (e.g., Figures 3 and 4) reveals substantial overlap between stages 1 and 2, particularly in CAP100 and Q-Motor measures. This suggests that the separation between early disease stages may not be robust in this dataset, potentially due to limited power or phenotypic variability.

      (c) The above may cause claims based on progression across HD-ISS stages to be overinterpreted or underpowered.

      Despite this, the paper treats the staging as a reliable stratification for group comparisons. To improve clarity and transparency, I would recommend that the authors:

      (a) Acknowledge that over 50% of the HD cohort could not be classified.

      (b) Discuss whether those excluded differed from those included in key metrics.

      (c) Explicitly comment on the substantial overlap between stages 1 and 2, and limit claims about progression unless such separation is statistically supported.

      (d) Avoid overinterpreting staging-related effects without statistical support for group separability

      Re a-d) We have added to the study limitations (pages 23 ff) that only 54% (30 out of 56) of HD participants could be HD-ISS classified due to missing data, and we provide an overview of demographic and clinical information for HD-ISS stages and unclassified individuals in Supplementary Table 1. We acknowledge that the combined groups used for exploratory group analyses (HD-ISS 0-1 versus HD-ISS 2-3) did not represent discrete disease stages and that there was some overlap in imaging and behavioural features between them, as illustrated in Figures 3, 4, and 7. We state explicitly that these exploratory findings should be interpreted with caution and require replication in larger, prospective cohorts before SANDI metrics can be considered as potential markers of disease progression.

      (3) Clarify regression strategy and interpretational limits of SANDI-derived regressors: While the hierarchical regression strategy is broadly appropriate, several aspects would benefit from clarification to improve both interpretability and robustness of the findings. For example:

      (a) Why were only a subset of SANDI parameters (fis and De) considered in the HC models (Figure 6), while additional metrics (fec and rs) were tested in HD models (Figures 7-8)? Including the same variables across groups could aid comparability?

      The same SANDI indices were included in the regression models for the HD and HC groups; Figures 7-8 report only the significant predictors. This has been clarified in the figure legends and on page 14 of the manuscript.

      (b) Were any checks for multicollinearity (e.g., variance inflation factors) conducted? Given known interdependencies among some SANDI parameters, I wonder whether some of the reported regression coefficients may be unstable or difficult to interpret.

      Cross-correlation matrices between all imaging metrics for the HD, HC and total samples have been added to the Supplementary Material (Supplementary Figure 3).
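      The variance inflation factors asked about above can be read directly off the correlation matrix of the predictors: VIF<sub>j</sub> is the j-th diagonal element of its inverse, and with two predictors it reduces to 1/(1 - r<sup>2</sup>). The pure-Python sketch below illustrates this; it is not the analysis code used in the study. Plugging in rho = -0.92 (the soma/neurite-fraction correlation reported in the response to point (b) below) assumes both fractions enter the same model and treats a Spearman coefficient as if it were a Pearson one, so the resulting VIF of about 6.5 is only indicative.

```python
def vif_from_correlation(corr):
    """Variance inflation factors: diagonal of the inverse correlation matrix.

    Uses Gauss-Jordan elimination with partial pivoting on [R | I].
    """
    n = len(corr)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(corr)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r, c=col: abs(aug[r][c]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [aug[i][n + i] for i in range(n)]


# Two hypothetical predictors correlated at r = -0.92.
print(vif_from_correlation([[1.0, -0.92], [-0.92, 1.0]]))
```

      A common rule of thumb flags VIF values above about 5 to 10, which is why the sign and magnitude of individual coefficients for strongly coupled predictors should be read with caution.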

      To improve transparency and interpretability, I suggest actions such as:

      (a) SANDI metrics included in the models differ between HC and HD groups, reducing comparability. Consider using consistent full models across ROIs for comparison purposes, even if some predictors are not significant.

      (b) Report the correlation structure between SANDI metrics within each group to assess multicollinearity (The potential impact of multicollinearity (e.g., between fis and rs) is not discussed)

      (c) Explicitly acknowledge the limitations imposed by parameter degeneracy in the SANDI model and clarify how the authors ensured the biological interpretability of regression outputs in this context - Beta coefficients could reflect model instability or parameter degeneracy rather than true biological effects.

      (a) The same SANDI metrics and age were included in the first regression models for HD and HC data. The first models differed only in the inclusion of TFC as an estimate of disease burden for the HD data. HD and HC participants were not included in a single regression model, as our aim was not to perform formal between-group inference on regression coefficients. Instead, models were fitted separately to explore within-group associations and to descriptively compare patterns of relationships across groups. This approach avoids imposing identical model structures across groups that may differ in variance structure, disease burden, and biological coupling between SANDI metrics. We have clarified these points on pages 13-14.

      (b) We agree that multicollinearity is an important consideration when interpreting regression coefficients derived from microstructural models. To address this, we examined pairwise Spearman correlations between all imaging (SANDI, DTI, volume) metrics (averaged across ROIs), shown in the revised Supplementary Figure 2. In the healthy control data, the SANDI indices of apparent soma and neurite fractions showed a strong inverse correlation (rho = -0.92) and did not correlate with soma radius (rho = 0.1). All SANDI indices correlated only weakly with FA and volume and moderately with MD. This correlation pattern suggests that apparent soma density and radius capture distinct information about grey matter microstructure that differs from neurite fraction and is not captured by FA or volume. In HD participants, we note a negative correlation between soma radius and fraction, and stronger correlations between SANDI metrics and volume measures. We would argue that these reflect disease-related reorganization of micro- and macro-structural relationships rather than uniform collinearity across groups. This information has been added to the Methods, Results and Discussion sections (pages 13, 19, 21 and 23 ff).
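      As an illustration of the statistic behind these correlation matrices, a minimal pure-Python sketch of Spearman's rho is shown below: values are replaced by ranks (ties receive the average rank) and a Pearson correlation is computed on the ranks. It is not the analysis code used in the study, and the ROI-averaged values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def spearman_matrix(metrics):
    """Pairwise Spearman rho for a dict of equally long metric lists."""
    return {a: {b: spearman(metrics[a], metrics[b]) for b in metrics} for a in metrics}


# Hypothetical ROI-averaged values for five participants.
f_is = [0.20, 0.25, 0.22, 0.30, 0.28]
f_in = [0.50, 0.42, 0.47, 0.35, 0.40]
print(spearman(f_is, f_in))
```

      Because only ranks enter the calculation, Spearman's rho captures monotonic rather than strictly linear coupling and is less sensitive to outlying ROI values than Pearson's r.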

      (c) We agree that regression coefficients derived from interdependent microstructural parameters should be interpreted with caution, as they may reflect shared variance or partial parameter degeneracy rather than fully independent biological effects. For this reason, we do not interpret individual beta coefficients in isolation. Instead, our conclusions focus on the consistency and directionality of associations across regions and metrics, and on the overall feasibility and sensitivity of SANDI to detect biologically meaningful variation in HD. The observed correlation structure (Supplementary Figure 2) provides important context for these interpretations and supports a multivariate, pattern-based rather than univariate reading of the results. These points have been added to the Discussion on pages 23 ff. Please also refer to our response to point (5) below.

      (4) Preprocessing order:

      Gibbs ringing correction was applied after TOPUP and EDDY, which deviates from the commonly recommended order in diffusion MRI preprocessing. Since Gibbs artifacts are introduced by k-space truncation and affect the spatial domain, it is typically advised to perform Gibbs correction prior to geometric corrections like TOPUP and EDDY. This avoids potential blurring or propagation of ringing artifacts during resampling. Could the authors clarify the rationale for this ordering, and whether an early application of Gibbs correction was tested?

      We agree that applying Gibbs ringing correction after TOPUP and EDDY correction deviates from the commonly recommended order in diffusion MRI preprocessing. However, as some of the data included in this paper were preprocessed before this consensus emerged in the literature, we kept the preprocessing order consistent across all datasets for harmonization and comparison purposes. We have since changed the order for subsequent preprocessing of the HDDRUM data and found comparable FA maps for data processed with Gibbs ringing correction applied before and after TOPUP and EDDY correction.

      (5) Expand on SANDI model assumptions:

      SANDI is presented as being used for the very first time in this problem. However, a vague explanation is given: "using all the default settings". Given the novelty of applying SANDI in a clinical HD context, the manuscript would benefit from a discussion of the model's key assumptions and limitations. For instance:

      (a) The potential degeneracy between fis and rs in the absence of protocol features (e.g., long Δ or high b) that can disambiguate them.

      (b) Whether a dot compartment was included, and the implications of excluding it for the interpretation of rs or fis.

      (c) The lack of exchange modeling or fixed stick diffusivity, and how these may bias compartment estimates (particularly in diseased or aging tissue).

      (d) Any steps taken to verify robustness or identifiability (e.g., simulations, synthetic fitting). These issues are not flaws in the method, but they do affect how confident we can be in interpreting fis/rs as markers of neuron loss or glial hypertrophy, especially given the subtle group differences and the potential for biological heterogeneity in HD. Even a brief acknowledgment would strengthen the manuscript and provide useful context to readers less familiar with multicompartment modeling.

      We thank the reviewer for this constructive suggestion and fully agree that, because this is the first application of SANDI in our clinical HD cohort, the manuscript should more explicitly describe the model assumptions, potential identifiability limitations under our protocol, and the implications for biological interpretation.

      We have revised the Methods (pages 11-12) and Discussion (page 24) to (i) specify the exact SANDI implementation used (the SANDI MATLAB toolbox, available at: https://github.com/palombom/SANDI-Matlab-Toolbox-Latest-Release), (ii) describe which components are included in the default formulation and the key modelling assumptions, and (iii) add a dedicated “Limitations and interpretability” paragraph addressing points (a–d) below. We also avoid the previous shorthand “default settings” and provide a clear description of the fitting setup.

      “The SANDI model [Palombo M. et al, NeuroImage 2020] assumes three compartments, namely intra-neurite signal modelled as diffusion inside impermeable randomly oriented sticks, intra-soma signal modelled as restricted diffusion inside spheres, and extra-cellular signal modelled as Gaussian isotropic diffusion. The direction-averaged (or spherical mean) normalized diffusion signal has thus the following expression:

      S(b) = f<sub>is</sub> A<sub>sphere</sub>(b, r<sub>s</sub>, D<sub>is</sub>) + f<sub>in</sub> A<sub>stick</sub>(b, D<sub>in</sub>) + f<sub>ec</sub> A<sub>ball</sub>(b, D<sub>e</sub>)

      where f<sub>in</sub> + f<sub>is</sub> + f<sub>ec</sub> = 1; A<sub>stick</sub> and A<sub>sphere</sub> are the normalized, directionally averaged (or spherical mean) signals for restricted diffusion within neurites and soma, respectively, and A<sub>ball</sub> is the normalized, directionally averaged (or spherical mean) signal of the extracellular space. The specific expressions are given in [Palombo M. et al. NeuroImage 2020]. The parameters estimated from the direction-averaged (or spherical mean) data are D<sub>in</sub>, a proxy of the intra-neurite effective axial diffusivity; D<sub>e</sub>, a proxy of the extracellular effective mean diffusivity; r<sub>s</sub>, a proxy of the apparent soma radius; and the signal fractions f<sub>in</sub>, f<sub>is</sub> and f<sub>ec</sub> (subject to the constraint f<sub>in</sub> + f<sub>is</sub> + f<sub>ec</sub> = 1), proxies of the relaxation-weighted neurite, soma and extracellular volume fractions, respectively. The bulk diffusivity inside the sphere, D<sub>is</sub>, is fixed to 3 μm<sup>2</sup>/ms. The parameters were fitted using a Random Forest regression algorithm (TreeBagger, Matlab®) with 200 trees, trained on simulated data, using the code publicly available at https://github.com/palombom/SANDI-Matlab-Toolbox-Latest-Release. The training data consisted of simulated signals for 10<sup>5</sup> parameter combinations, uniformly sampled: f<sub>in</sub> and f<sub>is</sub> ∈ [0, 1], D<sub>in</sub> ∈ [0.5, 3] μm<sup>2</sup>/ms, D<sub>e</sub> ∈ [0.5, 3] μm<sup>2</sup>/ms and r<sub>s</sub> ∈ [1, 12.5] μm. To account for realistic SNR levels and the rectified noise floor, Rician noise was added, with standard deviations randomly sampled from the voxels within the brain mask of the noise map obtained by MPPCA denoising. The training loss was the mean squared error between predicted parameters and ground-truth values. Model fitting provided maps of f<sub>in</sub>, f<sub>is</sub>, f<sub>ec</sub>, D<sub>in</sub>, D<sub>e</sub> and r<sub>s</sub>.”
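      To make the three-compartment expression concrete, the sketch below evaluates it as a forward model in Python. It is an illustrative re-implementation under stated assumptions, not code from the SANDI-Matlab-Toolbox. A<sub>stick</sub> uses the standard spherical-mean stick expression, A<sub>ball</sub> a mono-exponential, and A<sub>sphere</sub> the Gaussian phase distribution (Murday-Cotts) approximation with ideal rectangular pulses and the series truncated at ten terms. D<sub>is</sub> = 3 μm<sup>2</sup>/ms follows the text, δ = 7 ms and Δ = 24 ms follow the protocol given in point (a) below, and the example parameter values are hypothetical. Units are μm and ms, so b = 6000 s/mm<sup>2</sup> becomes 6 ms/μm<sup>2</sup>.

```python
import math

# Positive roots of j1'(x) = 0 (j1: spherical Bessel function of order 1),
# used by the Gaussian phase sphere model; the series is truncated at ten terms.
SPHERE_ROOTS = (2.0815759778, 5.9403699906, 9.2058401429, 12.4044450219,
                15.5792364104, 18.7426455848, 21.8996964795, 25.0528252810,
                28.2033610040, 31.3520917266)


def a_stick(b, d_in):
    """Spherical mean of randomly oriented sticks (zero perpendicular diffusivity)."""
    x = b * d_in
    if x == 0.0:
        return 1.0
    return math.sqrt(math.pi / (4.0 * x)) * math.erf(math.sqrt(x))


def a_ball(b, d_e):
    """Isotropic Gaussian (ball) compartment."""
    return math.exp(-b * d_e)


def a_sphere(b, r_s, d_is, small_delta, big_delta):
    """Sphere of radius r_s, Gaussian phase approximation, rectangular pulses."""
    if b == 0.0:
        return 1.0
    g2 = b / (small_delta ** 2 * (big_delta - small_delta / 3.0))  # gamma^2 G^2
    total = 0.0
    for root in SPHERE_ROOTS:
        a2 = (root / r_s) ** 2   # alpha_m^2
        dam = d_is * a2          # alpha_m^2 * D
        num = (2.0 * dam * small_delta - 2.0
               + 2.0 * math.exp(-dam * small_delta)
               + 2.0 * math.exp(-dam * big_delta)
               - math.exp(-dam * (big_delta - small_delta))
               - math.exp(-dam * (big_delta + small_delta)))
        total += num / (dam ** 2 * a2 * (root ** 2 - 2.0))
    return math.exp(-2.0 * g2 * total)


def sandi_signal(b, f_in, f_is, d_in, d_e, r_s,
                 d_is=3.0, small_delta=7.0, big_delta=24.0):
    """Direction-averaged SANDI signal with f_ec = 1 - f_in - f_is."""
    f_ec = 1.0 - f_in - f_is
    return (f_is * a_sphere(b, r_s, d_is, small_delta, big_delta)
            + f_in * a_stick(b, d_in)
            + f_ec * a_ball(b, d_e))


# Hypothetical parameters evaluated at b = 6 ms/um^2 (6000 s/mm^2).
print(sandi_signal(6.0, f_in=0.4, f_is=0.3, d_in=2.0, d_e=1.0, r_s=8.0))
```

      In this sketch a smaller sphere attenuates less at a given b, which is the restricted-diffusion signature that, together with the fixed diffusion timing, governs how well f<sub>is</sub> and r<sub>s</sub> can be separated.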

      (a) Potential degeneracy between f<sub>is</sub> and r<sub>s</sub>. We agree that partial coupling (or degeneracy) between the soma fraction f<sub>is</sub> and the soma radius r<sub>s</sub> is possible when the acquisition does not provide strong sensitivity to restricted sphere size (e.g., in the low b-value regime). Our protocol benefits from high b-values (up to 6000 s/mm<sup>2</sup>) enabled by the Connectom gradient system, which increase sensitivity to signal attenuation from restricted compartments and reduce the f<sub>is</sub>-r<sub>s</sub> coupling/degeneracy. However, we acknowledge that the specific choice of fixed diffusion timing (in our case δ = 7 ms, Δ = 24 ms) can further modulate the f<sub>is</sub>-r<sub>s</sub> coupling/degeneracy in a protocol-dependent way. To reflect this appropriately, we now explicitly state that r<sub>s</sub> should be interpreted as an “apparent soma radius” under our protocol, and that our inferences focus on relative group differences and spatial patterns rather than absolute histological soma radii.

      We have now added a paragraph in the limitations section acknowledging this point.
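      As a rough consistency check of the protocol figures quoted above (b up to 6000 s/mm<sup>2</sup>, δ = 7 ms, Δ = 24 ms) against the 300 mT/m Connectom gradients, the gradient amplitude they imply can be computed from the Stejskal-Tanner expression b = γ<sup>2</sup>G<sup>2</sup>δ<sup>2</sup>(Δ - δ/3). This is a sketch assuming a pulsed-gradient spin-echo sequence with ideal rectangular pulses (no ramps); the b-value calculation on the actual sequence will differ somewhat.

```python
import math

GAMMA = 2.675221874e8  # proton gyromagnetic ratio, rad s^-1 T^-1


def gradient_for_b(b_s_per_mm2, small_delta_s, big_delta_s):
    """Gradient amplitude (T/m) for a given b-value.

    Assumes ideal rectangular pulses: b = gamma^2 G^2 delta^2 (Delta - delta/3).
    """
    b_si = b_s_per_mm2 * 1e6  # s/mm^2 -> s/m^2
    return math.sqrt(b_si / (GAMMA ** 2 * small_delta_s ** 2
                             * (big_delta_s - small_delta_s / 3.0)))


g = gradient_for_b(6000, 7e-3, 24e-3)
print(f"{g * 1e3:.0f} mT/m")  # → 281 mT/m
```

      Under these assumptions the highest shell needs roughly 281 mT/m, just within the 300 mT/m available; with weaker gradients the same b-value would require longer δ or Δ, which in turn changes the diffusion timing discussed above.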

      (b) Dot compartment. We did not include an explicit “dot” (immobile) compartment, because there is no evidence that it is required for human in vivo data (see, for example, the very low, negligible contribution reported in Tax C. et al. NeuroImage 2020: https://www.sciencedirect.com/science/article/pii/S1053811920300215). Accordingly, our fits did not include a dot term, and we now state this explicitly in the Methods. However, we would like to clarify that our fitting method (described in detail at https://github.com/palombom/SANDI-Matlab-Toolbox-Latest-Release) accurately models the impact of Rician noise and thus accounts for the corresponding rectified noise floor, which in high b-value applications is very often mistaken for a “dot” compartment. Therefore, we do not expect any bias in the estimated f<sub>is</sub> and r<sub>s</sub> from omitting a “dot” compartment.
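      The rectified noise floor mentioned above can be illustrated in a few lines. When Gaussian noise is added independently to the real and imaginary channels and the magnitude is taken, the measured signal stays positive even when the true signal is zero, with mean σ√(π/2) (the mean of a Rayleigh distribution). A fit that ignores this floor can absorb it into a spurious constant, dot-like term. The sketch uses a hypothetical σ and is not the toolbox's noise model.

```python
import math
import random


def add_rician_noise(signal, sigma, rng):
    """Magnitude of a complex signal with i.i.d. Gaussian noise per channel."""
    real = signal + rng.gauss(0.0, sigma)
    imag = rng.gauss(0.0, sigma)
    return math.hypot(real, imag)


rng = random.Random(0)  # seeded for reproducibility
sigma = 0.05            # hypothetical noise level (relative to S0 = 1)
samples = [add_rician_noise(0.0, sigma, rng) for _ in range(100_000)]
floor = sum(samples) / len(samples)

# Empirical floor vs the analytical Rayleigh mean sigma * sqrt(pi / 2).
print(round(floor, 4), round(sigma * math.sqrt(math.pi / 2), 4))
```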

      (c) Exchange modelling and fixed stick diffusivity. We agree that SANDI, as implemented here, does not explicitly model inter-compartment exchange during the diffusion encoding and uses simplified representations of neurites (sticks); note, however, that the intra-stick diffusivity, D<sub>in</sub>, was not fixed but fitted. In diseased or aging tissue, deviations from these assumptions (e.g., altered membrane permeability) may bias compartment estimates. This has been investigated in depth in Schiavi S. et al. HBM 2023 (https://onlinelibrary.wiley.com/doi/full/10.1002/hbm.26416), to which we refer the reader. We have added an explicit limitation statement noting that HD-related microstructural changes (e.g., changes to membrane permeability) could affect model parameter fidelity, and thus f<sub>is</sub> and r<sub>s</sub> should be treated as MRI-derived effective indices rather than direct quantitative measures of neuron loss or glial hypertrophy. Importantly, our analysis compares groups under an identical acquisition and fitting pipeline, so group-level contrasts remain informative even if absolute parameter values are biased.

      (d) Robustness / identifiability checks. We agree that reporting robustness strengthens confidence, particularly given subtle effects and biological heterogeneity. With the SANDI Matlab Toolbox we used, the robustness and identifiability of the model parameters were investigated extensively using numerical simulations and synthetic signals accounting for the specific experimental protocol and noise distribution. An example of the results supporting robustness / identifiability is reported in the Author response images. These results show that the accuracy and precision of all SANDI model parameters, except D<sub>in</sub>, are very high (>~80%, Author response image 1)

      Author response image 1.

      Analysis of the accuracy and precision of SANDI model parameter estimation. We simulated 10<sup>4</sup> synthetic diffusion signals using the SANDI model with random combinations of five parameters: f<sub>neurite</sub> (f<sub>in</sub>), f<sub>soma</sub> (f<sub>is</sub>), D<sub>in</sub>, R<sub>soma</sub> (r<sub>s</sub>), and D<sub>e</sub>. Parameters were sampled uniformly from: f<sub>neurite</sub>, f<sub>soma</sub> ∈ [0, 1]; D<sub>in</sub>, D<sub>e</sub> ∈ [0.5, 3.0] µm<sup>2</sup>/ms; R<sub>soma</sub> ∈ [1, 12] µm. Rician noise with experimentally estimated variance was added, and the SANDI model was then fitted to the noisy signals. For each parameter, we report the relative percentage error between estimated and ground-truth values as a function of the parameter value (normalized to [0, 1]), together with the goodness of fit (R<sup>2</sup>).

      and sensitivity to changes as small as 5% in each of the model parameters is correctly captured (Author response image 2A), with small to negligible degeneracy (except, once again, for D<sub>in</sub>), even in the presence of exchange (Author response image 2B).

      Author response image 2.

      Sensitivity to 5% parameter modulations. The matrices show how a controlled perturbation in one parameter propagates into the estimated values of all model parameters. Each row corresponds to a 5% increase in the parameter on the y-axis; the resulting percentage change observed in each estimated parameter is reported along the x-axis. An ideal estimator would yield a purely diagonal matrix, with 5% on the diagonal and 0% elsewhere (no cross-talk). In (A), we used the same synthetic SANDI signals as in Figure 1. In (B), we additionally generated 10<sup>4</sup> synthetic signals incorporating neurite–extra-cellular exchange using the NEXI model [https://doi.org/10.1016/j.neuroimage.2022.119277] and an exchange time representative of human cortex (τ<sub>ex</sub> ≈ 30 ms) [https://doi.org/10.1162/imag_a_00104].

      We have therefore revised the manuscript language to be more precise and appropriately cautious, describing f<sub>is</sub> and r<sub>s</sub> as apparent compartment indices and explicitly discussing potential confounds (e.g., parameter coupling, and unmodelled exchange), while clarifying the value of SANDI for detecting reproducible group-level microstructural differences in HD.

      (6) Clarify "not-classified" group in figures:

      It is not clear to me what the "not-classified" groups shown in Figures 3-4 represent, what criteria determined their inclusion, and whether their inclusion affects the comparability or interpretability of staging-based analyses

      We have added to the legends of Figures 3 and 4 that “not-classified” refers to HD participants who could not be HD-ISS classified because of missing clinical data or a CAG repeat length within the 36-40 range. Because the correlation analyses were conducted across the whole HD sample, however, these data points were included in the scatterplots.

      (7) Figure labeling:

      There appears to be a mismatch between figure numbering and captions around Figures 3-4. Please ensure alignment.

      Mismatch between figure numbering and captions has been corrected.

      Minor suggestions:

      (1) Figures 1-2:

      (a) Label axis values meaningfully, e.g., negative vs. positive instead of 0 vs 1.

      (b) Add units to MD axes (e.g., ×10⁻⁴ mm²/s).

      (c) Figure 6 colors: Consider improving the color distinction between "Age" and "fis" predictors, which are currently hard to differentiate.

      The suggested adjustments have been made to Figures 1, 2, 5 and 6 and to the Figure 2 legend.

      (d) Discuss why apparent soma size decreases in some ROIs (e.g., pallidum), if unexpected.

      We offer the following speculation about the reduced soma size in the pallidum (pages 20/21): Changes in apparent soma size may reflect alterations in neural and glial cell proportions and/or morphology, including astrocyte and microglia swelling in response to neurodegeneration and soma shrinkage preceding neuronal cell death. Thus, increased apparent soma size in the striatum may indicate HD-related reorganisation of cell types driven by MSN loss and reactive glial cell swelling, whereas smaller soma size in the pallidum may result from infiltration of smaller glial cells prior to secondary neuronal loss following striatal MSN degeneration.

      Reviewer #3:

      (1) An important question is whether the SANDI measures, which require an expensive scanner and elaborate processing, are better biomarkers than the more traditional DTI measures. Can the authors compare the effect size of FA/MD with that of SANDI measures? In some of the plots and tables, FA/MD seem to have comparable, if not higher, correlations with QMotor or CAP scores. In the same vein, it is unclear whether DTI measures were included in the hierarchical stepwise regression. I wonder whether the stepwise models would have picked up FA/MD instead of SANDI measures if given the chance. Overall, I hope the authors can also discuss their findings in light of the costs vs. benefits of adopting SANDI in future studies, which is an important topic for clinical trials.

      Effect sizes (ES) of group differences in all microstructural indices can be found in Table 4. ES of DTI and SANDI indices in the caudate and putamen were broadly comparable, with a trend for MD to show larger ES (FA: r<sub>rb</sub> = 0.38-0.55; MD: r<sub>rb</sub> = 0.51-0.61; f<sub>is</sub>: r<sub>rb</sub> = 0.32-0.45; r<sub>s</sub>: r<sub>rb</sub> = 0.45-0.53).

      This information is now reported in the Results section on pages 15/16 and is discussed in light of cost versus benefit considerations on pages 21 and 25.
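      The effect sizes above are reported as rank-biserial correlations (r<sub>rb</sub>), but the computation is not spelled out. A minimal sketch, assuming r<sub>rb</sub> accompanies a Mann-Whitney U comparison between the HD and HC groups, is shown below; the function name and toy values are illustrative, not taken from the study. Counting pairwise wins and losses gives the same value as 2U/(n<sub>1</sub>n<sub>2</sub>) - 1, and the sign only reflects which group is passed first.

```python
from itertools import product


def rank_biserial(group_a, group_b):
    """Rank-biserial correlation r_rb between two independent samples.

    Equivalent to 2*U/(n_a*n_b) - 1, where U is the Mann-Whitney U statistic
    for group_a and ties count as half a "win".
    """
    wins = losses = 0
    for a, b in product(group_a, group_b):
        if a > b:
            wins += 1
        elif a < b:
            losses += 1
    # (wins - losses) / (n_a * n_b) == 2U/(n_a*n_b) - 1 with U = wins + ties/2
    return (wins - losses) / (len(group_a) * len(group_b))


# Toy values (not study data), e.g. one index in HD vs HC participants.
hd = [3.0, 4.0, 5.0]
hc = [1.0, 2.0, 3.0]
print(round(rank_biserial(hd, hc), 3))  # 0.889
```

The pairwise count is O(n<sub>1</sub>n<sub>2</sub>), which is unproblematic at the sample sizes discussed here; a rank-sum implementation would scale better for large groups.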

      (2) Similar to the above point, it is very important to consider how strong the biomarking signal is from SANDI measures compared to the good old striatal volume. Some plots seem to indicate that volumes still have the highest correlation with QMotor and the highest effect size in group comparisons. It would be helpful for the community to know where the new SANDI measures stand compared to the most typically used volumes in terms of effect size.

      Effect sizes (ES) of group differences in volumes can be found in Table 2. ES for caudate and putamen volumes ranged from r<sub>rb</sub> = 0.49 to 0.55; they were comparable to the ES of apparent soma size (r<sub>rb</sub> = 0.45-0.53) but slightly larger than the ES of soma density (r<sub>rb</sub> = 0.32-0.45).

      This information is now reported in the Results section on pages 15/16 and is discussed on pages 21 and 25.

      (3) The diffusion measures are inevitably correlated to some degree. Please provide a correlation matrix in the supplementary material, including all DWI measures, to enable readers to better understand how similar SANDI measures are to each other or vs. other DTI measures. Perhaps adding volumes to this correlation matrix may also be a good future reference.

      We have added cross-correlation matrices between all imaging measures (SANDI, DTI, volumes) for the total sample, as well as for HC and HD participants separately, to the Supplementary Material (Supplementary Figure 3). These provide an overview of the shared variance within SANDI parameters and between SANDI, DTI and volume metrics for each group.

      (4) ISS stages:

      (a) The online ISS calculator requires cut-offs derived from the longitudinal Freesurfer pipeline, while the authors do not have longitudinal data. Thus, the ISS classification might be inaccurate to some degree if the authors used the FS cross-sectional pipeline. Please review this issue and see if updated cut-offs should be used to classify participants.

      We acknowledge that our HD-ISS classifications may have been biased by the use of cross-sectional rather than longitudinal FreeSurfer v6 volumes (page 23).

      (b) Were there really no participants with ISS 0 among the 56 HD individuals? Please clarify in the manuscript.

      We classified four individuals as ISS 0 based on their caudate and/or putamen z-scored volumes falling below 2SD of the healthy control mean. These analyses are described on pages 14-15 and were based on the cross-sectional data of this study.

      (5) A note on terminology that might be confusing to some readers. According to the creators of ISS, the ISS stages are created for research only; they are not used or applied in the clinic. On the other hand, the terms "premanifest" and "manifest" have a clinical meaning, typically based on the diagnostic confidence level. The assignment of ISS0-1 to premanifest and ISS2-3 to manifest may create some non-trivial confusion, if not opposition, in some segments of the HD community. The authors can keep their current terminology, but will need to at least clarify to the reader that this assignment is speculative, does not fully match the clinically-based categories, and should not be confused with similarly named groups in the previous literature.

      To avoid confusion about terminology, we have removed the labels “premanifest” versus “manifest” throughout the manuscript. We refer to HD-ISS 0-1 and HD-ISS 2-3 when referring to the exploratory comparisons between HD-ISS stages.

      (6) The study population seems to have been drawn from several other studies or research projects, and scores are missing for several participants because of the retrospective nature of the sample gathering. Please state clearly that this study used retrospective data, to properly justify why there are missing data. Also, and this is important, please clarify for the reader whether there was any temporal bias in the acquisition of data for one group (HD) vs. the other (HC). It is important to rule out scanner changes or upgrades that may confound the reported group differences.

      We can confirm there were no Connectom scanner changes or upgrades that may have confounded the reported group differences. This was added to the image acquisition section on page 10. We have added to the participant section on page 9 that data were retrospectively pooled from separate studies and explain this was the reason why HD-ISS classification was only available for a subset of participants.

      (7) Several of the significant results with SANDI scores seem to be driven by a subgroup of HD individuals who are more clearly different from the healthy control distribution. Not sure if this may help, but one idea the authors could consider is to check whether HD individuals who deviate more than 2 SDs from the healthy control distribution of SANDI scores also have worse QMotor, worse atrophy, or higher CAP scores than those HD individuals who lie practically within the 2SD boundary distribution of HDs. This is another way of showing that the new measures have potential for application in individualized medicine (the MRI Z score of a patient as a proxy of clinical deterioration). It is not a request to the authors, just a suggestion for their consideration.

      The data points in the scatterplots of Figures 3, 4, and 7 have now been color-coded according to HD-ISS stage, showing a stage-related worsening of microstructural and volumetric imaging markers and Q-Motor performance.
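      The reviewer's suggestion amounts to z-scoring each HD participant against the healthy-control distribution and splitting the HD group at a 2 SD threshold. A minimal sketch follows, assuming a two-sided threshold and the sample SD of the controls; the function names are hypothetical and the values are toy numbers. The flagged and unflagged subgroups could then be compared on Q-Motor, atrophy or CAP scores.

```python
from statistics import mean, stdev


def hc_z_scores(patient_values, control_values):
    """Express each patient value as a z-score relative to the healthy controls."""
    mu, sd = mean(control_values), stdev(control_values)  # sample SD of HC
    return [(v - mu) / sd for v in patient_values]


def flag_deviants(patient_values, control_values, threshold=2.0):
    """True where a patient lies more than `threshold` SDs from the HC mean."""
    return [abs(z) > threshold for z in hc_z_scores(patient_values, control_values)]


# Toy values (not study data).
controls = [1.0, 2.0, 3.0, 4.0, 5.0]
patients = [7.0, 4.0, -1.0]
print(flag_deviants(patients, controls))  # [True, False, True]
```

A one-sided rule (only deviations in the disease-expected direction) is an equally defensible alternative; it would replace `abs(z)` with the signed z-score.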

      (8) The variance explained in hierarchical regression is obtained by fitting models within the sample, and can be subject to overfitting. In the absence of a more robust cross-validated R2, the authors may want to at least briefly inform the reader that the current approach can be subject to overfitting and does not represent a true out-of-sample R2.

      We have added this point to the study limitations in the Discussion section on page 23.

      (9) There are two Figure 3 labels, and all figures thereafter do not match the manuscript.

      The Figure numbering has been corrected.

      (10) In (the currently labelled) Figure 8, there seem to be fewer than 56 data points in the scatterplots. Is there a reason why not all 56 HD individuals have a CAP100 score available? CAP needs only CAG and age, which all HD gene carriers should have in order to be included in the study.

      Inclusion criteria for individuals with HD in the HD-DRUM project were a positive genetic test for the presence of the mutant huntingtin allele (CAG length ≥ 36 repeats) and/or a clinical diagnosis of HD. Participants could therefore be included on the basis of a clinical diagnosis alone, and for a small number of them the CAG repeat length needed to calculate the CAP100 score was not available.
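      The reviewer notes that CAP needs only age and CAG. As a minimal sketch, assuming the commonly used standardized form CAP100 = 100 × age × (CAG - 30) / 627 (the constants should be checked against the formulation actually used in the manuscript), the hypothetical function below makes the dependency explicit: without a CAG value the score is undefined, which is why participants included on clinical grounds alone cannot appear in the CAP100 scatterplots.

```python
def cap100(age_years, cag_repeats):
    """CAP100 score, or None when the CAG repeat length is unavailable.

    Assumes the standardized form CAP100 = 100 * age * (CAG - 30) / 627.
    """
    if cag_repeats is None:
        # Participants included on a clinical diagnosis alone may lack a CAG value.
        return None
    return 100 * age_years * (cag_repeats - 30) / 627


print(round(cap100(40, 45), 1))  # 95.7
print(cap100(52, None))          # None
```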

    1. The future is likely to be hybrid. Pixel-native models will still be best for realism, texture, and exploration. Code-native systems will be better for structure, iteration, and production.

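      The author challenges the either/or debate over technical approaches in AI, arguing that the future will be a hybrid in which pixel-native and code-native systems coexist and develop side by side. This view breaks with the adversarial mindset of today's competing technical camps and implies that each approach has its own strengths and should be chosen according to the specific application.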

    2. The most interesting visual AI tools today have stopped trying to generate the final output. Instead, they're generating the source code behind it.

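      Most people assume that progress in visual AI shows up mainly as more realistic images and video, but the author argues that the real breakthrough is AI shifting from generating pixels to generating code. This view challenges the mainstream direction of the visual AI field and implies that future value lies not in the final visual result but in an editable, iterable code structure.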

    1. Summary Report: Transparency and Private-Sector Funding of Education Policy

      Executive Summary

      This document summarizes the work of the Senate commission of inquiry into the mechanisms by which private-law bodies fund public policy. The analysis focuses on the hearing of the Directorate General for School Education (DGESCO).

      The key points are as follows:

      • Need for transparency: As funding sources diversify (philanthropy, tax incentives), the Senate is examining the risks of ideological influence and the lack of financial transparency in the associative and educational spheres.

      • Strict legal framework: The involvement of associations in schools is governed by the Education Code (Code de l'éducation).

      Such involvement must complement the curriculum, be approved by school heads, and remain under the teaching responsibility of teachers.

      • Accreditation mechanisms: The ministry manages 167 national and 1,330 regional (académie-level) accreditations.

      The refusal rate is around 30 to 40%, mainly for lack of pedagogical complementarity or failure to respect republican values.

      • Points of concern: Commercial entities (e.g., LIFT) and local initiatives (e.g., Passeport du civisme) attempt to enter schools without accreditation, sometimes by improperly using ministry logos or by relying on scientific or local-government endorsements.

      • Priority on EVARS: Education in emotional, relational and sexual life (EVARS) is the subject of a new curriculum (2025) and of closer monitoring to counter challenges and unapproved private offerings.

      --------------------------------------------------------------------------------

      1. Institutional Framework and Objectives of the Commission of Inquiry

      The commission of inquiry was set up to analyze the influence of private-law bodies, companies and foundations on the funding of public policy.

      Main Objectives

      • Analysis of mechanisms: Understand the scale of financial flows and identify the rules governing private involvement.

      • Risk assessment: Measure the risks of influence, of obstruction of democratic functioning, and of a lack of financial transparency.

      • System protection: Devise safeguards that guarantee the independence of national education without hindering the operation of schools.

      Identified Risk Factors

      Associations and institutions are increasingly seeking to diversify their resources, a trend accentuated by pressure on public funding.

      The State encourages this through tax incentives, which facilitates the deployment of ideological strategies by private philanthropic actors.

      --------------------------------------------------------------------------------

      2. Regulation of Association Involvement in Schools

      The involvement of associations in public education is strictly regulated by the Education Code.

      Legal Basis

      • Article L912-1: Any outside contributor during school hours requires the authorization of the head of the school and acts under the responsibility of a teacher.

      • Article D551-6: Accredited associations support teaching activities without ever replacing them.

      • Non-accredited associations: Their involvement is exceptional.

      It requires the approval of the Rector or of the DASEN (Directeur académique des services de l'Éducation nationale) and the opinion of the district inspector (inspecteur de circonscription).

      Core Principles of Involvement

      • Complementarity: The activity must relate to the school curriculum and to ministerial priorities.

      • Teaching responsibility: The teacher remains solely responsible for the content and conduct of the sessions.

      • School project: The activity must fit within the educational project of the primary school (école) or lower secondary school (collège).

      --------------------------------------------------------------------------------

      3. Support and Funding Mechanisms (2025 Budget)

      The Ministry of National Education supports associations through various financial and administrative levers.

      | Type of support | Number of associations (2025) | Amount allocated (EUR) |
      | --- | --- | --- |
      | Multi-year objectives agreement (Convention Pluriannuelle d'Objectifs, CPO) | 18 | €54,282,150 |
      | Annual support (grants) | 155 | €6,429,000 |
      | Total | 173 | ~€60.7 million |

      Note: Grants exceeding €2 million are subject to enhanced ministerial budgetary and accounting control.

      --------------------------------------------------------------------------------

      4. The Accreditation Process: Criteria and Requirements

      Accreditation is the main filter for ensuring the quality of outside contributors.

      Eligibility Criteria

      • Compliance with the "common core" (tronc commun): Verification of financial transparency and signature of the Republican Commitment Contract (Contrat d'Engagement Républicain), valid for 5 years.

      • Educational value: DGESCO assesses the quality of the activities and their fit with the values of the school system.

      • Rejection of "turnkey" solutions: The ministry favors co-construction with teachers over the purchase of rigid modules.

      Statistics and Trends

      • Volume of applications: About 100 complete applications are reviewed each year at the national level.

      • Refusal rate: Between 30% and 40%.

      • Common grounds for refusal: Lack of pedagogical complementarity, insufficient national or regional reach, or non-compliance with republican principles.

      Decision-Making Body

      The opinion is issued by the National Council of Complementary Educational Associations of Public Education (Conseil National des Associations Éducatives Complémentaires de l'Enseignement Public, CNAECEP).

      Although advisory, this opinion is generally followed by the Minister, who makes the final decision.

      --------------------------------------------------------------------------------

      5. Case Analysis: Risks of Influence and Misappropriated Credibility

      The hearing highlighted attempts to bypass the official accreditation channels.

      The association "Passeport du Civisme"

      This organization was refused national accreditation in 2024 on the grounds that it offers a teaching aid (the passport) rather than an overall associative approach.

      Despite this refusal, the association improperly used the ministry's logo on its website and printed materials to solicit local authorities.

      DGESCO had to intervene to demand removal of the logo and to alert the regional education authorities (académies).

      The entity "LIFT"

      https://youtu.be/BCCPu-d6QZ8?t=2124 DGESCO response https://youtu.be/BCCPu-d6QZ8?t=2614

      LIFT illustrates the risk posed by commercial actors.

      • Nature: A for-profit private company offering paid sex-education modules.

      • Issues identified: No health professionals involved in the design, lack of age-appropriateness, and a "turnkey" format that prevents teachers from reworking the material.

      • Influence strategy: The company relies on outside endorsements (university research, regional public contracts) to try to enter schools without ministerial approval.

      --------------------------------------------------------------------------------

      6. Focus: Education in Emotional, Relational and Sexual Life (EVARS)

      This area is identified as highly sensitive and subject to attempts at outside influence.

      • 2025 reform: Near-unanimous adoption of the new curricula by the Higher Council for Education (Conseil Supérieur de l'Éducation, CSE).

      • Rollout: Target of three sessions per year for each age group.

      By December 2025, 66% of primary schools and 48% of lower secondary schools (collèges) had already held at least one session.

      • Oversight of outside contributors: The circular of 4 February 2025 requires any outside session to be delivered by a pair that must include a member of National Education staff.

      The ministry relies exclusively on accredited associations.

      --------------------------------------------------------------------------------

      7. Outlook and Improvements in Oversight

      DGESCO acknowledges the need to strengthen control and evaluation mechanisms.

      Areas for Improvement

      • Professionalizing evaluation: The ministry admits that it is better equipped to grant accreditation (upstream) than to evaluate what is actually done in the field (downstream).

      • Transparency of "satellites": It is difficult to map the activities carried out by the local branches of large umbrella networks (e.g., Ligue de l'Enseignement, Souvenir Français).

      • "ADAGE" platform (Pass Culture): Working groups have been set up to improve quality control of the cultural offering and to introduce a system in which teachers rate providers.

      • New regulatory criteria: Plans to require a mid-term review of accreditation and an observation period before any first application for accreditation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) First, a central claim is that arousal modulates functional connectivity in a hemispherically asymmetric and community-specific manner. Although structured asymmetries are demonstrated at the group level, it remains unclear whether these effects reflect a stable neurobiological principle or arise from high-dimensional, connection-wise analyses that are sensitive to sampling variability. Given the interpretive weight placed on hemispheric lateralization, stronger evidence of robustness and individual-level consistency would be necessary to support this conclusion.

      We appreciate your critical comments on the robustness of our lateralization findings. We fully agree with you that it is essential to demonstrate that the observed hemispheric asymmetries reflect a stable neurobiological principle rather than an artifact of sampling variability or high-dimensional noise. To address this concern, we performed two rigorous validation analyses using 500-iteration resampling schemes, consisting of a split-half reliability test and a participant-level consistency assessment.

      First, to ensure our findings do not depend on specific sample compositions, we conducted a split-half reliability test where the dataset was randomly partitioned into two independent subgroups over 500 iterations. As shown in Figure S1A, the community labels maintained high spatial consistency across iterations (as evidenced by the confusion matrix and Dice coefficient distributions), and our original findings—including network-pair community architecture (Fig. S2A), regional affiliation patterns (Fig. S3A-B), and arousal–tvFC coupling lateralization (Fig. S4A-B)—were consistently situated at the center of the iteration distributions.

      Second, to account for potential within-participant dependencies in the HCP 7T dataset, we performed a participant-level resampling analysis (N = 139). By randomly selecting a different session for each participant across 500 iterations, we confirmed that the community architecture and hemispheric biases remain robust even under this strict control (Figure S1A, S2B, S3C-D and S4C-D). Collectively, these additional analyses provide strong evidence that the hemispheric lateralization we reported is not a byproduct of sampling bias, but instead represents a stable organizational principle of the arousal-modulated connectome.
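      The split-half test rests on two building blocks: randomly partitioning runs (or, for the participant-level variant, one sampled session per participant) into halves, and scoring how well the community maps of the two halves overlap. A minimal sketch of both follows, assuming the community labels of the two partitions have already been matched to each other (the matching step itself is not shown); the function names are hypothetical, and in the analysis described above each half would be re-clustered before comparison, repeated over 500 iterations.

```python
import random


def dice_per_community(labels_a, labels_b):
    """Dice overlap for each community label across two partitions.

    Both lists assign a community label to the same connections in the same
    order; labels are assumed to be already matched between partitions.
    """
    scores = {}
    for label in set(labels_a) | set(labels_b):
        in_a = {i for i, lab in enumerate(labels_a) if lab == label}
        in_b = {i for i, lab in enumerate(labels_b) if lab == label}
        scores[label] = 2 * len(in_a & in_b) / (len(in_a) + len(in_b))
    return scores


def random_split_half(run_ids, seed):
    """Randomly partition run (or participant) IDs into two halves."""
    ids = list(run_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


print(dice_per_community([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))
```

Seeding each iteration separately keeps the 500 resamples reproducible.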

      (2) Second, all analyses are based on ultra-high-field imaging. The manuscript does not address whether the reported arousal-related patterns, including the community structure and hemispheric asymmetries, are expected to be reproducible at standard field strengths. It therefore remains unclear whether the findings depend critically on the use of high-field data or whether they would generalize to more widely available datasets, limiting the broader applicability of the results.

      We appreciate your constructive comments on the generalizability of our findings across different field strengths.

      As you noted, our primary motivation for employing 7T ultra-high-field imaging was to leverage its superior signal-to-noise ratio (SNR) and significantly enhanced BOLD sensitivity. These technical advantages were instrumental in capturing the subtle, moment-to-moment coupling between spontaneous pupillary fluctuations and tvFC, signals that might be close to the detection threshold at standard field strengths.

      However, we fully recognize your point that 3T remains the standard in most clinical and research settings. In the revised manuscript, we have added a dedicated discussion to address this (page 21, lines 447-456):

      “Fifth, the findings reported here were derived exclusively from ultra-high-field (7T) imaging data. The superior BOLD sensitivity of 7T fMRI was instrumental in resolving the fine-scale community architecture of arousal–tvFC coupling, which involves subtle signals that may be challenging to detect at lower field strengths. Given that 3T remains the most common field strength for neuroimaging research and clinical applications, future investigations are needed to determine the extent to which these organizational principles generalize to standard field strength data. Validating these motifs in large-scale 3T datasets will be essential to establish their broader applicability across different imaging environments.”

      (3) Third, arousal-connectivity coupling is assessed using zero-lag correlations between pupil diameter and time-resolved connectivity estimates. Physiological and hemodynamic considerations suggest that pupil-linked arousal and blood-based imaging signals may exhibit systematic temporal delays. The absence of analyses examining sensitivity to such delays raises the possibility that the reported coupling patterns depend on a specific temporal alignment assumption.

      Given the inherent delay of the hemodynamic response function (HRF) and the complex temporal relationship between pupillary dynamics and neural activity, we conducted an additional lagged cross-correlation analysis to test the sensitivity of our findings. Following established frameworks for linking BOLD signals with pupillometry (Yellin et al., 2015; Gonzalez-Castillo et al., 2022; Lloyd et al., 2023), we systematically shifted the pupil time series relative to the fMRI data by -3 TR to +3 TR (-3s to +3s) and evaluated the consistency of the community architecture across these different lags using Dice coefficients.

      As shown in Figure S5, the community organization remains stable across the tested range of physiological delays. This stability indicates that the arousal-modulated communities we reported are not specific to the zero-lag assumption but persist throughout the physiologically plausible lag window. Consequently, our findings reflect a robust neurobiological phenomenon rather than an artifact of a specific temporal alignment.

      (4) Fourth, the estimation of time-resolved connectivity relies on a single choice of sliding-window length. The manuscript does not examine whether the reported patterns are stable across different window sizes. Given ongoing concerns about parameter dependence in time-resolved connectivity analyses, sensitivity analyses would be important to establish that the findings are not artifacts of a particular analytical choice.

      To ensure that our findings are not artifacts of a specific analytical choice, we performed a systematic sensitivity analysis by repeating our entire pipeline across a wide range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then used Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).

      As shown in Figure S5, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. These findings provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data rather than being driven by specific analytical choices in the sliding-window setup.

      (5) Finally, the identification of seven connectivity communities is a central result, yet the justification for this choice relies primarily on a single clustering quality measure. In practice, evaluation of clustering solutions typically draws on multiple complementary criteria, including measures of compactness and separation, approaches for selecting the number of clusters, and assessments of stability under resampling. Without such complementary evaluations, it is difficult to determine whether the reported community structure reflects a stable organizational feature or sensitivity to specific methodological decisions.

      We agree that relying on a single measure can be limiting, and in the revised manuscript, we have implemented a comprehensive multi-criteria evaluation to justify our selection of K=7. To ensure the robustness of the community partition, we expanded our analysis to include several complementary indices, such as the Davies-Bouldin Index, Calinski-Harabasz Score, and Silhouette Coefficient, alongside the original Within-Cluster Sum of Squares (WCSS), as detailed in Figure S7A.

      To further minimize subjective bias in "elbow" detection, we used the L-method (Salvador & Chan, 2004), which identifies the optimal K by minimizing the combined root-mean-square error (RMSE) of two linear regression segments. As illustrated in Figure S7B, the RMSE was minimized at K=7, providing a robust mathematical basis for our partition. Furthermore, we systematically visualized the community maps across a range of granularities from K=5 to 9 (Figure S7C). This stability analysis demonstrates that the fundamental topological features and the resulting hemispheric asymmetries are not transient artifacts of a specific K but are consistently preserved as the clustering granularity increases. These additional evaluations demonstrate that the seven-community structure reflects a stable organizational feature of arousal-modulated connectivity.

      Reviewer #2 (Public review):

      (1) Arousal effects on BOLD signals and on pupil size can have different delays, so it would be valuable to test lagged relationships (for example, shifting the pupil series forward and backward) to show that the main community structure and lateralization results are not sensitive to an arbitrary temporal alignment.

      We agree with you that accounting for the varying delays between BOLD signals and pupillary dynamics is essential for ensuring the robustness of our results. We conducted a comprehensive lagged cross-correlation analysis to address it. Following established frameworks for linking BOLD signals with pupillometry (Yellin et al., 2015; Gonzalez-Castillo et al., 2022; Lloyd et al., 2023), we systematically shifted the pupil time series relative to the fMRI data by -3 TR to +3 TR (-3s to +3s) and evaluated the consistency of the community architecture across these lags using Dice coefficients.

      As shown in Figure S5C, the core community organization remains stable across the tested range of physiological delays. This stability confirms that our findings are not sensitive to an arbitrary temporal alignment but instead reflect a robust neurobiological phenomenon that persists throughout the physiologically plausible lag window.

      (2) Pupil diameter covaries with blinks, eye closure, and other factors that can covary with head motion and physiological noise. The Methods include substantial quality control and denoising, including motion regression and scrubbing, plus exclusions for eye closure.

      We appreciate your attention to these potential confounding factors. Although our preprocessing included rigorous regression of confounds from the fMRI data, we agree that physiological noise and motion may have influenced the pupil signals.

      To address this, we conducted an additional control analysis where we included head motion (framewise displacement, FD) and the global signal (defined as the mean signal across all gray matter voxels) as covariates when calculating the arousal–tvFC coupling. We then re-evaluated the similarity between the resulting community architecture and our original findings. As shown in Figure S4, the community structure remained stable after controlling for these variables.

      Regarding eye closure, we intentionally did not regress this out, as extensive literature demonstrates that eye closure is itself a reliable physiological proxy for arousal levels (Sommer & Golz, 2010; Chang et al., 2016; Gonzalez-Castillo et al., 2022); regressing it out would likely remove the very arousal-related coupling effects we aim to investigate.

      (3) The dataset is described in terms of runs retained (for example, 485 resting runs), and runs are treated as observations in clustering after z-scoring across runs. If multiple runs come from the same individuals, the manuscript would benefit from explicitly showing that results replicate at the participant level (for example, community structure stability within participant across runs, and participant-level summary statistics used for inference), rather than relying primarily on pooled run-level patterns.

      We fully agree with you that it is essential to demonstrate that the observed hemispheric asymmetries reflect a stable neurobiological principle rather than an artifact of sampling variability or high-dimensional noise. To address this concern, we performed two rigorous validation analyses using 500-iteration resampling schemes, consisting of a split-half reliability test and a participant-level consistency assessment.

      First, to ensure our findings do not depend on specific sample compositions, we conducted a split-half reliability test where the dataset was randomly partitioned into two independent subgroups over 500 iterations. As shown in Figure S1A, the community labels maintained high spatial consistency across iterations (as evidenced by the confusion matrix and Dice coefficient distributions), and our original findings—including network-pair community architecture (Fig. S2A), regional affiliation patterns (Fig. S3A-B), and arousal–tvFC coupling lateralization (Fig. S4A-B)—were consistently situated at the center of the iteration distributions.

      Second, to account for potential within-participant dependencies in the HCP 7T dataset, we performed a participant-level resampling analysis (N = 139). By randomly selecting a different session for each participant across 500 iterations, we confirmed that the community architecture and hemispheric biases remain robust even under this strict control (Figure S1A, S2B, S3C-D and S4C-D). Collectively, these additional analyses provide strong evidence that the hemispheric lateralization we reported is not a byproduct of sampling bias, but instead represents a stable organizational principle of the arousal-modulated connectome.

      (4) Time-resolved connectivity is estimated using a 30-second sliding window and 5 second step. It is reasonable to wonder whether the same conclusions hold with alternative estimators that do not rely on fixed windows. The Discussion acknowledges this limitation, but adding a small robustness analysis would make the paper more definitive.

      To ensure that our findings are not artifacts of a specific analytical choice, we performed a systematic sensitivity analysis by repeating our entire pipeline across a range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then used Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).
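
      The sketch below computes sliding-window connectivity over the parameter grid above. It assumes a (time × region) BOLD array and a repetition time `tr` in seconds. The HCP 7T protocol uses a 1 s TR, but the actual value should be passed in. Window and step lengths are converted from seconds to whole TRs. Each window's Pearson correlation matrix is then reduced to its upper triangle, giving one value per edge. Function and variable names are hypothetical.

```python
import numpy as np


def sliding_window_fc(ts, tr, window_s, step_s):
    """Time-resolved FC from a (n_timepoints, n_regions) BOLD array.

    Windows of `window_s` seconds advance by `step_s` seconds (both rounded to
    whole TRs). Each window's Pearson correlation matrix is reduced to its upper
    triangle, giving an array of shape (n_windows, n_edges).
    """
    win = int(round(window_s / tr))
    step = int(round(step_s / tr))
    n_t, n_r = ts.shape
    iu = np.triu_indices(n_r, k=1)  # one entry per region pair (edge)
    starts = range(0, n_t - win + 1, step)
    return np.array([np.corrcoef(ts[s:s + win].T)[iu] for s in starts])


# Window lengths and step sizes (seconds) from the sensitivity analysis.
WINDOWS_S = (30, 35, 60, 90)
STEPS_S = (1, 5, 10)
GRID = [(w, s) for w in WINDOWS_S for s in STEPS_S]  # 12 configurations
```

      The full pipeline would be rerun for each `(window, step)` pair in `GRID`. The resulting community labels would then be compared with the 30 s / 5 s reference.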

      As shown in Figure S5, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. Furthermore, the core hemispheric asymmetry patterns were robustly preserved regardless of the specific windowing configuration used. These results provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data and are stable across a broad range of temporal scales.

      Reviewer #3 (Public review):

      (1) A major limitation of the study is the limited discussion of subcortical regions, which play a central role in arousal regulation according to extensive prior literature. Although the current analyses focus primarily on cortical organization, the authors should include a brief discussion of how their findings relate to subcortical arousal systems.

      We completely agree that subcortical structures are pivotal drivers of arousal regulation. While our study primarily utilized a symmetric cortical atlas to ensure a mathematically rigorous assessment of hemispheric lateralization, we recognize that the exclusion of subcortical regions limits the functional interpretation of the observed patterns.

      In the revised manuscript, we have added a dedicated paragraph to the Discussion (page 20, lines 412-428) to address this point:

      “First, to ensure a mathematically rigorous assessment of hemispheric asymmetry, our analysis was restricted to a symmetric cortical parcellation. Consequently, while we demonstrate that arousal-modulated connectivity follows a structured macroscopic architecture, we did not explicitly analyze the subcortical nuclei hypothesized to drive these patterns. We hypothesize that the presence of these low-dimensional cortical communities reflects coordinated motifs rather than a homogeneous gain modulation, potentially mirroring the differentiated projection patterns of subcortical neuromodulatory systems. For instance, the locus coeruleus–noradrenergic pathway (Chandler et al., 2014; Schwarz & Luo, 2015) and the thalamus (Hwang et al., 2017; Shine, 2019; Müller et al., 2020; Shine et al., 2023) possess extensive yet non-uniform projections that may anchor the community-specific and hemispherically asymmetric patterns observed here.”

      (2) While sliding window methods can capture temporal changes in functional organization, they have limitations in characterizing moment-to-moment neural fluctuations. In particular, results can be highly sensitive to window length and step size. The manuscript would benefit from (a) a clearer discussion of these methodological limitations, (b) justification for the chosen window length and step size, and (c) a sensitivity analysis demonstrating whether the main findings are robust across different parameter choices.

      To ensure that our findings are not artifacts of a specific analytical choice, we performed a systematic sensitivity analysis by repeating our entire pipeline across a range of window lengths (30s, 35s, 60s, and 90s) and step sizes (1s, 5s, and 10s). We then used Dice coefficients to quantify the topological similarity between these alternative configurations and our original parameters (30s window, 5s step).

      As shown in Figure S5, our results demonstrate high topological consistency, with Dice coefficients for community structures remaining consistently above 0.8 across all tested parameter combinations. Furthermore, the core hemispheric asymmetry patterns were robustly preserved regardless of the specific windowing configuration used. These results provide strong evidence that the arousal-modulated organizational principles we reported are inherent to the data and are stable across a broad range of temporal scales.

      (3) The authors use k-means clustering to identify groups of brain regions and refer to these groupings as "communities." However, in general, community detection typically refers to graph-based algorithms that identify modules based on connectivity structure (e.g., modularity maximization). The clusters derived from k-means in feature space are not necessarily equivalent to graph-theoretic communities. The authors should explicitly clarify this distinction and adjust terminology accordingly to avoid conceptual ambiguity.

      We agree that the term "community detection" is often specifically associated with graph-based algorithms, such as modularity maximization, which define modules based on topological connectivity. In contrast, our implementation of k-means identifies groupings based on the similarity of arousal–FC coupling patterns within a high-dimensional feature space.

      To avoid any conceptual ambiguity, we have explicitly clarified this distinction in the Methods section of the revised manuscript (pages 24-25, lines 533-542):

      “We employed the k-means clustering algorithm (Euclidean distance) to explore a range of cluster solutions from K = 2 to 15. To ensure the stability of the results and avoid local optima, each K was repeated 250 times with random initializations. The optimal number of clusters was determined by evaluating clustering quality and reproducibility (e.g., maximizing silhouette stability). It is important to clarify that "communities" in this context refer to clusters of edges that exhibit similar arousal-modulation motifs within a high-dimensional feature space, rather than topological modules typically derived from graph-theoretic algorithms like modularity maximization. This procedure consistently identified seven distinct communities, each representing a robust, arousal-sensitive connectivity motif that characterizes the large-scale organization of brain-pupil coupling.”
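
      The model-selection loop in this passage can be sketched in plain NumPy. The sketch rests on three assumptions: Lloyd's algorithm with Euclidean distance, keeping the lowest-inertia solution across random initializations, and mean silhouette width as the quality criterion. The stability part of the authors' criterion is not reproduced. The sketch builds a full pairwise-distance matrix, so it suits small inputs only; tens of thousands of edges would need a sampled or chunked silhouette. All names are hypothetical.

```python
import numpy as np


def _assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row."""
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans(x, k, rng, max_iter=100):
    """One run of Lloyd's algorithm from a random initialization."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(max_iter):
        labels = _assign(x, centers)
        new = np.array([x[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = _assign(x, centers)
    inertia = ((x - centers[labels]) ** 2).sum()
    return labels, inertia


def best_kmeans(x, k, n_init=250, seed=0):
    """Lowest-inertia labels across `n_init` random initializations."""
    rng = np.random.default_rng(seed)
    return min((kmeans(x, k, rng) for _ in range(n_init)), key=lambda r: r[1])[0]


def mean_silhouette(x, labels):
    """Mean silhouette width (Euclidean); points in singleton clusters score 0."""
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    s = np.zeros(len(x))
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        # Self-distance is 0, so dividing by n - 1 averages over the other members.
        a = d[i, same].sum() / (same.sum() - 1)
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


def scan_k(x, k_values=range(2, 16), n_init=250):
    """Mean silhouette of the best k-means solution for each candidate K."""
    return {k: mean_silhouette(x, best_kmeans(x, k, n_init)) for k in k_values}
```

      For the reported setup, `scan_k(features)` would evaluate K = 2 to 15 with 250 initializations each.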

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) To strengthen confidence in the reported hemispheric effects, the authors should provide additional robustness analyses, such as subject-level consistency of lateralization measures, split-half or resampling reliability, and sensitivity to alternative preprocessing or analysis choices. Reporting the distribution of lateralization effects across individuals would help clarify whether the observed asymmetries reflect stable features or group-level averages driven by a subset of connections or participants.

      We agree that establishing the individual-level stability of lateralization is essential. We have now provided extensive validation, including split-half reliability tests and participant-level consistency analyses (500 iterations each). These results confirm that the reported asymmetries are robust and consistent across the sample. Please refer to our response to Reviewer #1 Weakness2 for the full analysis and associated figures (Figures S1–S4).

      (2) The authors should examine whether arousal-connectivity coupling patterns are robust to plausible temporal delays between pupil diameter and BOLD signals. Lagged or time-shifted analyses would help establish that the findings do not depend on a specific zero-lag assumption.

      We agree that validating the coupling between pupil dynamics and time-varying FC is essential. To address this, we conducted a lag sensitivity analysis by shifting the pupil-derived arousal signal within a physiologically plausible range (-3 to +3 TRs). The community architecture remained highly consistent across these temporal offsets, showing high spatial correlation and Dice coefficients with our original findings. This stability indicates that the identified organizational motifs are robust and do not depend on a zero-lag assumption. For full details of this validation and the associated figures, please refer to our response to Reviewer #1 Weakness3 and Figure S5 in the Supplementary Material.
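
      The lag analysis can be sketched as follows. The sketch assumes that the pupil-derived arousal signal is sampled once per TR and summarized per window with the same window and step (in TRs) used for the tvFC. Positive lags delay the arousal signal. Samples shifted past the edge become NaN, and the window mean ignores them. Coupling is the Pearson correlation between windowed arousal and each edge's tvFC. Function names are hypothetical. The subsequent re-clustering and Dice comparison at each lag are not shown.

```python
import numpy as np


def shift_signal(x, lag):
    """Shift a 1-D signal by `lag` samples, padding vacated samples with NaN.

    A positive lag delays the signal: x[t] moves to position t + lag.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    if lag >= 0:
        out[lag:] = x[:len(x) - lag]
    else:
        out[:lag] = x[-lag:]
    return out


def windowed_mean(x, win, step):
    """NaN-aware mean of x in windows of `win` samples advanced by `step`."""
    starts = range(0, len(x) - win + 1, step)
    return np.array([np.nanmean(x[s:s + win]) for s in starts])


def coupling_by_lag(pupil, tvfc, win, step, lags=range(-3, 4)):
    """Pearson r between windowed arousal and each edge's tvFC, per lag.

    `tvfc` has shape (n_windows, n_edges) and must use the same `win`/`step`.
    Returns {lag: array of shape (n_edges,)}.
    """
    z_fc = (tvfc - tvfc.mean(axis=0)) / tvfc.std(axis=0)
    maps = {}
    for lag in lags:
        a = windowed_mean(shift_signal(pupil, lag), win, step)
        if len(a) != len(tvfc):
            raise ValueError("arousal and tvFC windows do not line up")
        z_a = (a - a.mean()) / a.std()
        maps[lag] = (z_a[:, None] * z_fc).mean(axis=0)  # Pearson r per edge
    return maps
```

      The spatial similarity of coupling maps across lags could then be summarized with, for example, `np.corrcoef(maps[0], maps[lag])[0, 1]`.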

      (3) Given reliance on a single sliding-window length, the authors should assess how key results vary across different window sizes. Demonstrating stability of the community structure and lateralization patterns across parameter choices would strengthen the methodological foundation of the study.

      We have conducted a systematic sensitivity analysis across multiple window lengths (30s, 35s, 60s, 90s) and step sizes (1s, 5s, 10s). The high Dice coefficients (>0.8) indicate that our findings do not depend on specific windowing choices. Please refer to our response to Reviewer #1 Weakness3 and Figure S5 for the full results.

      (4) The justification for the chosen number of connectivity communities would benefit from additional clustering evaluations. Complementary criteria such as measures of compactness and separation, model selection approaches for determining the number of clusters, and stability or reproducibility under resampling would help establish whether the reported community structure is robust rather than method-dependent.

      To strengthen the mathematical basis for our partition, we have implemented a multi-metric evaluation and the L-method for objective K selection. These metrics consistently support the seven-community structure. Please refer to our response to Reviewer #1 Weakness5 and Figure S7 for the comprehensive evaluation.

      (5) The manuscript would benefit from a clearer discussion of why ultra-high-field imaging was required for the present analyses and whether similar results are expected at standard field strengths. If feasible, validation using lower-field data or reference to existing datasets would substantially enhance generalizability.

      We have expanded our discussion to clarify that 7T was instrumental for capturing the subtle, high-frequency arousal-tvFC coupling due to its superior SNR. We also explicitly discuss the potential and limitations of generalizing these findings to 3T datasets. Please refer to our response to Reviewer #1 Weakness2 for the full discussion (page 21, lines 447-456).

      (6) The authors should more explicitly report exclusion related to pupil measurements and discuss how missing or noisy pupillometry may affect the applicability of the approach in other datasets or experimental settings.

      We agree that transparency in data screening is essential for the reproducibility of our method. In the revised manuscript, we have clarified our quality-control pipeline in the Quality Control subsection of the Methods (page 23, lines 502-510):

      “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female). Runs were excluded if (a) more than 20% of frames exceeded motion thresholds, (b) eye tracking did not cover the full fMRI time series, or (c) more than 90% of samples were classified as eye closure. After applying these criteria, 485 of the initial 723 scans were retained for analysis. The same quality-control pipeline was applied to the movie-watching dataset, yielding 513 usable scans out of the original 725. Detailed information on data retention and run distribution per participant is summarized in Figure S9.”
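
      The three exclusion rules can be written as a small per-run check. The sketch assumes that "motion" means framewise displacement. The response does not state the displacement threshold, so `fd_threshold` is a required argument rather than a default. The `RunQC` container and its field names are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RunQC:
    fd: np.ndarray             # framewise displacement per fMRI frame (mm)
    n_fmri_frames: int         # length of the fMRI time series
    n_eye_frames_covered: int  # fMRI frames covered by eye tracking
    eye_closed: np.ndarray     # boolean per eye-tracker sample, True = closed


def keep_run(run, fd_threshold, max_motion_frac=0.20, max_closed_frac=0.90):
    """Apply exclusion criteria (a)-(c) to one run; True means retain."""
    if np.mean(run.fd > fd_threshold) > max_motion_frac:
        return False  # (a) more than 20% of frames exceed the motion threshold
    if run.n_eye_frames_covered < run.n_fmri_frames:
        return False  # (b) eye tracking does not cover the full fMRI series
    if np.mean(run.eye_closed) > max_closed_frac:
        return False  # (c) more than 90% of samples classified as eye closure
    return True
```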

      Furthermore, we have added a discussion regarding how noisy or missing pupillary signals might affect the generalizability of our approach (pages 20-21, lines 437-447):

      “Fourth, the generalizability of our approach to external cohorts warrants caution regarding pupillary data integrity. In contexts where high-fidelity eye tracking is technically demanding, such as clinical settings involving patients with restricted compliance or naturalistic fMRI studies, the prevalence of blink artifacts and signal dropouts may bias the estimation of arousal-modulated states. Excessive reliance on data interpolation in such cases could artificially smooth temporal fluctuations, leading to an overestimation of community stability. Future applications should therefore prioritize high-frequency sampling and potentially incorporate multimodal physiological features (e.g., respiratory or cardiac signals) to cross-validate arousal dynamics when pupillary data are suboptimal (Meissner et al., 2023; Bolt et al., 2025; Weijs et al., 2025).”

      (7) The authors should ensure that all data and analysis code necessary to reproduce the results are made publicly available in accordance with eLife policies, including clear documentation of preprocessing steps, parameter choices, and clustering procedures.

      All analysis code and the necessary processed data required to reproduce our findings have been made publicly available through https://github.com/kongxy6478/Arousal-modulates-functional-connectivity. This repository includes documented pipelines for pupillometry cleaning and fMRI denoising, alongside the core Python scripts used for sliding-window connectivity calculation, k-means clustering, and hemispheric lateralization analysis.

      Reviewer #2 (Recommendations for the authors):

      (1) Add a lag sensitivity analysis between pupil-derived arousal and time-resolved connectivity, and report whether the seven community structure and key lateralization findings are stable across a plausible lag range.

      We agree that validating the coupling between pupil dynamics and time-varying FC is essential. To address this, we conducted a lag sensitivity analysis by shifting the pupil-derived arousal signal within a physiologically plausible range (-3 to +3 TRs). The community architecture remained highly consistent across these temporal offsets, showing high spatial correlation and Dice coefficients with our original findings. This stability indicates that the identified organizational motifs are robust and do not depend on a zero-lag assumption. For full details of this validation and the associated figures, please refer to our response to Reviewer #1 Weakness3 and Figure S5 in the Supplementary Material.

      (2) Quantify and report the extent to which residual head motion, blink rate, eye closure segments, and global signal changes explain arousal connectivity coupling, for example, via partial correlation or regression controls, and show that key effects persist.

      We agree that it is essential to demonstrate that the observed arousal-connectivity coupling is not driven by non-specific physiological or motion-related artifacts. As requested, we have quantified the influence of head motion (FD) and global signal on our primary results. By implementing partial correlation analyses, we confirmed that the identified arousal-modulated community structures persist even after strictly controlling for these variables. These results indicate that the arousal-tvFC coupling we report reflects a specific neuro-arousal process rather than a byproduct of motion or systemic physiological fluctuations. For the detailed quantitative results and control analysis figures, please refer to our response to Reviewer #2 Weakness3 and Figure S6 in the Supplementary Material.
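
      The partial-correlation control works as follows. Both variables are regressed on the covariates (with an intercept) by ordinary least squares, and the residuals are then correlated. Here `x` might be windowed arousal, `y` one edge's tvFC, and `covariates` the window-averaged framewise displacement and global signal. Those pairings and the function names are illustrative assumptions.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after OLS regression on the covariates plus an intercept."""
    design = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta


def partial_corr(x, y, covariates):
    """Pearson correlation of x and y after removing linear covariate effects."""
    rx, ry = residualize(x, covariates), residualize(y, covariates)
    return np.corrcoef(rx, ry)[0, 1]
```

      If `x` and `y` are both driven by a shared confound, the raw correlation can be high while the partial correlation falls to near zero. That is the pattern such a control is designed to detect.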

      (3) Add participant-level validation: demonstrate that community profiles and lateralization signatures are consistent within participants across runs, and consider participant-level statistical summaries rather than treating all runs as independent observations.

      We agree that demonstrating participant-level consistency is vital. In response, we performed two rigorous 500-iteration resampling schemes: a split-half reliability test and a participant-level consistency assessment (N = 139). These analyses, which involved randomly partitioning the sample and selecting single sessions per participant, confirm that our community architecture and hemispheric biases are remarkably stable and not driven by sampling variability or high-dimensional noise. For a comprehensive description of these validations and the associated statistical distributions, please refer to our detailed response to Reviewer #2 Weakness3 and Figures S1–S4.

      (4) Provide an alternative dynamic connectivity estimator robustness check, or at a minimum, vary the window length and step size to show stability of the primary conclusions.

      We have conducted a systematic sensitivity analysis across multiple window lengths (30s, 35s, 60s, 90s) and step sizes (1s, 5s, 10s). The high Dice coefficients (>0.8) indicate that our findings do not depend on specific windowing choices. Please refer to our response to Reviewer #1 Weakness3 and Figure S5 for the full results.

      (5) Consider validating the seven community solutions with at least one additional unsupervised approach, and report agreement with the main k-means solution.

      We agree that validating the clustering scheme is essential. To this end, we implemented a multi-criteria evaluation (including Davies-Bouldin and Silhouette indices) and utilized the L-method (Salvador & Chan, 2004) to mathematically confirm K=7 as the optimal granularity (Figure S7A–B). Furthermore, we verified that the core topological features and hemispheric asymmetries remain robustly consistent across a range of granularities from K=5 to 9 (Figure S7C). These analyses demonstrate that our findings are not dependent on a specific K or subjective bias. For the full quantitative evaluation and stability maps, please refer to our response to Reviewer #2 Weakness5 and Figure S7.
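
      The L-method cited above (Salvador & Chan, 2004) finds the knee of an evaluation curve by fitting two straight lines. One line is fitted to the points left of a candidate split and one to the points right of it. The split that minimizes the size-weighted RMSE of the two fits is chosen. The sketch below follows that description. The choice of evaluation curve (e.g., the Davies–Bouldin index versus K) and the function names are assumptions. The published method's iterative refinement of the evaluated range is omitted.

```python
import numpy as np


def _line_rmse(x, y):
    """RMSE of the least-squares straight-line fit to (x, y)."""
    coef = np.polyfit(x, y, 1)
    return np.sqrt(np.mean((y - np.polyval(coef, x)) ** 2))


def l_method_knee(ks, scores):
    """Knee of an evaluation curve (score versus number of clusters).

    Each candidate split puts ks[:i] on the left line and ks[i:] on the right
    line (at least two points per side). The split with the smallest
    size-weighted RMSE wins, and the knee is the last K on the left side.
    """
    ks = np.asarray(ks, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n = len(ks)
    best_k, best_err = None, np.inf
    for i in range(2, n - 1):
        err = ((i / n) * _line_rmse(ks[:i], scores[:i])
               + ((n - i) / n) * _line_rmse(ks[i:], scores[i:]))
        if err < best_err:
            best_k, best_err = int(ks[i - 1]), err
    return best_k


# Usage: a synthetic curve that falls steeply up to K = 7, then flattens.
ks = list(range(2, 16))
scores = [100 - 10 * (k - 2) if k <= 7 else 20 - (k - 8) for k in ks]
print(l_method_knee(ks, scores))  # 7
```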

      (6) State explicitly, early in Results, what the main inferential unit is (run or participant) for each key analysis, and clarify how repeated runs per participant are handled.

      We agree that defining the inferential unit is critical for methodological clarity. In the revised manuscript, we have explicitly stated at the beginning of the Results section (page 5, lines 113-116):

      “While our primary inferential analyses were conducted at the run level to leverage the high-density sampling of the HCP 7T dataset, we further validated the robustness of these findings using participant-level statistical summaries and resampling to account for within-participant dependencies (see Figures S1–S2 in the Supplementary Material).”

      Specifically, all key findings, including the community architecture and hemispheric asymmetries, were validated using participant-level statistics and resampling schemes (N = 139) to ensure that the results are not biased by within-participant dependencies.
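
      A participant-level summary of run-level results can be sketched in two steps. First, run values are averaged within each participant. Second, the resulting sample of one value per participant is tested against zero with a sign-flip permutation test. The choice of test and the function names are illustrative assumptions, not the authors' stated procedure.

```python
import numpy as np


def participant_means(values, participant_ids):
    """Average run-level values within each participant (sorted by ID)."""
    values = np.asarray(values, dtype=float)
    ids = np.asarray(participant_ids)
    return np.array([values[ids == pid].mean() for pid in np.unique(ids)])


def sign_flip_pvalue(x, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation p-value for the null mean(x) == 0."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    observed = abs(x.mean())
    flips = rng.choice([-1.0, 1.0], size=(n_perm, len(x)))
    null = np.abs((flips * x).mean(axis=1))
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```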

      (7) When introducing the integration and segregation indices, add a brief intuitive explanation of what a positive or negative value means in plain language before the equations.

      We thank the reviewer for this suggestion to improve the accessibility of our methods. We have added brief, intuitive explanations for both indices in the Methods section (pages 26-27, lines 569-582):

      “The integration index provides a measure of the overall hemispheric dominance of arousal-modulated connections. A positive value indicates that arousal-related edges are preferentially concentrated in the left hemisphere (including its internal and outgoing connections) compared to the right.” and “The segregation index assesses whether arousal preferentially modulates local, intra-hemispheric communication versus long-range, inter-hemispheric communication. A positive value reflects a "segregated" left-hemisphere bias, where arousal strengthens within-hemisphere connections more than it strengthens across-hemisphere communication for that same hemisphere.”
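
      The intuitive definitions above can be made concrete with a count-based sketch. This response does not reproduce the manuscript's equations, so the normalized-difference forms below are hypothetical. Integration compares left-involving edges (within-left plus interhemispheric) with right-involving edges. Left segregation compares within-left edges with interhemispheric edges. A weighted variant would replace the counts with summed coupling strengths.

```python
import numpy as np


def hemispheric_indices(edges, hemi):
    """Illustrative integration and left-hemisphere segregation indices.

    `edges` is an (n_edges, 2) array of region indices for arousal-related
    edges; `hemi` gives "L" or "R" for each region index. Hypothetical
    normalized-difference forms, not the manuscript's equations.
    """
    h = np.asarray(hemi)[np.asarray(edges)]  # hemisphere label of each endpoint
    within_l = np.sum((h[:, 0] == "L") & (h[:, 1] == "L"))
    within_r = np.sum((h[:, 0] == "R") & (h[:, 1] == "R"))
    between = len(h) - within_l - within_r
    left_total = within_l + between   # left-internal plus outgoing edges
    right_total = within_r + between  # right-internal plus outgoing edges
    integration = (left_total - right_total) / (left_total + right_total)
    segregation_left = (within_l - between) / (within_l + between)
    return integration, segregation_left
```

      With this form, a positive integration value means more arousal-related edges involve the left hemisphere. A positive segregation value means within-left edges outnumber interhemispheric ones, matching the plain-language readings quoted above.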

      (8) In the Discussion, separate claims into "what we show" versus "what we hypothesize," especially when connecting findings to neuromodulatory pathways.

      In the revised manuscript, we have carefully separated our direct empirical findings from our mechanistic hypotheses, using more cautious and speculative language (e.g., "suggesting a potential role of," "may be mediated by," and "we hypothesize that") (page 17, lines 352-358):

      “Specifically, we show the presence of low-dimensional, reproducible communities, which suggests that arousal modulates the connectome through coordinated motifs rather than homogeneous gain modulation. We hypothesize that this structured macroscopic architecture reflects the differentiated projection patterns of subcortical neuromodulatory systems, such as the locus coeruleus–noradrenergic pathway (Aston-Jones & Cohen, 2005; Jordan, 2024) and the thalamus (Magnin et al., 2010; Lewis et al., 2015; Liu et al., 2018).”

      (9) Provide a clear participant-level summary (number of participants contributing to the retained runs, demographics if available, and distribution of runs per participant), alongside the reported run counts retained after quality control.

      We agree that clear reporting of participant-level data is essential. In the revised Methods section, we have added a detailed summary of participant demographics (age and sex) and clarified the sample composition (page 23, lines 502-503):

      “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female).”

      Furthermore, to provide a transparent view of the data retained after quality control, we have included Figure S9 to illustrate the distribution of valid runs per participant. This visualization documents the amount of data contributing to our group-level inferences and reflects exclusions due to motion or pupillary signal quality.

      (10) Report the robustness of results to reasonable changes in pupil preprocessing choices (for example, smoothing parameters or interpolation rules), since pupil diameter is the key arousal index.

      We agree that the robustness of pupil-derived arousal estimates is fundamental to our findings. To address this, we compared our original pupil preprocessing pipeline against 18 alternative parameter combinations. These variations included different smoothing window sizes (100 ms, 200 ms, and 500 ms), interpolation methods (linear vs. cubic spline), and blink buffer durations (25 ms, 50 ms, and 100 ms). As shown in Figure S8, the pupil diameter time courses derived from these alternative pipelines remained well correlated with our original estimates (all correlation coefficients above 0.65). Because all downstream analyses build on this pupil-derived arousal index, this suggests that our arousal-modulated connectivity results are robust to reasonable changes in pupil preprocessing choices.
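
      The 18 pipelines form the full grid of 3 smoothing windows × 2 interpolation methods × 3 blink buffers. The sketch below enumerates that grid and implements one possible pipeline in three steps. Blinks are padded by the buffer. Gaps are filled by linear interpolation or by a cubic spline (via SciPy, imported only when needed). The trace is then smoothed with a moving average. The moving-average filter, the `preprocess_pupil` name and its arguments are assumptions, since the response does not specify these details.

```python
import itertools

import numpy as np

SMOOTHING_MS = (100, 200, 500)
INTERPOLATION = ("linear", "cubic")
BLINK_BUFFER_MS = (25, 50, 100)
# 3 x 2 x 3 = 18 alternative preprocessing pipelines.
PIPELINES = list(itertools.product(SMOOTHING_MS, INTERPOLATION, BLINK_BUFFER_MS))


def preprocess_pupil(pupil, blink, fs, smooth_ms, interp, buffer_ms):
    """Mask blinks (padded by a buffer), fill gaps, then moving-average smooth.

    `pupil` is the raw trace, `blink` a boolean array marking blink samples,
    and `fs` the eye-tracker sampling rate in Hz.
    """
    pad = int(round(buffer_ms * fs / 1000))
    bad = blink.copy()
    for i in np.flatnonzero(blink):  # extend each blink sample by +/- pad samples
        bad[max(0, i - pad):i + pad + 1] = True
    t = np.arange(len(pupil))
    good = ~bad
    if interp == "linear":
        filled = np.interp(t, t[good], pupil[good])
    elif interp == "cubic":
        from scipy.interpolate import CubicSpline  # imported only when needed
        filled = CubicSpline(t[good], pupil[good])(t)
    else:
        raise ValueError(f"unknown interpolation method: {interp}")
    width = max(1, int(round(smooth_ms * fs / 1000)))
    kernel = np.ones(width) / width
    # mode="same" keeps the original length; values near the edges are biased low.
    return np.convolve(filled, kernel, mode="same")
```

      Each variant would then be compared with the reference trace via `np.corrcoef(reference, variant)[0, 1]`, the quantity reported against the 0.65 threshold above.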

      Reviewer #3 (Recommendations for the authors):

      I have two additional minor comments:

      (1) Given the overall goal of this study to identify large-scale brain communities or clusters underlying arousal, the results may be sensitive to the choice of cortical parcellation. The authors should consider:

      (a) including analyses using additional parcellation schemes, or

      (b) discussing how the current findings might depend on the chosen parcellation and the implications for robustness and generalizability.

      We have addressed this by adding a dedicated point in the Discussion (page 21, lines 456-465):

      “Sixth, our findings were derived using a single high-resolution cortical parcellation. While the specific choice of atlas can influence fine-grained regional connectivity, our primary conclusions, such as hemispheric asymmetries and community-level preferences, were identified and interpreted at the macroscopic network and system level. By aggregating signals across broad functional systems, this approach likely mitigates the dependency on precise regional boundary definitions. Nevertheless, future studies employing alternative parcellation schemes would be valuable to further confirm that these organizational principles are not specific to the current atlas but represent a generalizable feature of the arousal-modulated connectome.”

      (2) Some key details, such as the number of participants included in the study, as well as basic demographic information, are not reported.

      We apologize for this omission. In the revised Methods section, we have now included a detailed summary of the participant demographics, including the final sample size (N = 139), age, and sex distribution (page 23, lines 502-503):

      “The final analyzed sample for the resting-state consisted of N = 139 healthy participants (mean age = 29.1±3.5 years, 77 female)”

      Furthermore, to ensure full transparency regarding data retention, we have added a new figure (Figure S9) illustrating the distribution of valid fMRI runs per participant following our quality-control procedures. We believe these additions provide a clear and complete overview of the study sample.

      Reference

      Aston-Jones, G., & Cohen, J. D. (2005). An integrative theory of locus coeruleus-norepinephrine function: Adaptive gain and optimal performance. Annual Review of Neuroscience, 28, 403–450. https://doi.org/10.1146/annurev.neuro.28.061604.135709

      Bolt, T., Wang, S., Nomi, J. S., Setton, R., Gold, B. P., deB. Frederick, B., Yeo, B. T. T., Chen, J. J., Picchioni, D., Duyn, J. H., Spreng, R. N., Keilholz, S. D., Uddin, L. Q., & Chang, C. (2025). Autonomic physiological coupling of the global fMRI signal. Nature Neuroscience, 28(6), 1327–1335. https://doi.org/10.1038/s41593-025-01945-y

      Chandler, D. J., Gao, W.-J., & Waterhouse, B. D. (2014). Heterogeneous organization of the locus coeruleus projections to prefrontal and motor cortices. Proceedings of the National Academy of Sciences, 111(18), 6816–6821. https://doi.org/10.1073/pnas.1320827111

      Chang, C., Leopold, D. A., Schölvinck, M. L., Mandelkow, H., Picchioni, D., Liu, X., Ye, F. Q., Turchi, J. N., & Duyn, J. H. (2016). Tracking brain arousal fluctuations with fMRI. Proceedings of the National Academy of Sciences, 113(16), 4518–4523. https://doi.org/10.1073/pnas.1520613113

      Gonzalez-Castillo, J., Fernandez, I. S., Handwerker, D. A., & Bandettini, P. A. (2022). Ultra-slow fMRI fluctuations in the fourth ventricle as a marker of drowsiness. NeuroImage, 259, 119424. https://doi.org/10.1016/j.neuroimage.2022.119424

      Hwang, K., Bertolero, M. A., Liu, W. B., & D’Esposito, M. (2017). The Human Thalamus Is an Integrative Hub for Functional Brain Networks. The Journal of Neuroscience, 37(23), 5594–5607. https://doi.org/10.1523/JNEUROSCI.0067-17.2017

      Jordan, R. (2024). The locus coeruleus as a global model failure system. Trends in Neurosciences, 47(2), 92–105. https://doi.org/10.1016/j.tins.2023.11.006

      Lewis, L. D., Voigts, J., Flores, F. J., Schmitt, L. I., Wilson, M. A., Halassa, M. M., & Brown, E. N. (2015). Thalamic reticular nucleus induces fast and local modulation of arousal state. eLife, 4, e08760. https://doi.org/10.7554/eLife.08760

      Liu, X., De Zwart, J. A., Schölvinck, M. L., Chang, C., Ye, F. Q., Leopold, D. A., & Duyn, J. H. (2018). Subcortical evidence for a contribution of arousal to fMRI studies of brain activity. Nature Communications, 9(1), 395. https://doi.org/10.1038/s41467-017-02815-3

      Lloyd, B., De Voogd, L. D., Mäki-Marttunen, V., & Nieuwenhuis, S. (2023). Pupil size reflects activation of subcortical ascending arousal system nuclei during rest. eLife, 12, e84822. https://doi.org/10.7554/eLife.84822

      Magnin, M., Rey, M., Bastuji, H., Guillemant, P., Mauguière, F., & Garcia-Larrea, L. (2010). Thalamic deactivation at sleep onset precedes that of the cerebral cortex in humans. Proceedings of the National Academy of Sciences, 107(8), 3829–3833. https://doi.org/10.1073/pnas.0909710107

      Meissner, S. N., Bächinger, M., Kikkert, S., Imhof, J., Missura, S., Carro Dominguez, M., & Wenderoth, N. (2023). Self-regulating arousal via pupil-based biofeedback. Nature Human Behaviour, 8(1), 43–62. https://doi.org/10.1038/s41562-023-01729-z

      Müller, E. J., Munn, B., Hearne, L. J., Smith, J. B., Fulcher, B., Arnatkevičiūtė, A., Lurie, D. J., Cocchi, L., & Shine, J. M. (2020). Core and matrix thalamic sub-populations relate to spatio-temporal cortical connectivity gradients. NeuroImage, 222, 117224. https://doi.org/10.1016/j.neuroimage.2020.117224

      Salvador, S., & Chan, P. (2004). Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. 16th IEEE International Conference on Tools with Artificial Intelligence, 576–584. https://doi.org/10.1109/ICTAI.2004.50

      Schwarz, L. A., & Luo, L. (2015). Organization of the Locus Coeruleus-Norepinephrine System. Current Biology, 25(21), R1051–R1056. https://doi.org/10.1016/j.cub.2015.09.039

      Shine, J. M. (2019). Neuromodulatory Influences on Integration and Segregation in the Brain. Trends in Cognitive Sciences, 23(7), 572–583. https://doi.org/10.1016/j.tics.2019.04.002

      Shine, J. M., Lewis, L. D., Garrett, D. D., & Hwang, K. (2023). The impact of the human thalamus on brain-wide information processing. Nature Reviews Neuroscience, 24(7), 416–430. https://doi.org/10.1038/s41583-023-00701-0

      Sommer, D., & Golz, M. (2010). Evaluation of PERCLOS based current fatigue monitoring technologies. 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, 4456–4459. https://doi.org/10.1109/IEMBS.2010.5625960

      Weijs, M. L., Missura, S., Potok-Szybińska, W., Bächinger, M., Badii, B., Carro-Domínguez, M., Wenderoth, N., & Meissner, S. N. (2025). Modulating cortical excitability and cortical arousal by pupil self-regulation. Nature Communications, 16(1), 4552. https://doi.org/10.1038/s41467-025-59837-5

      Yellin, D., Berkovich-Ohana, A., & Malach, R. (2015). Coupling between pupil fluctuations and resting-state fMRI uncovers a slow build-up of antagonistic responses in the human cortex. NeuroImage, 106, 414–427. https://doi.org/10.1016/j.neuroimage.2014.11.034

    1. If Nvidia has cracked the code on bringing AI agents easily, safely, and usefully to the masses, it could — and should — be big.

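      Most people believe AI agent technology is still at an early stage and hard to run effectively on consumer devices, but the author implies that Nvidia has solved this technical problem. This optimistic view challenges the industry consensus that AI agent technology is still immature, and suggests the market may soon see widespread adoption of AI agents.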

    2. if Nvidia has cracked the code on bringing AI agents easily, safely, and usefully to the masses, it could — and should — be big

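      Most people consider bringing AI agents safely to mass-market consumers a hard challenge. The author implies that Nvidia has "cracked the code" and can bring AI agents to the masses easily, safely, and usefully, which challenges the common view of the technical and safety obstacles facing widespread AI adoption.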

    1. Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.

      It is 4 times less likely to let flaws pass without flagging them.

    1. Briefing document: Institutional and judicial handling of behavioral disorders in minors

      Executive summary

      This document summarizes a talk by a specialized magistrate on the care and handling of minors in France.

      The analysis highlights a shift in concern from conventional juvenile delinquency toward radicalization, while underscoring a generational failure in dialogue with young people.

      The justice system is presented not as a purely repressive tool but as a pivotal institution structured around four essential functions: identity, protection, education, and the defense of victims.

      A critical point is raised about the gap between proclaimed republican principles and the reality experienced by young people, as well as the need to move from the concept of "parental authority" to that of "responsibility".

      Finally, the document stresses the importance of effectively enforcing existing laws rather than multiplying legislative reforms.

      --------------------------------------------------------------------------------

      I. Institutional and policy framework for child protection

      The need for visibility and coherence

      The talk points to a long-standing UN criticism of the lack of visibility of France's child-protection policy.

      In response, the law of 14 March 2016 created the National Child Protection Council (Conseil National de Protection de l'Enfance, CNPE).

      This body brings together the State, local authorities, the voluntary sector, and professionals to create a structured exchange for evaluating and reorienting public policy.

      The republican gap and radicalization

      A major finding is stated: part of French youth is "adrift" for lack of dialogue with adults.

      This vacuum is exploited by mafia-style networks or by preachers.

      • Value dissonance: There is a gap between the principles of "Liberty, Equality, Fraternity" and social reality (everyday injustices, precarious living conditions).

      • Shifting threat: Current concern has moved from juvenile delinquency (which accounts for about 17% of all delinquency and is declining as a share) to radicalization and terrorism.

      • Civic engagement: The creation of the "Éducation nationale reserve" (réserve de l'Éducation nationale) aimed to mobilize citizens to restore dialogue, although its initial focus on secularism (laïcité) is considered too narrow.

      --------------------------------------------------------------------------------

      II. Legal definitions and thresholds of responsibility

      The legal status of the child is governed by international standards (the UN Convention on the Rights of the Child), which prohibit treating a minor under 18 as an adult.

      Scale of criminal responsibility

      The French system provides for responsibility that increases with age:

      | Age range | Status and responsibility |
      | --- | --- |
      | 0 to 7-8 years | Near-total criminal irresponsibility; treated as a child in danger. |
      | 13 to 16 years | Reduced criminal responsibility; primacy of education. |
      | 16 to 18 years | May be sentenced as an adult if the "excuse de minorité" (statutory sentence reduction for minors) is lifted. |
      | 18 to 21 years | Intermediate status inherited from the time when the age of majority was 21 (need for an 18-25 status). |

      Note on severity: Until 2016, a minor over 16 could face life imprisonment. The law has since lowered this ceiling to 30 years' imprisonment.

      --------------------------------------------------------------------------------

      III. The four main functions of juvenile justice

      Justice does not intervene on the basis of medical symptoms but through situational "entry points".

      1. The function of identity and filiation

      The judge defines who the child is through their surname, first name, and filiation.

      This function determines who exercises responsibility for the child's protection and education.

      The speaker recommends replacing the term "parental authority" with "parental responsibility", since authority should serve responsibility.

      2. Protection (educational assistance)

      It applies when the child is in physical, psychological, or moral danger (Articles 375 et seq. of the Civil Code).

      • Care vs. placement: The term "placement" is considered dehumanizing, and families equate it with a loss of standing.

      It is better to speak of "accueil" (being taken into care).

      • The judge's role: The judge intervenes when there is a conflict (either over the assessment of the danger or over the response to it).

      Judicial intervention should be time-limited and aim to put the parents "back in the saddle".

      3. Education and the treatment of delinquency

      Based on the 1945 ordinance and the 1912 law, this function aims to turn a delinquent person into a non-delinquent one.

      • Primacy of education: Delinquency is often seen as the result of an educational deficit.

      • Effectiveness: About 85% of juvenile offenders no longer offend by the time they reach adulthood, thanks to social and judicial work.

      • Constraint and education: There is no education without constraint.

      In some cases, incarceration can be part of an educational approach, to mark a limit.

      4. Guaranteeing the rights of child victims

      Child victims face three compounding handicaps: their status as victims, their gender (often girls), and their presumed lack of credibility.

      Justice must ensure their protection, notably through:

      • Audiovisual recording of interviews.

      • Longer limitation periods (which now start at the age of majority and run for up to 20 or 30 years).

      --------------------------------------------------------------------------------

      IV. Contemporary issues and controversies

      The place of third parties in the family

      Modern family structures often include stepfathers or stepmothers who exercise day-to-day responsibility without a clear legal status.

      It is proposed that anyone legally responsible for a child's care be able to carry out everyday acts on the child's behalf.

      Corporal punishment and education

      The 2016 law seeking to ban corporal punishment was struck down by the Constitutional Council on procedural grounds (a legislative rider, "cavalier législatif").

      The speaker stresses how difficult France finds it to move from "dressage" (training through violence) to "apprentissage" (learning: authority without violence).

      Children's rights and cultural/religious practices

      A sensitive debate pits the child's fundamental rights (physical integrity) against religious traditions.

      • Circumcision and excision: Although not comparable in their effects, both mark community membership through an injury to the body.

      The speaker calls for a public debate on these questions, asserting that the law of the Republic must take precedence over religious or cultural norms.

      Statistics and the reality of delinquency

      Police and judicial statistics should be handled with caution.

      The real effectiveness of the police is estimated at 6 to 7% of the total reality of delinquency (the "dark figure"), because victims report only one offense in five.

      --------------------------------------------------------------------------------

      V. Conclusions and avenues for action

      • Implementing the laws: The challenge is not to create new laws (such as doubling the number of Closed Educational Centers, CEFs, which is considered unrealistic and costly) but to apply existing measures.

      For example, the law requires an educational appointment within five days of a decision, a requirement that is rarely met.

      • Priority to community-based supervision (milieu ouvert): Most of the work should be done by educators and psychologists in community settings to support the 60,000 young offenders each year, rather than focusing on heavy-handed solutions (CEFs).

      • Neutrality and objectivity: The magistrate concludes by urging professionals (psychologists) to maintain a systemic analysis and a rigorous knowledge of institutions in order to be truly effective in their practice.

    1. Hearing summary: Involvement of associations and private organizations in schools

      Executive summary

      This document summarizes the testimony gathered by the Senate commission of inquiry into the influence of private organizations on public policy, specifically within the education system.

      Representatives of the three major parents' federations (FCPE, PEEP, and APEL) point to a critical lack of transparency, monitoring, and oversight of outside associations working in schools.

      Although opening schools to outside speakers is considered necessary as pedagogical support, several major points of concern emerge:

      • Insufficient oversight mechanisms: The Ministry of Education (Éducation nationale) lacks rigorous processes for evaluating interventions before, during, and after they take place.

      • Poor information for parents: Although represented on governing bodies, parents are often presented with a fait accompli and have no real decision-making power over the choice of speakers.

      • The EVARS issue: The rollout of the program of education in emotional, relational, and sexual life crystallizes tensions, revealing disparities in resources and risks of ideological drift.

      • A preference for in-house training: There is consensus that training teachers should take priority over outsourcing sensitive subjects to third-party organizations whose neutrality and competence may vary.

      --------------------------------------------------------------------------------

      1. Regulatory framework and intervention mechanisms

      The involvement of associations in schools is governed by precise rules, although their application in practice reveals structural weaknesses.

      The approval principle

      Under Article D551-6 of the Education Code, only approved (agréées) associations may intervene during school hours.

      This approval is supposed to guarantee respect for the values of the Republic, neutrality, pluralism, and non-profit status.

      • Support vs. substitution: An intervention must be a pedagogical complement and may in no case replace the teaching delivered by teachers.

      • The role of the school head: The head is the "conductor" who authorizes the intervention after consulting the teaching team.

      • The exemption loophole: A school head may, on an exceptional basis, authorize a non-approved association, which creates a potential "loophole" in the system's safeguards.

      Consultation bodies

      | Body | Role and limits |
      | --- | --- |
      | CNAEC / CAEC | National or academic (regional) commissions where parents sit to give an opinion on approvals. The information provided is considered minimal (brief fact sheets). |
      | CA / CVC / CVL / CESC | School-level bodies where projects are presented. Parents are told that associations will be coming but have no decision-making role. |

      --------------------------------------------------------------------------------

      2. Analysis of the EVARS program (Education in Emotional, Relational, and Sexual Life)

      The EVARS program (EARS in Catholic schools) is at the center of current concerns because of its sensitivity and its recent rollout.

      • Need for the program: Parents' associations stress the importance of this program, notably for combating incest and violence from nursery school onward and for promoting respectful relationships in upper secondary school (lycée).

      • Uneven implementation: Resources and follow-up vary sharply from one region to another.

      Delivery of the program often depends on the initiative of individual school heads.

      • The specific case of private education: The Secretary General of Catholic Education has circulated a specific program ("Grandir Heureux", "Growing Up Happy") which, according to some representatives, could depart from the strict recommendations of the Ministry of Education, although APEL reaffirms its vigilance against any indoctrination.

      • Resistance and opposition: "Anti-EVARS" campaigns and infiltration by certain associations (e.g., "Parents Vigilants") during parent-representative elections are disrupting the school climate.

      --------------------------------------------------------------------------------

      3. Dysfunctions and gaps in oversight

      The hearings reveal that the administration is unable to monitor interventions effectively once approval has been granted.

      No traceability of incidents

      There is currently no national or regional annual report, nor any statistics, on problems observed during interventions.

      Information often surfaces through the press or through children's direct accounts to their parents, rather than through official administrative channels.

      The problem of teacher presence

      The regulations require a teacher to be present throughout the intervention.

      However:

      • Incidents have been reported when the teacher leaves the room (even briefly).

      • Challenging an intervention is difficult, because it means blaming the teacher or the school that failed to follow the rules.

      The "nomadic" effect of problematic associations

      An alarming observation is shared: an association that caused an incident in one school can be barred from that school yet continue to work in other schools in the same or neighboring départements, because there is no centralized database of reports.

      --------------------------------------------------------------------------------

      4. Recommendations from the parents' federations

      To address the risks of influence and the lack of transparency, the representatives propose several lines of action:

      • Stronger teacher training: The preferred solution is to train teachers (notably in life and earth sciences (SVT), physical education (EPS), or homeroom teachers (professeurs principaux)) so that they deliver these programs themselves.

      This would ensure direct hierarchical authority and clear pedagogical responsibility.

      • Radical transparency:

      • Make annual intervention reports public and mandatory.

      • Create a "database" or technical profile of each association to ensure national traceability.

      • Require clear communication to parents at back-to-school meetings about the schedule and identity of outside speakers.

      • Qualitative evaluation: Introduce anonymized questionnaires for students after each intervention, with a summary shared with parents and with the association concerned to support continuous improvement.

      • Training requirements for speakers: Ensure that association members are not mere "isolated experts" but people specifically trained to speak to a young audience and able to respond in an age-appropriate, non-ideological way.

      --------------------------------------------------------------------------------

      Key quotes

      "School must remain a safe, free, and pluralistic place. Any outside intervention must be regulated and transparent; otherwise we risk losing what makes our Republic strong." (FCPE representative)

      "We are essentially informed, and in no way decision-makers." (PEEP representative)

      "The way associations are brought in [...] must not end up increasing distrust of the institution." (PEEP representative)

      "We know in France that there is a problem, a taboo, which is incest [...] it seems essential to us that, from nursery school onward, young people can learn what respect for one's own body means." (APEL representative)

    1. Reviewer #3 (Public review):

      Summary

      This paper analyzes human single-neuron activity recorded with Behnke-Fried electrodes during naturalistic listening and reading. The authors demonstrate a double dissociation between superior temporal gyrus neurons (responsive during listening but not reading) and fusiform gyrus neurons (responsive during reading but not listening), and report that these two classes of neurons show selectivity to specific phonological and orthographic features of the stimulus, respectively. Across the language network, the authors also report neurons whose responses are amodal (active during both listening and reading), which they organize into a modal-to-amodal processing hierarchy. A separate thread of analyses tracks the relationship between single-neuron spiking, micro-wire, and macro-wire signals across these regions. The authors interpret their findings as evidence for hierarchical processing across the language network and for a "compositional code" for orthography in reading.

      Strengths

      The dataset is rare and valuable. Simultaneous single-neuron, micro-wire, and macro-wire recordings during naturalistic reading and listening in the same patients are difficult to obtain, and the experimental design reflects substantial care. The cross-modality comparison at single-neuron resolution is a novel measurement, and the paper presents these results while also situating them against prior neuroimaging and intracranial work. The simultaneous availability of signals at three spatial scales within the human language network is an unusual and potentially important resource for the field.

      Weaknesses

      (1) Framing and novelty

      The paper appropriately situates its modality-selectivity findings against prior neuroimaging and intracranial work (citing Buchweitz et al. 2009 among others) and frames its novel contribution as bringing single-neuron resolution to a question that has previously been examined at population scales. This framing is fair as far as it goes. However, two issues remain. First, the paper does not engage with neuroimaging evidence that complicates its clean modality-selectivity story - most notably Wilson, Bautista, & McCarron (2018), who found that the dorsal superior temporal sulcus is activated by both intelligible and unintelligible inputs in both modalities. Several reconciliations of single-neuron modality selectivity with population-level cross-modal activation are possible (sparse coding, BOLD-vs-spiking dissociations, etc.), and the paper should engage with these possibilities. Second, the paper's discussion extends well beyond the modality-selectivity result that is its headline contribution, into broader claims about a "compositional code" for orthography and "hierarchical processing" across the language network. These broader claims are not supported by the analyses presented (see Weakness 3), and their inclusion distracts from and weakens the core finding rather than building on it. The paper would be stronger if these claims were either subjected to the population-level analyses they require or scaled back to exploratory observations.

      These framing issues are compounded by writing problems that obscure what the paper is claiming. Some passages, such as the assertion that the dataset "suggests an unprecedented examination of linguistic features across various brain regions at various resolutions," are not interpretable as written and should be rewritten.

      (2) Methodological concerns about the TRF analyses

      The selectivity findings in Figures 3 and 5 rest on temporal response function / temporal receptive field (TRF) analyses with several core issues.

      2.1) First, the construction of the TRF feature stream for the reading condition is not specified in the methods. Reading stimuli are presented in RSVP, with all letters of a word appearing simultaneously. How letter or letter-position features are mapped to a time-varying regressor reflects a substantive hypothesis about the psychological mechanisms of reading, with statistical consequences for what the TRF can recover and how reading and listening analyses can be compared.

      2.2) Second, the stimulus distribution limits which effects can be reliably estimated. While the design appears balanced for some features (e.g., subject gender and number), the features that drive the TRF analyses - particularly letter identity and position in the orthographic TRF - are unlikely to be well covered in a small stimulus set. This raises a concern about high-variance feature importance estimates.

      2.3) Third, the TRF feature set includes syntactic, semantic, and discourse predictors alongside phonological and orthographic features. The paper does not justify this choice in fitting single-neuron responses in STG and FSG, and the consequences for the unique-variance analyses are not discussed. Because syntactic features are correlated with phonological and orthographic features in natural stimuli (function words are short, have characteristic phoneme distributions, and so on), the unique variance attributed to each feature set depends on what is being controlled for. Including syntactic predictors when fitting STG or FSG neurons also risks inflating overall TRF fit by chance, particularly in the absence of cross-neuron correction.

      2.4) Fourth, there seems to be no correction for multiple comparisons across the neuron × feature grid. The within-neuron feature-importance procedure briefly described in the Figure 3 caption may help combat overestimates of feature importance within a single fit, but does not address the question of how many of the "selective" neurons reported across the paper would survive correction at the population level. With many neurons, many features, and a limited stimulus set, some neurons will appear selective to some features by chance alone, and these are likely to be the ones that appear as example panels in figures.

      Together, these issues mean the per-feature selectivity results cannot be interpreted as the paper currently interprets them. This is consequential because the per-feature selectivity findings underpin the paper's broader claims about a compositional code for orthography and about hierarchical processing across feature levels.

      (3) Claims that outrun the evidence

      Several of the paper's broader claims are not supported by the analyses presented.

      3.1) The authors claim a "compositional code" for orthography, in which single neurons code for the combination of letter identity and position. This claim is illustrated with two example neurons. A claim about a coding scheme is a population-level claim and requires a population-level analysis. A natural test would be a per-neuron model comparison between a TRF with letter identity alone and a TRF including letter identity × position interactions, controlled for model complexity, asking how many neurons show improved prediction with the interaction features. As noted above in §2.2, this analysis would also need to grapple with which letters and positions the data can support estimating. There is a potential connection to the data sparsity worries here: the n=2 example neurons may have the only selectivity profiles for which the relevant interactions could be estimated at all.

      3.2) The "hierarchical processing" claim is motivated by neurons selective to features at multiple levels - graphemes and sub-graphemes in reading, single phonemes and diphthongs in listening. This claim is not specified mechanistically. The paper does not state what kind of structural linguistic hierarchy is intended (segmental phonology to syllabic structure?), what kind of hierarchical neurocomputational mechanism is being proposed, or why selectivity at multiple levels of a feature hierarchy is evidence for that mechanism rather than for any other mechanism (e.g., parallel feature detectors). As written, the claim is too underspecified to evaluate.

      3.3) The "forked letters" finding (selectivity to k, v, w, y, z) is potentially confounded with letter frequency and co-occurrence structure. These letters are low-frequency, with some exhibiting strong positional asymmetries, and they infrequently co-occur with other letters. Under the unique-variance analysis, decorrelation from other features inflates apparent unique variance even in the absence of genuine selectivity.

      3.4) The word-length effect in Figure 4 is established by PCA on the top five fusiform neurons, with no analysis showing that the effect is qualitatively similar across a broader selection of neurons. Beyond establishing that something varies with word length, the paper makes no substantive claim about what the neural code represents. Prior intracranial work has reported word-length effects in regions posterior to the VWFA but not within it (Thesen et al. 2012), which raises the question of whether the effect reported here reflects letter- or word-specific processing or a more general visual response to stimulus extent.

      (4) Missed opportunities

      Several aspects of the paper are not so much wrong as underdeveloped, in ways that the authors are well-positioned to address.

      4.1) The cross-scale comparison between single-neuron, micro-wire, and macro-wire signals is presented descriptively, without articulating what conclusion these analyses support about the relationship between scales of measurement. Given the rarity of simultaneous recordings at these scales, this is a substantial missed opportunity. The rasters in Figure 2 visually suggest a tight relationship between spiking and micro-population activity that is not evident in the summary in Figure 2g. This discrepancy is not explained. Characterizing the functional and temporal relationship linking spike rates to micro- and macro-HGA is a substantive scientific question, and the paper is well-positioned to address it.

      4.2) The stimuli include controlled grammatical manipulations, but these manipulations are used as nuisance regressors in the TRF analyses rather than as the object of structured analysis. A design with controlled comparisons is being treated as if it were unconstrained naturalistic stimulation, which underuses the experimental structure the authors built.

      4.3) Finally, the paper foregrounds the dataset as a contribution but does not describe data sharing plans. Given that several of this review's recommendations call for analyses the authors have not yet done, the long-term value of the dataset to the community will depend substantially on what is shared and how.

      Buchweitz, A., Mason, R. A., Tomitch, L. M., & Just, M. A. (2009). Brain activation for reading and listening comprehension: An fMRI study of modality effects and individual differences in language comprehension. Psychology & Neuroscience, 2(2), 111–123.

      Jobard, G., Vigneau, M., Mazoyer, B., & Tzourio-Mazoyer, N. (2007). Impact of modality and linguistic complexity during reading and listening tasks. NeuroImage, 34(2), 784–800.

      Thesen, T., McDonald, C. R., Carlson, C., Doyle, W., Cash, S., Sherfey, J., Felsovalyi, O., Girard, H., Barr, W., Devinsky, O., Kuzniecky, R., & Halgren, E. (2012). Sequential then interactive processing of letters and words in the left fusiform gyrus. Nature Communications, 3, 1284.

      Wilson, S. M., Bautista, A., & McCarron, A. (2018). Convergence of spoken and written language processing in the superior temporal sulcus. NeuroImage, 171, 62–74.

    2. Author response:

      We thank the editors and reviewers for their constructive feedback on our manuscript. We accept the reviewers' recommendations and will implement them fully in our revised manuscript and include all of the suggested literature references. Below, we highlight several key points raised during the evaluation and outline exactly how we will address them. We will also explicitly address every other point and minor recommendation raised by the reviewers in our final, comprehensive point-by-point response.

      Population-level quantification and statistical thresholds: The reviewers noted that our manuscript relied on single-neuron examples without fully demonstrating how widespread these patterns are across the recorded population. To address this, we will add population-level quantification across the recorded units using standard False Discovery Rate (FDR) corrections for multiple comparisons. We will include summary tables in the text and add statistical threshold lines to the distribution figures to report the proportion of significant neurons per region.
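      A minimal sketch of the Benjamini-Hochberg step-up procedure, one standard way to control the FDR, applied across the whole neuron × feature grid and then summarized as a proportion of significant neurons per region. The input layout (a list of `(region, neuron, feature, p)` tuples) and the rule that a neuron counts as significant if any of its features survives correction are illustrative assumptions, not the authors' stated procedure.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the null hypothesis is rejected
    while controlling the false discovery rate at level q.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max (step-up rule).
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


def proportion_significant_by_region(tests, q=0.05):
    """tests: (region, neuron_id, feature, p) tuples covering the full neuron x feature grid.

    Hypothetical summary rule: a neuron is significant if at least one of its
    feature tests survives FDR correction across the whole grid.
    """
    reject = benjamini_hochberg([p for (_, _, _, p) in tests], q)
    all_neurons, sig_neurons = {}, {}
    for (region, neuron, _feature, _p), rejected in zip(tests, reject):
        all_neurons.setdefault(region, set()).add(neuron)
        if rejected:
            sig_neurons.setdefault(region, set()).add(neuron)
    return {region: len(sig_neurons.get(region, set())) / len(neurons)
            for region, neurons in all_neurons.items()}


if __name__ == "__main__":
    # Hypothetical p-values for three neurons in two regions.
    grid = [("STG", "n1", "phoneme", 0.001), ("STG", "n1", "syntax", 0.9),
            ("STG", "n2", "phoneme", 0.8), ("FSG", "n3", "letter", 0.002)]
    print(proportion_significant_by_region(grid))  # {'STG': 0.5, 'FSG': 1.0}
```

      Because BH is a step-up rule, a p-value that misses its own rank's threshold can still be rejected when a larger p-value further down the ordering passes; for example, `[0.03, 0.04]` at q = 0.05 rejects both.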

      Identifying amodal neurons: Reviewers raised concerns that our classification of amodal language neurons required a more direct test. We will provide additional measures of modality and, in particular, we will implement a cross-modal generalization analysis where our encoding models are trained on one modality (e.g., listening) and evaluated on the other (e.g., reading). This additional procedure will classify neurons as amodal if their cross-modal predictive performance exceeds a baseline null model.
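      One way such a cross-modal test could be set up, sketched in Python with NumPy under several assumptions: both modalities are described in a shared feature space (e.g., word-level lexical features aligned to each modality's timing), the encoding model is plain ridge regression, performance is the Pearson correlation between predicted and observed responses, and the baseline null is built by circularly shifting the held-out response, which preserves its autocorrelation while breaking its alignment with the stimulus. All function names and parameter values are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression weights: (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


def cross_modal_score(X_listen, y_listen, X_read, y_read,
                      alpha=1.0, n_shifts=200, seed=0):
    """Train on listening, test on reading; return (r_observed, p_value).

    The null distribution comes from circular shifts of the reading response,
    restricted to shifts away from zero so the null is not trivially aligned.
    """
    w = ridge_fit(X_listen, y_listen, alpha)
    pred = X_read @ w
    r_obs = pearson(pred, y_read)
    n = len(y_read)
    rng = np.random.default_rng(seed)
    shifts = rng.integers(n // 10, n - n // 10, size=n_shifts)
    null = np.array([pearson(pred, np.roll(y_read, s)) for s in shifts])
    p_value = (1 + np.sum(null >= r_obs)) / (1 + n_shifts)
    return r_obs, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, k = 1000, 5
    w_true = rng.standard_normal(k)
    X_listen = rng.standard_normal((n, k))
    X_read = rng.standard_normal((n, k))
    # Simulated amodal neuron: the same feature weights drive both modalities.
    y_listen = X_listen @ w_true + 0.5 * rng.standard_normal(n)
    y_read = X_read @ w_true + 0.5 * rng.standard_normal(n)
    r, p = cross_modal_score(X_listen, y_listen, X_read, y_read)
    print(f"amodal: r={r:.2f}, p={p:.3f}")
```

      Running the same function in the other direction (train on reading, test on listening) and requiring both to beat the null would be a stricter version of the criterion.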

      Isolating linguistic features from sensory confounds: A point was raised regarding whether some neurons were tracking low-level sensory properties (like sound amplitude or visual text size) rather than language features. We will address this by running encoding analyses that include additional basic acoustic envelopes and visual baseline properties as control variables. This will allow us to evaluate the unique variance explained by linguistic features after accounting for these low-level sensory baselines.

      Evaluating the "Compositional Code" in the Fusiform Gyrus: Reviewers pointed out that our claim regarding a "compositional code" (neurons tracking a combination of letter identity and position) was supported primarily by individual examples. To provide population-level context, we will perform a model comparison across our fusiform gyrus neurons. We will compare a baseline letter-only model against a model that includes letter-by-position interactions to report how many neurons statistically support this compositional structure.
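      A minimal sketch of the proposed letter-only vs letter-by-position comparison, in Python with NumPy. It simplifies the TRF setting to one response value per word (e.g., a spike count in a fixed post-onset window), uses ridge regression, and scores each model by cross-validated R², so the extra interaction parameters only help if they improve out-of-sample prediction. The alphabet, maximum position, ridge penalty, and function names are hypothetical; in a real analysis, each neuron's gain would still need a significance test and correction across neurons.

```python
import numpy as np


def letter_designs(words, alphabet, max_pos):
    """Build two design matrices with one row per word.

    identity: counts of each letter, ignoring position.
    interaction: one-hot letter x position indicators.
    Only the first max_pos letters of each word are encoded.
    """
    idx = {c: i for i, c in enumerate(alphabet)}
    n_letters = len(alphabet)
    identity = np.zeros((len(words), n_letters))
    interaction = np.zeros((len(words), n_letters * max_pos))
    for row, word in enumerate(words):
        for pos, ch in enumerate(word[:max_pos]):
            identity[row, idx[ch]] += 1
            interaction[row, pos * n_letters + idx[ch]] = 1
    return identity, interaction


def cv_r2(X, y, alpha=1.0, n_folds=5):
    """Cross-validated R^2 of ridge regression using contiguous folds."""
    n = len(y)
    pred = np.zeros(n)
    for test in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), test)
        x_mu, y_mu = X[train].mean(axis=0), y[train].mean()
        Xc = X[train] - x_mu
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                            Xc.T @ (y[train] - y_mu))
        pred[test] = (X[test] - x_mu) @ w + y_mu
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def interaction_gain(words, y, alphabet, max_pos, alpha=1.0):
    """Cross-validated R^2 gain of identity + interaction over identity alone."""
    identity, interaction = letter_designs(words, alphabet, max_pos)
    full = np.hstack([identity, interaction])
    return cv_r2(full, y, alpha) - cv_r2(identity, y, alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    alphabet = "abk"
    words = ["".join(alphabet[i] for i in rng.integers(0, 3, size=3)) for _ in range(300)]
    # Simulated neuron that fires only when "k" is the first letter.
    y = np.array([1.0 if w[0] == "k" else 0.0 for w in words]) + 0.1 * rng.standard_normal(300)
    print(f"CV R^2 gain from letter x position terms: {interaction_gain(words, y, alphabet, 3):.2f}")
```

      The count of neurons with a reliably positive gain is then the population-level quantity the comparison is meant to report; with a small stimulus set, letter-position cells that never occur in the data simply contribute all-zero columns, which is one concrete form of the coverage problem raised in the review.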

      TRF feature construction and procedure: Reviewers requested clarification on how our TRF features were constructed. We will update the Methods section to detail explicitly how the features were constructed for both modalities, and we will include a feature correlation matrix in the Supplementary Materials. Furthermore, to dissociate possible low-level confounds from high-level linguistic features, we will conduct a control analysis that tracks specific affixes across different structural roles: for example, comparing how neurons respond to the phoneme /-s/ when it functions as a plural number marker, as third-person verb agreement, or as part of a lexical item (e.g., pass). We will run these analyses in addition to fitting the main TRF models with these additional confounds included, to ensure a clear dissociation between high- and low-level features.
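      To illustrate why the construction of the reading feature stream is a substantive modeling choice, here is a minimal sketch (Python with NumPy) of two ways word-level RSVP features could be placed on a sampled time axis (an impulse at word onset versus a boxcar held for the display duration), followed by the time-lagged design matrix and closed-form ridge fit that TRF analyses typically use. The sampling rate, onsets, display duration, feature codes, and function names are hypothetical; this is not the authors' pipeline.

```python
import numpy as np


def word_features_to_stream(onsets_s, feature_rows, n_samples, fs,
                            mode="impulse", dur_s=0.3):
    """Place word-level feature vectors on a sampled time axis.

    mode="impulse": the feature vector appears only at the word-onset sample.
    mode="boxcar":  the feature vector is held for dur_s seconds (e.g., RSVP display time).
    """
    feature_rows = np.asarray(feature_rows, dtype=float)
    stream = np.zeros((n_samples, feature_rows.shape[1]))
    for onset, row in zip(onsets_s, feature_rows):
        start = int(round(onset * fs))
        length = 1 if mode == "impulse" else int(round(dur_s * fs))
        stream[start:min(start + length, n_samples)] += row
    return stream


def lag_matrix(stream, n_lags):
    """Stack copies of the stream delayed by 0..n_lags-1 samples; columns grouped by lag."""
    n, k = stream.shape
    X = np.zeros((n, k * n_lags))
    for lag in range(n_lags):
        X[lag:, lag * k:(lag + 1) * k] = stream[:n - lag]
    return X


def fit_trf(X, y, alpha=1.0):
    """Ridge solution w = (X'X + alpha*I)^-1 X'y; reshape to (n_lags, n_features) to read the TRF."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


if __name__ == "__main__":
    fs = 100                                   # hypothetical sampling rate of binned spikes (Hz)
    onsets = [0.0, 0.5, 1.0, 1.5]              # hypothetical RSVP word onsets (s)
    feats = [[1, 0], [0, 1], [1, 1], [0, 0]]   # hypothetical 2-feature word codes
    imp = word_features_to_stream(onsets, feats, 200, fs, mode="impulse")
    box = word_features_to_stream(onsets, feats, 200, fs, mode="boxcar", dur_s=0.3)
    print(imp[:, 0].sum(), box[:, 0].sum())    # 2.0 60.0
```

      The impulse coding assumes the response is locked to word onset, while the boxcar coding lets the fitted kernel absorb sustained responses during display; the same word-level features therefore yield different regressors, and different recoverable TRFs, depending on which assumption is made.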

  4. May 2026
    1. A code of ethics cannot guarantee ethical behavior. Moreover, a code of ethics cannot resolve all ethical issues or disputes or capture the richness and complexity involved in striving to make responsible choices within a moral community.

      Even though a code of ethics is written and taught to future social workers, not every social worker is going to abide by the expectations set for this profession. This worries me with regard to power and structural inequality for clients who are minorities or in vulnerable situations, because social workers are trained professionals who are supposed to help people. If a social worker cannot set aside their personal values and beliefs when working with clients, then in some situations a client may be at greater risk at the hands of that social worker. Some social workers may also take advantage of the power they hold when working with vulnerable clients.

    1. Remote versus local is more important than it seems. A locally installed tool is auditable. You can read the code, pin the version, and know it won't change under you.

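      Action item: prefer locally installed tools over remote ones, because local tools are more auditable. Treat any remote tool you must use (such as a hosted MCP server) as an untrusted component. Test it first in an isolated environment with mock data, to limit the blast radius of a malicious tool.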

    2. A locally installed tool is auditable. You can read the code, pin the version, and know it won't change under you. A remote tool—a hosted MCP server, a cloud connector—can change behavior at any point after you've approved it;

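      Most people assume remote tools are safer than locally installed ones because professional teams maintain them. The author points out that remote tools may actually be more dangerous, because they can change behavior at any time after the user approves them, whereas local tools remain under the user's control. This view challenges the default security assumptions behind cloud-native and remote services.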

    1. Opus 4.8 is around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked.

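      Most people assume AI models confidently output flawed code without realizing it, but the author argues that Opus 4.8 has significantly improved its ability to catch its own mistakes. This challenges widespread skepticism about AI models' capacity for self-assessment and suggests AI may be more reliable on code quality than people expect.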

    2. Claude Code with Opus 4.8 can now carry out codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge

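      Most people assume AI models need human intervention and review to handle large-scale code migrations. The author instead argues that Opus 4.8 can independently carry out end-to-end migrations spanning hundreds of thousands of lines of code. This challenges conventional views in software development about what AI assistance can do, and suggests AI may be more capable of complex engineering tasks than people imagine.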

    1. 25% Off Savings! Use code 25MLS. Ends 06:59:44. Terms

      The discount message is visible, but there is a lot happening in this area with the promo code, countdown, and sign-up offer. From an accessibility perspective, important information should be easy to read and not feel crowded. In retail, too many messages at once can create confusion instead of urgency.

    1. AI-assisted engineers are burning out, is this fine?
      • The Reality of AI Burnout: AI-assisted software engineering delivers high-speed productivity on paper, but it introduces a hidden cost of cognitive overload, fatigue, and a new form of "AI burnout."
      • The Productivity Trap: AI tools compress highly intense cognitive workflows (prompting, reviewing, and debugging) into shorter periods. Instead of working less, engineers fill saved time with more tasks, replacing rewarding creative work with exhausting oversight.
      • Loss of Craft and Fulfillment: The traditional cycle of planning and writing code is highly satisfying. AI bypasses this tactile process, turning engineers into supervisors of code they didn't write, which dramatically diminishes feelings of ownership, pride, and achievement.
      • Erosion of System Intuition: Delegating codebase comprehension to AI agents leads to "cognitive debt." Engineers stop holding the architecture and edge cases in their heads, losing the deep intuition required to spot bugs or design flaws early.
      • Review Bottlenecks: AI dramatically increases code output, but human capacity to review that code remains unchanged. Senior engineers absorb a disproportionate amount of risk and cognitive load trying to clean up thousands of lines of mediocre, AI-generated code.
      • Practical Solutions to Reclaim Balance:
        • Acknowledge Wins: Keep a win-log, track hours, and demo results to restore a sense of personal achievement.
        • Rethink AI Workflows: Focus heavily on the "planning" phase, decompose large tasks, and avoid jumping straight from one AI-heavy task to another.
        • Preserve the Craft: Protect specific hours or passion projects for manual coding without AI intervention.
        • Set Boundaries: Enforce strict work hours, take deliberate breaks to counter continuous cognitive demands, and stop once daily goals are met.
    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1.

      We appreciate the constructive comments, which greatly improved this manuscript.

      Reviewer #2.

      We appreciate Reviewer #2's thorough analysis of our manuscript. However, we are concerned that the reviewer criticized a conclusion different from the one we claim in the manuscript. Although Reviewer #2's public comment stated, "Such an approach is insufficient to unequivocally support the central claim that DNA methylation increases accessibility of H2A.Z-containing nucleosomes", we did not draw such a bold conclusion. In the Abstract, we cautiously described that the impact of DNA methylation we observed was subtle and based on satellite II-derived DNA sequences. We made a nuanced proposal regarding this observation, stating, "Altogether, we propose that SRCAP drives the biased association of H2A.Z to unmethylated DNA, while additional mechanisms, potentially taking advantage of the subtle DNA methylation-induced physical effects, further assist the exclusion of H2A.Z from methylated DNA". We believe our analysis will contribute valuable insights into the mechanistic basis behind the antagonism between DNA methylation and H2A.Z.

      Reviewer #3.

      We appreciate the constructive comments, which greatly improved this manuscript.


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodeling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

      We would like to thank the editor and reviewers for their critical assessments of our manuscript. While we do acknowledge the limitations of our work, we believe that our results provide important mechanistic insights into the long-standing question of how H2A.Z is preferentially enriched in hypomethylated genomic DNA regions. First, our structural and biochemical data suggest that DNA methylation increases the openness and physical accessibility of H2A.Z nucleosomes, albeit the effect is relatively subtle and sequence-dependent. Second, using Xenopus egg extracts and synthetic DNA templates, we provide the first clear and direct evidence that DNA methylation-sensitive H2A.Z deposition is due to the H2A.Z chaperone SRCAP-C, corroborated by our discovery that SRCAP-C binding to DNA is suppressed by DNA methylation. Although the molecular details by which DNA methylation inhibits binding of SRCAP-C are an important area of future study, in our current manuscript we do provide evidence that directly links the presence of SRCAP-C to the establishment of the DNA methylation/H2A.Z antagonism in a physiological system. Thanks to criticisms by the reviewers, we realized that we did not clearly state in our Abstract that the impact of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle, although we did explain these observations and limitations in the main text. In our revised manuscript, we are willing to edit the text to better address the criticisms raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

      Although we acknowledge the limitations of our study and are willing to expand our discussion to more thoroughly discuss these points, we believe our manuscript provides several important mechanistic insights which this reviewer may not have fully appreciated.

      Our first conclusion, that H2A.Z nucleosomes on methylated DNA are more open and accessible than their unmethylated counterparts, is supported by both our cryo-EM study and the restriction enzyme accessibility assay. Although the physical effect of DNA methylation is relatively subtle and likely sequence-dependent, as we clearly noted within the manuscript, the difference does exist and is valuable information for the chromatin field at large to consider.

      The second major conclusion of our manuscript is that SRCAP-C exhibits preferential binding to unmethylated DNA over methylated DNA, and that SRCAP-C represents the major mechanism that can explain the biased deposition of H2A.Z to unmethylated DNA in Xenopus egg extracts. Furthermore, our experiments using Xenopus egg extract clearly demonstrated that H2A.Z is deposited by both DNA methylation-sensitive and -insensitive mechanisms. Depletion of SRCAP-C almost completely eliminated DNA methylation-sensitive H2A.Z deposition and reduced the total level of H2A.Z on chromatin to less than half of that seen in non-depleted extract. This result demonstrated that DNA methylation-sensitive H2A.Z loading is primarily regulated by SRCAP-C, at least in our experimental context where transcription, replication, and other epigenetic modifications are not involved. It is likely that additional mechanisms also contribute, as implicated by our sequencing experiments, particularly at regions with active transcription, and we have noted these possibilities and the rationale for their existence in the Discussion.

      Our study also suggests that a SRCAP-independent, DNA methylation-insensitive mechanism of H2A.Z loading exists, which we suspect to be mediated by Tip60-C. In line with this possibility, our data suggest that Tip60-C binds DNA in a DNA methylation-insensitive manner in Xenopus egg extract. Since antibodies to deplete Tip60-C from Xenopus egg extract are currently unavailable, we were unable to directly test that hypothesis and decided not to include Tip60-C in our final model, as we lacked experimental evidence for its role. However, whether or not Tip60-C is the complex responsible for the DNA methylation-insensitive pathway does not influence our final conclusion that SRCAP-C plays a major role in DNA methylation-sensitive H2A.Z loading. We are planning to edit our manuscript to more comprehensively discuss these points.

      Please note that while Berta et al. reported that DNA methylation increases at H2A.Z loci in tumors defective in SRCAP-C, they selected those regions based on where H2A.Z is typically enriched within normal tissues (Berta et al., 2021). They did not show data indicating whether H2A.Z is still retained specifically at those analyzed loci upon mutation of SRCAP-C subunits. Thus, although we greatly admire their work and are pleased that many of our findings align with theirs, their paper did not directly address whether SRCAP-C itself differentiates between DNA methylation states, nor the impact that this has on H2A.Z and DNA methylation colocalization. In contrast, our Xenopus egg extract system, where de novo methylation is undetectable (Nishiyama et al., 2013; Wassing et al., 2024), offers a unique opportunity to examine the direct impact of DNA methylation on H2A.Z deposition using controlled synthetic DNA substrates. Together with our demonstration that DNA binding of SRCAP-C is suppressed by DNA methylation, we believe that our manuscript provides a specific mechanism that can explain the preferential deposition of H2A.Z at hypomethylated genomic regions.

      Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      We appreciate the critical assessment of our manuscript by this reviewer. Although we acknowledge the limitations of our study and will revise the manuscript to better describe them, we would like to respectfully argue against the statement that our "conclusions […] are not supported by the data".

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The fact that the DNA resolution in the methylated structure was not high enough to resolve the positions of methylated CpGs despite a high overall resolution of 2.78 Å implies that 1) the Sat2R-P DNA was not as stably registered as the 601L sequence, requiring us to create two alternative Sat2R-P atomic models to account for the variable positioning in our samples, and 2) that the presence of DNA methylation increases that positional variability. We understand that one may prefer to see highly resolved density around each methylation mark, but we do believe that our inability to accomplish that is actually a feature rather than a weakness and has important biological implications. The decrease in local DNA resolution on the methylated Sat2R-P structure compared to its unmethylated counterpart is meaningful and suggests to us that DNA methylation weakens overall DNA wrapping and positioning on the nucleosome, supported by the increased flexibility seen at the linker DNA ends as well as an increase in the population of highly shifted nucleosomes amongst the methylated particles. Additionally, one major view in the DNA methylation/nucleosome stability field is that the presence of DNA methylation can make DNA stiffer and harder to bend, causing opening and destabilization of nucleosomes (Ngo et al., 2016). The increased opening of linker DNA ends and accessibility of methylated H2A.Z nucleosomes in our hands also aligns with such an idea, again suggesting decreased histone-DNA contact stability on methylated DNA substrates. We plan to revise the writing in our manuscript to better reflect these ideas.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      The reviewer raises a fair question about whether canonical H2A would experience the same DNA methylation-dependent structural effects. We had considered solving the H2A structures but ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes using a DNA sequence highly similar to our Sat2R-P with and without DNA methylation (PDB: 5CPI and 5CPJ). The authors of this study did not see any physical differences between their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis.

      One of the primary reasons we chose the Sat2R-P sequence was, as noted above, that there already was a published study examining how DNA methylation affects nucleosome structure using a variant of this sequence, which we could compare to our results, as the reviewer has suggested. We did have to modify the sequence, namely by making it palindromic, in order to increase the final achievable resolution. We viewed the Sat2R-P sequence as an attractive candidate because it is physiologically relevant; the initial sequence was taken directly from human satellite II. Several modifications were made for technical reasons, including making the sequence palindromic as described above and ensuring that each CpG is recognizable by a methylation-sensitive restriction enzyme, so that we could be certain about the degree of methylation on our substrates. For us, these practical concerns outweighed the need to maintain a strictly physiological sequence. However, we still believe the final Sat2R-P more closely mimics physiological sequences than Widom 601. Additionally, human satellite II is a highly abundant sequence in the human genome that is known to undergo large methylation changes at the onset of many disorders, like cancer, as well as during aging. Thus, there are interesting biological questions surrounding how the methylation state of this particular sequence affects chromatin structure.

      Furthermore, it has been reported that satellite II is devoid of H2A.Z (Capurso et al., 2012). Beyond those reasons, the satellite II sequence is generally interesting to our lab because we have been studying genes involved in ICF syndrome, where hypomethylation of satellite II sequences forms one of the hallmarks of this disorder (Funabiki et al., 2023; Jenness et al., 2018; Wassing et al., 2024). We understand that sequence context plays a large role in nucleosome wrapping and stability. This is why we strived to test multiple sequences in each of our assays. We do agree that it would be interesting to use DNA sequences where H2A.Z binding has already been described to be affected in a DNA methylation-dependent manner, forming an exciting future study to pursue.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      We did not argue that only the global methylation level is relevant. We also would appreciate it if the reviewer could provide specific references that "clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent". We are aware of a series of studies conducted by Chongli Yuan's group, including one testing the effect of placing methylated CpGs at different positions along the Widom 601 sequence. In that study (Jimenez-Useche et al., 2013), they did find that positioning of mCpGs has differential impacts on the salt resistance of the nucleosomes, with 5 tandem mCpG copies at the dyad causing the most dramatic nucleosome opening whereas having mCpGs only at the DNA major grooves, but not elsewhere, increased nucleosome stability. However, they did also find that methylation of the original Widom 601 sequence also caused destabilization, albeit to a lesser degree, and another study by the same group (Jimenez-Useche et al., 2014) also found that CpG methylation decreased nucleosome-forming ability for all tested variants of the Widom 601 sequence, regardless of CpG density or positioning.

      Other studies monitored how the distribution of methylated CpGs correlates with nucleosome positioning (Collings et al., 2013; Davey et al., 1997; Davey et al., 2004). However, these studies assessed the sequence-dependent effects specifically on nucleosome assembly during in vitro salt dialysis, which is a different physical process from the one our manuscript focuses on, especially considering that H2A.Z is deposited onto preassembled H2A nucleosomes. Our cryo-EM analysis examines the structural changes induced by DNA methylation on already formed nucleosomes rather than the process of formation. Thus, probing accessibility changes using a restriction enzyme was the more appropriate biochemical assay to verify our structures.

      We very much agree that DNA context can influence nucleosome stability under different conditions. A study of molecular dynamics simulations concluded that the "combination of overall DNA geometrical and shape properties upon methylation" makes nucleosomes resistant to unwrapping (Li et al., 2022), while another modeling study suggests that DNA methylation impacts nucleosome stability in a manner dependent on DNA sequence, where "[s]trong binding is weakened and weak binding is strengthened" (Minary and Levitt, 2014). While G/C-dinucleotides are preferentially placed at major groove-inward positions in nucleosomes in vivo (Chodavarapu et al., 2010; Segal et al., 2006) and G/C-rich segments are excluded from major groove-outward positions in Widom 601-like nucleosomes (Chua et al., 2012), methylated CpG dinucleotides are preferably, if not exclusively, located at major groove-outward positions in vivo. Mechanisms behind this biased mCpG positioning on the nucleosome remain speculative, likely caused by a combination of multiple factors, but the fact that we did not observe clear structural impacts using the Widom 601L sequence, where mCpGs are located at both major groove-outward and -inward positions ((Chua et al., 2012) and our structure), merits discussion. On the other hand, the positioning of mCpGs on the satellite II-derived sequences that we used in this study was based on a physiological sequence, and thus it may not be appropriate to say that those CpGs are placed at multiple "random" positions. Although we decided not to discuss the position of 5mC on our Sat2R nucleosome structure due to ambiguous base assignments, neither of our two atomic models is consistent with the idea that DNA methylation repositions the CpG to the outward major grooves.
      As the potential contribution of how DNA methylation affects nucleosome structure via modulating DNA stiffness has been extensively studied (Choy et al., 2010; Li et al., 2022; Ngo et al., 2016; Perez et al., 2012), we believe that it is appropriate to consider overall DNA properties along the whole DNA sequence, though we are willing to discuss potential positional effects in the revised manuscript.

      Perhaps one of the most important points that we did not emphasize enough in our original manuscript was that, in contrast to the subtle and DNA sequence-dependent intrinsic effect of DNA methylation, we observed SRCAP-dependent preferential H2A.Z deposition to unmethylated DNA over methylated DNA with both 601 and satellite II DNAs. In the revised manuscript, we will make clearer the value of comparing 601 and satellite II DNA across these two distinct mechanisms.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing that question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008), but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract the relatively subtle effect of DNA methylation on intrinsic H2A.Z nucleosome stability. Therefore, we will revise the Abstract accordingly to make this point clearer.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      We believe the inclusion and factual reporting of negative data is important for the scientific community as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. We believe this data rather emphasizes the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to depend on nuclear type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited to hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates), but that an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      As depicted in Figure 6 and described in the Discussion, we clearly indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system.

      As noted in our response to (2), we attribute the lack of a clear impact on our 601L structures to the extraordinarily strong, artificial nucleosome-positioning capacity of the 601 sequence and its variants. Since 601 is heavily used in chromatin biology, including in DNA methylation research, such negative data are still useful to include and publish.

      (7) The SRCAP depletion is insufficiently validated; i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      We are willing to address this concern. However, please note that our data showed that methylation-dependent H2A.Z deposition is almost completely erased upon SRCAP depletion, indicating functionally effective depletion. The specificity of the custom antibody against Xenopus SRCAP was verified by mass spectrometry. Additionally, we have obtained the same effect using another commercially available SRCAP antibody, though we did not include this preliminary result in our original manuscript. Because SRCAP is of relatively low abundance and high molecular weight, its western blot signals are weak, making it challenging to quantify the degree of depletion. We also believe that, given the points noted above, the value of quantification in this context is rather limited. In the past, our lab has published papers on depleting the H3T3 kinase Haspin from Xenopus egg extracts (Ghenoiu et al., 2013; Kelly et al., 2010), but we were never able to detect Haspin via western blot. This protein was only detected by mass spectrometry, specifically on nucleosome array beads with H3K9me3 (Jenness et al., 2018). However, depletion of Haspin was readily monitored by erasure of H3T3ph, the enzymatic product of Haspin. In these experiments, it was impossible, and not critical, to quantitatively monitor the depletion of Haspin protein in order to investigate its molecular functions. Similarly, in the current study, the important fact is that depletion of SRCAP suppressed methylation-sensitive H2A.Z deposition, and quantifying the degree of SRCAP depletion would not have a major impact on this conclusion.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      We are aware that the Tip60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive. Therefore, the reviewer's statement that we "completely overlooked" Tip60-C’s role does not fairly reflect our efforts. We wished to test the potential contribution of Tip60-C, but, unfortunately, the antibodies currently available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role Tip60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by a still unidentified factor. While further investigating Tip60-C’s contribution to this pathway is of definite value, we do not believe that it affects our major conclusion that SRCAP-C is the main mediator of H2A.Z deposition on unmethylated DNA; it therefore remains a subject for future study.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      We understand that the H2A.Z stability field is highly controversial. We have cited the many conflicting reports that have been published in the field and can further expand on the controversies if desired. We also understand that the term “nucleosome stability” is broad and encompasses many physical aspects. As noted in a prior response, we will better specify our use of the term within the manuscript. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible than canonical H2A nucleosomes on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite, while others have obtained results similar to ours. We reported our findings on general H2A.Z stability in the hope of helping to clarify some of the field’s controversies.

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

      We appreciate this reviewer’s advice. However, please note that the first author who led this project has already successfully defended their PhD thesis primarily based on this project, making it impractical and unrealistic to completely change the focus of this manuscript to include an entirely new avenue of research. We believe that our data provide important insights into the mechanisms by which H2A.Z is excluded from methylated DNA, particularly via the DNA methylation-sensitive binding of SRCAP-C, which has never been described before. We agree that many questions are still left unanswered, including the exact molecular mechanism behind how DNA methylation prevents SRCAP-C binding. We have preliminary data that suggest none of the known DNA-binding modules of SRCAP-C, including ZNHIT1, by themselves can explain this sensitivity. This implies that domain dissection in the context of the holo-SRCAP complex is required to fully address this question. We believe this represents a very exciting future avenue of study; however, it does not negate our finding that SRCAP-C itself is important for maintaining the DNA methylation/H2A.Z antagonism. Therefore, we respectfully disagree with this reviewer's summary statement, which misleadingly undermines the impact of our work.

      Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants in the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found distinct distributions in Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler responsible for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      We are grateful that this reviewer recognizes the importance of our study.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly establish whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      The altered H4 tail orientation and extra density seen on the acidic patch were incidental findings that we thought could be interesting for the field to be aware of, but we decided not to follow up on them, as there were other structural differences more directly related to our central question. We do believe that these two differences are linked, because we used a highly purified and homogeneous sample for cryo-EM analysis and the H4 tail/acidic patch interaction is a well-characterized contact that mediates inter-nucleosome interactions. Additionally, other groups have reported that the presence of DNA methylation causes condensation of both chromatin and bare DNA (cited within our manuscript), though the mechanics behind this phenomenon remain to be elucidated. We believe that our structural data may also align with those findings. However, the reviewer is fair in pointing out that we do not provide further experimental evidence verifying the existence of these increased interactions. We can revise our writing to clarify that these points are currently hypotheses rather than validated results.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

      We would like to thank the reviewer for bringing up these issues. Although our current data cannot explicitly distinguish between these possibilities, we favor the idea that DNA methylation specifically alters histone-DNA contacts and that this effect is felt globally across the entire nucleosome rather than only at specific locations. The intrinsic flexibility of linker DNA ends means that this region tends to exhibit the greatest differences under different physical influences, hence our focus on characterizing it; the flexibility of a thread on a spool is most pronounced at the ends. However, we also found that the DNA backbone of H2A.Z on methylated DNA had a lower local resolution than its unmethylated counterpart, despite that structure having a higher global resolution, which suggested to us that DNA positioning along the nucleosome is overall weaker in the presence of DNA methylation. This is corroborated by the increased population of open/shifted structures in our classification analysis. The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site, and we have previously seen that moving the cut site too close to the linker DNA end abolishes any DNA methylation-dependent differences. We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with a specific restriction enzyme. We do not know the reason for this irreproducibility, though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies in the DNA methylation and nucleosome stability field.

      References

      Berta, D.G., H. Kuisma, N. Valimaki, M. Raisanen, M. Jantti, A. Pasanen, A. Karhu, J. Kaukomaa, A. Taira, T. Cajuso, S. Nieminen, R.M. Penttinen, S. Ahonen, R. Lehtonen, M. Mehine, P. Vahteristo, J. Jalkanen, B. Sahu, J. Ravantti, N. Makinen, K. Rajamaki, K. Palin, J. Taipale, O. Heikinheimo, R. Butzow, E. Kaasinen, and L.A. Aaltonen. 2021. Deficient H2A.Z deposition is associated with genesis of uterine leiomyoma. Nature. 596:398–403.

      Capurso, D., H. Xiong, and M.R. Segal. 2012. A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics. 13:630.

      Chodavarapu, R.K., S. Feng, Y.V. Bernatavichute, P.Y. Chen, H. Stroud, Y. Yu, J.A. Hetzel, F. Kuo, J. Kim, S.J. Cokus, D. Casero, M. Bernal, P. Huijser, A.T. Clark, U. Kramer, S.S. Merchant, X. Zhang, S.E. Jacobsen, and M. Pellegrini. 2010. Relationship between nucleosome positioning and DNA methylation. Nature. 466:388–392.

      Choy, J.S., S. Wei, J.Y. Lee, S. Tan, S. Chu, and T.H. Lee. 2010. DNA methylation increases nucleosome compaction and rigidity. J Am Chem Soc. 132:1782–1783.

      Chua, E.Y., D. Vasudevan, G.E. Davey, B. Wu, and C.A. Davey. 2012. The mechanics behind DNA sequence-dependent properties of the nucleosome. Nucleic Acids Res. 40:6338–6352.

      Collings, C.K., P.J. Waddell, and J.N. Anderson. 2013. Effects of DNA methylation on nucleosome stability. Nucleic Acids Res. 41:2918–2931.

      Davey, C., S. Pennings, and J. Allan. 1997. CpG methylation remodels chromatin structure in vitro. J Mol Biol. 267:276–288.

      Davey, C.S., S. Pennings, C. Reilly, R.R. Meehan, and J. Allan. 2004. A determining influence for CpG dinucleotides on nucleosome positioning in vitro. Nucleic Acids Res. 32:4322–4331.

      Funabiki, H., I.E. Wassing, Q. Jia, J.D. Luo, and T. Carroll. 2023. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. Elife. 12.

      Ghenoiu, C., M.S. Wheelock, and H. Funabiki. 2013. Autoinhibition and Polo-dependent multisite phosphorylation restrict activity of the histone H3 kinase Haspin to mitosis. Mol Cell. 52:734–745.

      Jenness, C., S. Giunta, M.M. Muller, H. Kimura, T.W. Muir, and H. Funabiki. 2018. HELLS and CDCA7 comprise a bipartite nucleosome remodeling complex defective in ICF syndrome. Proc Natl Acad Sci U S A. 115:E876–E885.

      Jimenez-Useche, I., J. Ke, Y. Tian, D. Shim, S.C. Howell, X. Qiu, and C. Yuan. 2013. DNA methylation regulated nucleosome dynamics. Sci Rep. 3:2121.

      Jimenez-Useche, I., D. Shim, J. Yu, and C. Yuan. 2014. Unmethylated and methylated CpG dinucleotides distinctively regulate the physical properties of DNA. Biopolymers. 101:517–524.

      Kelly, A.E., C. Ghenoiu, J.Z. Xue, C. Zierhut, H. Kimura, and H. Funabiki. 2010. Survivin reads phosphorylated histone H3 threonine 3 to activate the mitotic kinase Aurora B. Science. 330:235–239.

      Li, S., Y. Peng, D. Landsman, and A.R. Panchenko. 2022. DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res. 50:1864–1874.

      Minary, P., and M. Levitt. 2014. Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A. 111:6293–6298.

      Ngo, T.T., J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha. 2016. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun. 7:10813.

      Nishiyama, A., L. Yamaguchi, J. Sharif, Y. Johmura, T. Kawamura, K. Nakanishi, S. Shimamura, K. Arita, T. Kodama, F. Ishikawa, H. Koseki, and M. Nakanishi. 2013. Uhrf1-dependent H3K23 ubiquitylation couples maintenance DNA methylation and replication. Nature. 502:249–253.

      Osakabe, A., F. Adachi, Y. Arimura, K. Maehara, Y. Ohkawa, and H. Kurumizaka. 2015. Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA. Open Biol. 5.

      Perez, A., C.L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M.L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez, and M. Orozco. 2012. Impact of methylation on the physical properties of DNA. Biophys J. 102:2140–2148.

      Segal, E., Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I.K. Moore, J.P. Wang, and J. Widom. 2006. A genomic code for nucleosome positioning. Nature. 442:772–778.

      Wassing, I.E., A. Nishiyama, R. Shikimachi, Q. Jia, A. Kikuchi, M. Hiruta, K. Sugimura, X. Hong, Y. Chiba, J. Peng, C. Jenness, M. Nakanishi, L. Zhao, K. Arita, and H. Funabiki. 2024. CDCA7 is an evolutionarily conserved hemimethylated DNA sensor in eukaryotes. Sci Adv. 10:eadp5753.

      Zilberman, D., D. Coleman-Derr, T. Ballinger, and S. Henikoff. 2008. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 456:125–129.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors designed two sets of experiments to explore the molecular mechanisms underlying the mutually exclusive distribution of H2A.Z and DNA methylation previously reported by several groups.

      First, they examined how DNA methylation affects the physical stability of H2A.Z-containing nucleosomes. Although their results point to subtle differences between nucleosomes assembled on methylated versus unmethylated DNA, the authors did not extend their analyses to directly test the stability of these H2A.Z-containing nucleosomes under more challenging conditions. Prior studies have demonstrated that certain nucleosomes, such as those containing H3.3-H2A.Z or H2A.Z-H3K56Q, exhibit specific instability, but such instability is only revealed under challenging conditions, for example, altered salt concentrations or the presence of additional factors like FACT (PMID: 17575053; PMID: 19633671; PMID: 19639024; PMID: 41303375). In light of this literature, the observable structural features noted here for nucleosomes containing H2A.Z and methylated DNA are suggestive of increased instability, yet the authors did not employ comparable approaches to rigorously test whether such instability might explain the absence of H2A.Z from methylated genomic regions.

      As a result, at this stage of analysis, the idea that nucleosomes containing both H2A.Z and methylated DNA are intrinsically unstable, and that this instability accounts for the depletion of H2A.Z from methylated regions, remains unsubstantiated.

      We thank the reviewer for these constructive criticisms. Through our response to these points, we were able to significantly improve our manuscript, including major rewriting of the Abstract and Discussion as well as incorporation of new data.

      We agree that combinations with other histone variants, modifications, and mutations could further affect the impact of DNA methylation on H2A.Z-nucleosome stability that we observed. What we observed with satellite II-derived DNA was that DNA methylation made H2A.Z nucleosomes (with H3.2) more open, although the effect of DNA methylation is relatively small (compared with the general impact of H2A.Z incorporation). We readily acknowledge that such a subtle physical effect is unlikely to be the main driver of the antagonistic distribution of H2A.Z and DNA methylation, although small physical changes are known to influence larger biological functions; we therefore sought to identify additional regulatory factors that could play major roles.

      We also agree that H3.3 is of major interest when discussing H2A.Z. In our Xenopus egg extract experiments using DNA beads, the primary H3 variant deposited is H3.3, as no DNA replication occurs on the beads to allow replication-coupled H3.1/H3.2 deposition. In those experiments, we demonstrated that preferential loading of H2A.Z can be primarily explained by SRCAP. In other words, in the absence of SRCAP, loading/retention of H2A.Z on H3.3 nucleosomes was not noticeably affected by DNA methylation, indicating that the physical effects of DNA methylation on H2A.Z nucleosomes play little, if any, role in the preferential accumulation of H2A.Z on unmethylated DNA, at least in the context of synthetic DNA beads incubated in Xenopus egg extract lacking active transcription. Our sequencing data hint at the interesting possibility that transcription, along with other factors missing in egg extract, may be involved in further pruning H2A.Z from methylated DNA, which conceivably could take advantage of subtle physical alterations. However, we agree that we lack firm supporting evidence for such a mechanism, which led us to omit it from our final model figure; instead, we only report our observations, with discussion of potential biological implications and limitations. Of note, it has been reported that the H2A.Z nucleosome is more accessible than the H2A nucleosome, while inclusion of H3.3 does not further enhance accessibility of the H2A.Z nucleosome (PMID 38920622). We have now noted these points in the Discussion of our revised manuscript.

      We appreciate and agree with this reviewer’s point that nucleosome instability sometimes requires challenging conditions to be fully revealed. However, in our system, H2A.Z itself provided the challenge, as we find in our hands that H2A.Z by itself substantially destabilizes histone-DNA contacts compared with canonical H2A. It is only with this already destabilized nucleosome that we see further enhancement of accessibility/openness in the presence of DNA methylation. This is similar to the findings of [PMID: 23260052], which reported that only an intrinsically destabilized sub-population of canonical H2A nucleosomes on 601 DNA experienced detectable physical changes in the presence of DNA methylation.

      In response to this reviewer's comment, we edited the Abstract and Discussion to clearly note the subtlety of the impact of DNA methylation on H2A.Z nucleosome structure, and that its potential functional significance remains an open question.

      Second, the authors investigated whether SRCAP-C contributes to preferential H2A.Z incorporation into unmethylated DNA. The absence of H2A.Z from methylated regions does not necessarily imply that it cannot be incorporated there; it may instead reflect the chromatin environment associated with DNA methylation, which could disfavor SRCAP-C activity, whereas open chromatin environments strongly promote SRCAP-dependent H2A.Z deposition.

      This reviewer suggested an alternative model in which SRCAP prefers to act on open chromatin, such that the apparent preferential H2A.Z deposition onto unmethylated DNA is due solely to the increased accessibility associated with unmethylated DNA. Under such a model, one would predict that SRCAP-C's preference for unmethylated DNA would be eliminated on nucleosome-free DNA in Xenopus egg extracts. To test this alternative model, we repeated the SRCAP-C binding experiment in egg extracts depleted of the HIRA complex, the H3.3-H4 chaperone responsible for de novo nucleosome assembly on exogenously added DNA in egg extracts. Contrary to this prediction, both SRCAP and ZNHIT1 still display preferential binding to unmethylated DNA substrates in HIRA-depleted extracts, in which nucleosome assembly is suppressed (newly added Suppl Fig 16). These results argue that the discrimination of SRCAP-C against methylated DNA is not due to a potential effect of chromatin compaction by DNA methylation. Furthermore, our new result is in line with the idea that SRCAP employs 1D diffusion on the linker DNA before engaging the H2A nucleosome (PMID 39131301), implying that discrimination of SRCAP-C against methylated linker DNA contributes to this process. This is now illustrated in the new model Figure 6.

      Please note that we also indicate, both in our model and in the text, that there exists an additional methylation-insensitive mechanism that drives H2A.Z deposition on methylated DNA, leading to a substantial amount of colocalized H2A.Z and DNA methylation. Why two H2A.Z deposition pathways with differing methylation sensitivities must exist is an interesting topic for future work; this dual-pathway arrangement had not been described prior to our report.

      This interpretation is consistent with the authors' own comparative mapping of H2A.Z and DNA methylation in sperm pronuclei incubated in egg extract versus a transcriptionally active Xenopus fibroblast line. They observed that about 40% of H2A.Z-associated genomic DNA is methylated in sperm pronuclei, but only 3% in fibroblasts. As they note, the major difference between these systems is the presence of transcription in fibroblasts, a process known to drive H2A.Z eviction/recycling, and which is absent in the egg-extract system. Thus, no specific inhibition of SRCAP-C by methylated DNA needs to be invoked: H2A.Z deposition on both methylated and unmethylated accessible regions, followed by preferential eviction from methylated sites in active nuclei, could fully account for the observed patterns.

      As the reviewer correctly notes here, we proposed that transcription is likely to play an important role in pruning H2A.Z from methylated DNA. Our observations and proposed mechanism do not argue against the possible existence of a DNA methylation-insensitive, transcription-dependent mechanism that promotes dissociation of H2A.Z from methylated DNA, which we believe would likely be correlated with gene body methylation. In fact, we proposed in our Discussion that such a transcription-mediated mechanism may conceivably take advantage of the subtly destabilized DNA wrapping of H2A.Z nucleosomes on methylated DNA to further selectively prune H2A.Z at colocalized regions. However, such a mechanism would be an additional component to what we have already described and does not explain the observed preferential recruitment of SRCAP-C to unmethylated DNA in Xenopus egg extracts in the absence of active transcription.

      In this respect, studies from the Felsenfeld laboratory showing that double-variant nucleosomes are highly unstable under physiological ionic conditions are particularly relevant (PMID: 19633671; PMID: 19639024). They demonstrated that such unstable nucleosomes are only evident under low ionic strength extraction conditions, emphasizing that the apparent absence of H2A.Z may reflect facilitated removal rather than failure of assembly.

      The authors may also have been influenced by the study of Berta et al. (cited in the manuscript), which examined uterine leiomyomas harboring somatic or germline mutations in SRCAP-C subunits. In those tumors, the normal association of H2A.Z with accessible, active chromatin, and its exclusion from methylated regions, was lost. However, this observation does not demonstrate that SRCAP-C actively prevents H2A.Z incorporation into methylated DNA. Instead, it may simply reflect that in the absence of SRCAP-C, a default, less efficient deposition pathway operates regardless of whether the chromatin environment is normally permissive or restrictive for SRCAP-dependent activity.

      Even if one accepts the more straightforward interpretation proposed by the present authors, that SRCAP-C is actively inhibited by methylated DNA, as suggested by their pull-down experiments from Xenopus egg extracts using unmethylated and methylated DNA, the hypothesis lacks mechanistic support.

      Considering this reviewer's criticism, we have expanded our Discussion to indicate the possibility that SRCAP-C may have an alternative mechanism to find open chromatin independent of DNA methylation status. However, our data show that SRCAP-C preferentially binds to unmethylated DNA in a manner independent of transcription or other epigenetic status in Xenopus egg extracts, and that SRCAP-C carries the major mechanism that explains preferential deposition of H2A.Z onto unmethylated DNA. Therefore, we believe that our study offers, for the first time, a mechanistic explanation of how H2A.Z discrimination against methylated DNA is accomplished through SRCAP-dependent H2A.Z deposition.

      The following points summarize the issues discussed above:

      (1) The authors did not sufficiently test the hypothesis that H2A.Z-methylated DNA nucleosomes are inherently unstable and could explain the exclusion of H2A.Z from methylated genomic regions.

      We stand by our conclusion that DNA methylation has an intrinsic capacity to make the H2A.Z nucleosome more open and accessible, even though the effect is subtle. We did not argue that this subtle effect can fully explain the exclusion of H2A.Z from methylated genomic regions. Rather, our Xenopus egg extract experiments suggested that, in the transcriptionally inactive egg extract setting, such a mechanism plays little or no role and that SRCAP-C is instead the major driver. Whether this physical mechanism also contributes to H2A.Z exclusion in cells with active transcription remains a subject for future study.

      (2) The proposed active role of SRCAP-C in preventing H2A.Z assembly on methylated DNA is supported only by limited experimental data and lacks a mechanistic explanation. In particular, this hypothesis does not account for the significant H2A.Z assembly observed on methylated DNA regions in sperm nuclei after incubation in egg extract.

      We respectfully disagree with this summary assessment. Our conclusions are well aligned with the substantial association of H2A.Z with methylated DNA that we observed in sperm pronuclei assembled in Xenopus egg extracts. We demonstrated that:

      (1) In transcriptionally silent Xenopus egg extracts using synthetic DNA beads, DNA binding of SRCAP-C is inhibited by DNA methylation.

      (2) In this setup, H2A.Z is preferentially, if not exclusively, loaded onto unmethylated DNA over methylated DNA.

      (3) Depletion of SRCAP-C almost completely eliminated the preferential association of H2A.Z with unmethylated DNA, while leaving some DNA methylation-insensitive H2A.Z loading.

      (4) These data indicate the presence of a SRCAP-C-dependent, DNA methylation-sensitive mechanism as well as a SRCAP-C-independent, DNA methylation-insensitive mechanism for loading H2A.Z onto chromatin. This conclusion matches well with our genomic analysis showing that H2A.Z is preferentially, but not exclusively, loaded onto hypomethylated genomic segments in sperm pronuclei in Xenopus egg extracts.

      (5) As we clearly discussed, this SRCAP-C-dependent mechanism by itself is insufficient to explain the much clearer exclusion of H2A.Z in somatic cells. We discussed the possibility that transcription contributes to further pruning of H2A.Z from methylated DNA.

      To deliver this overall message with nuances that we noted above, we have heavily revised the Abstract, the model Figure 6, and Discussion. Thanks to the criticisms raised by this reviewer, we believe that our revised manuscript has been significantly improved.

      Reviewer #2 (Recommendations for the authors):

      (1) A major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      We had considered solving the H2A structures; however, we ultimately decided against it for a few reasons. First, crystal structures already exist of canonical H2A nucleosomes assembled on a DNA sequence highly similar to our Sat2R-P, with and without DNA methylation (PDB: 5CPI and 5CPJ). The authors of that study did not see any physical differences between their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions in our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Given the previous report and our own negative data, we expected that canonical H2A structures would yield only limited additional insight, and, considering the cost and effort of this additional cryo-EM analysis, we decided not to pursue it.

      (2) The reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to detect an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important. Because of the many technical challenges in addressing that question, it remains a subject for future research. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008), yet very few studies have directly tested specific mechanistic hypotheses since then. We believe that our study lays the groundwork for future investigations that better elucidate the pathways contributing to this antagonism, and that it will have a meaningful impact on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract that the effect of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle. We will accordingly revise the Abstract, the model in Figure 6, and the Discussion to make this point clearer.

      (3) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value and should be removed.

      We believe that the inclusion and factual reporting of negative data is important for the scientific community, because the biased omission of negative data is one of the major issues in biology research today. This is one of the reasons we chose eLife as the venue for this work. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. However, we believe that these data instead emphasize the importance of DNA sequence context, which the reviewer also rightly notes. It is standard practice in the nucleosome field to use the Widom 601 sequence and its variants. In our experience, an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. We were therefore careful to validate all further assays with multiple DNA sequences, and we believed it important to report these sequence-dependent effects on nucleosome structure.

      (4) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).

      (5) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (6) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      We appreciate recognition of the importance of our finding by this reviewer. We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to depend on nuclear type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles with the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited on hypomethylated DNA in a SRCAP-C-dependent manner (the loss of SRCAP reduces H2A.Z only on unmethylated substrates), but that an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (7) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript. The authors need to discuss this in more detail.

      As depicted in Figure 6 and described in the Discussion, we indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system. In the revised manuscript, we heavily edited the Discussion to better clarify these points.

      (8) The SRCAP depletion is insufficiently validated, i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      In response to this, quantification of the SRCAP depletion is now included as Supplementary Figure 13A and B. Since our anti-ZNHIT1 antibodies reproducibly detected ZNHIT1 on DNA beads isolated from egg extracts, we have conducted additional verification of the SRCAP depletion by probing for SRCAP and ZNHIT1 on DNA beads, confirming that these proteins were depleted on DNA beads upon immunodepletion with anti-SRCAP antibodies (Author response image 1). To further validate this conclusion, we added data showing that the effect of SRCAP depletion on methylation-sensitive H2A.Z deposition was reproduced through use of a different commercially available antibody raised against human SRCAP (newly added Suppl Fig 14).

      Author response image 1.

      Verification of SRCAP depletion using DNA beads. DNA beads were incubated in interphase-cycled Xenopus egg extract that had been depleted with either our custom SRCAP antibody or an IgG negative control. SRCAP and ZNHIT1 association was then assessed via Western Blot.

      (9) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      Thank you very much for raising this interesting point. We were aware that the TIP60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation-sensitive (shown in the revised Supplementary Figure 15). We wished to test the potential contribution of TIP60-C, but unfortunately the antibodies currently available to us did not deplete the complex from egg extract. Because we had no direct experimental evidence for the role TIP60-C plays, we took a conservative approach in our model and left the methylation-insensitive pathway as mediated by a still-unidentified factor. Further investigating TIP60-C's contribution to this pathway is clearly valuable. However, we do not believe that it affects our major conclusion that SRCAP-C is the main mediator of H2A.Z deposition on unmethylated DNA, so it remains a subject for future study. We have now added text noting that TIP60-C is a likely candidate to execute the SRCAP-independent, methylation-insensitive mechanism of H2A.Z loading in Xenopus egg extracts. Our initial model figure did not include TIP60-C. To facilitate future research in the field, the revised model (Figure 6) now indicates TIP60-C as a likely candidate.

      (10) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1). These references should be considered.

      We appreciate that the reviewer points out this important issue. Although we had described that controversy exists regarding how H2A.Z and DNA methylation contribute to nucleosome stability, we had not explained it clearly. We understand that this confusion arose in part from the term “nucleosome stability”, which is broad and encompasses many physical aspects. As noted in a prior response, we now specify our use of the term more precisely within the manuscript, emphasizing nucleosome openness and accessibility, particularly at the entry/exit sites of the nucleosome core particle. As noted by published studies (PMID 38920622), the impact on nucleosome stability may differ between the internal and external segments of nucleosomal DNA. Our assays focus mainly on the DNA wrapping stability of the nucleosome. In our hands, we have consistently seen that, on satellite II-derived sequences, H2A.Z nucleosomes are much more open and accessible at the DNA ends than canonical H2A nucleosomes, regardless of methylation status. However, we understand that many groups have observed the opposite, while others have obtained results similar to ours. This may be caused by the use of different assays (for example, nucleosome assembly during salt dialysis or salt sensitivity versus openness/accessibility of preassembled nucleosomes). In the Discussion of the revised manuscript, we now explain these factors, with the hope that our study will help clarify some of the field’s controversies.

      Reviewer #3 (Recommendations for the authors):

      (1) Since the cryo-EM structure determined by single-particle analysis represents only one major population, it would be important to determine the dyad axis position by complementary biochemical assays, such as MNase-seq or chemical digestion by the Fenton reaction (PMID: 22929776).

      We would like to thank the reviewer for bringing up this important issue. We agree that the high-resolution structure represents only a subpopulation in which we specifically selected for the most stably wrapped nucleosomes in each sample. This issue is why we then supplemented our high-resolution structure with our in-silico classification analysis to survey the overall structure distribution of the full nucleosome particle population. The classification input contains all nucleosome-like particles picked from both unmethylated and methylated sample micrographs mixed together, ensuring that all particles are taken into consideration and that both samples have been analyzed in an identical manner. From our sorting analysis, we find an increased population of open and shifted nucleosome structures present in our methylated DNA sample, indicating destabilization of DNA-histone wrapping with DNA methylation. This is corroborated by the lower local resolution seen on the DNA backbone of our high-resolution H2A.Z on methylated DNA structure, despite it having a higher global resolution compared to its unmethylated counterpart. This suggested to us that DNA positioning along the nucleosome is overall weaker under the presence of DNA methylation.

      The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site. We have previously seen that moving the cut site too close to the linker DNA end abolishes any DNA methylation-dependent differences. We realized that we did not explain how we chose the position of the HinfI site in the context of our solved cryo-EM structure. In the revised Figure 3B, we now illustrate that the HinfI site is located at a segment where H2A/H2A.Z directly contacts the DNA. We also explain that this segment belongs to the region that exhibited clear methylation-induced flexibility in our cryo-EM structures. Thus, our structure helped us design this experiment.

      We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes, as subtle technical errors in the MNase concentration can have significant effects. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.

      (2) I assume that the authors confirmed complete DNA methylation by restriction enzyme digestion. It would be helpful to include this validation in the supplementary figures.

      We would like to thank the reviewer for pointing out that this critical verification was missing from our initial manuscript. DNA methylation was verified by BstBI digestion for Sat2R-P and Sat2R (Suppl Fig 1B and 7D, respectively), by HpaII digestion for 601L (Suppl Fig 6B), and by BstUI digestion for 19x601 DNA (Suppl Fig 11A). All data have been added to the specified figures. Unfortunately, the 16xHSat2 DNA substrate used in our assays does not contain appropriate cut sites for methylation-sensitive restriction enzymes. We therefore always prepared the 16xHSat2 DNA in parallel with the 19x601 substrate under identical conditions, and used digestion of the 19x601 substrate to verify the quality of methylation for each batch. To verify methylation of 16xHSat2 DNA more directly, we used Xenopus laevis ZHX2 and ZHX3, which we recently identified as proteins that selectively associate with methylated DNA in Xenopus egg extracts. The identification and characterization of Xenopus ZHX2/3 will be described elsewhere. Previously published proteomic studies have also identified mammalian ZHXs as proteins that are enriched on methylated DNA (PMID 21029866, 23434322). We incubated DNA beads in Xenopus egg extract and probed for endogenous ZHX2/3 (our antibody recognizes both ZHX2 and ZHX3). This verified that ZHXs selectively bind methylated 16xHSat2 but not unmethylated DNA (Author response image 2). Although this does not necessarily verify that all CpGs in 16xHSat2 were methylated, we observed comparable methylation-induced inhibition of SRCAP binding between 16x601 and 16xHSat2, supporting our conclusion.

      Author response image 2.

      Verification of 16xHSat2 methylation status via ZHX2/3 protein binding. 16xHSat2 DNA beads were incubated in Xenopus egg extract and endogenous ZHX2/3 protein binding assessed via Western Blot with a custom generated antibody that recognizes both ZHX2 and ZHX3.

      (3) Figure 1A: The dyad position is difficult to identify. Please indicate it clearly using a distinct color (not green).

      We now directly indicate each sequence midpoint with a black triangle and also changed the font of DNA sequences to further clarify that the dyad resides at the palindromic center.

    1. Summary Report: The Child Protection Crisis and the After-School Care Scandal

      Executive Summary

      This document analyzes the systemic failures of child protection in France, brought to light by the scandal of sexual violence in Parisian after-school (périscolaire) care.

      The trial of "David G.", an activity leader (animateur) accused of assaulting children aged 3 to 5, has become the catalyst for a broader indictment of a society and of institutions that struggle to listen to what children say.

      The findings are damning. Between administrative opacity, insufficient staff training and prosecution sentencing requests widely judged derisory, the schools of the Republic are failing in their mission to be a safe haven.

      The report stresses the urgency of three shifts: moving to a culture of systematic reporting, investing massively in training, and challenging "adultism", the system of domination that devalues a child's word relative to an adult's.

      --------------------------------------------------------------------------------

      1. The After-School Care Scandal: An Alarming Picture

      The David G. case and the scale of the phenomenon

      The trial of David G., a 36-year-old activity leader, reveals a situation of unprecedented scale.

      The charges concern sexual assaults on very young children (aged 3 to 5) during caregiving or play.

      • Key figures: Around a hundred Parisian nursery schools (écoles maternelles) are affected by complaints or reports.

      The phenomenon also affects other French cities: Marseille, Rouen, Montpellier and Nantes.

      • Victims: Dozens of complaints have been filed.

      In the specific "Alphonse Baudin" case, 21 interviews were conducted.

      • Symptoms identified by parents: Bedwetting, eating disorders, sudden outbursts of anger, stomach aches.

      A chain of institutional failures

      The media investigation and the testimony of the civil parties point to major flaws in how Paris City Hall (Mairie de Paris: CASP, DASCO) manages its staff:

      • Ignored warnings: A mother had reported David G. as early as September 2024.

      Instead of being suspended and subjected to an administrative investigation, the activity leader was simply moved from the youngest nursery class (petite section) to the middle one (moyenne section).

      • Administrative opacity: Parents denounce "empty talking points" from the authorities and a chronic difficulty in obtaining information after the facts came to light.

      • Precarious staffing: The sector relies on hourly temporary workers (vacataires) whose training is considered derisory (sometimes only two and a half days of initial training).

      --------------------------------------------------------------------------------

      2. Judicial Analysis: A Sense of Impunity

      The public trial in May 2026 provoked outrage among the families and their lawyers, mainly because of the prosecution's sentencing requests.

      | Aspect of the trial | Details and criticism |
      | --- | --- |
      | Prosecution's sentencing request | 3 years' imprisonment: 1 year under electronic monitoring and 2 years suspended. |
      | Maximum sentence | Up to 10 years' imprisonment. |
      | Defense position | Total denial of the facts; the accused spoke of "clumsy gestures" and a lack of training. |
      | Civil parties' criticism | Requests judged "derisory" and "lax". They tell families their children are not protected and tell abusers their conduct is tolerated. |

      --------------------------------------------------------------------------------

      3. Systemic Dimensions: Adultism and Domination

      The experts taking part stress that the problem is not only individual but also anthropological and social.

      The concept of adultism

      The document highlights the term "adultism" (or "enfantisme"), defined as a mechanism of adult domination over children:

      • Devaluation of children's word: A child's account is systematically suspected of being invented or the product of manipulation.

      It is treated as having "less value" than an adult's.

      • Reification of the child: The child is often seen as their parents' property, or as a "sub-adult" with no real political or legal autonomy.

      • Demand for obedience: Top-down upbringing forces children to obey adults without question, which leaves them vulnerable to predators who exploit this authority.

      Deconstructing the myth of the "monster"

      Dr. Antoine Pellissolo stresses that abusers are not marginal "monsters" but often well-integrated individuals (teachers, fathers, activity leaders).

      This "normality" fosters collective denial. It also blocks effective prevention policies, notably treatment for people sexually attracted to minors before they act on it.

      --------------------------------------------------------------------------------

      4. Obstacles to Protection and Reporting

      Failures of the national education system (Éducation nationale)

      Although schools are the primary source of reports, the system faces major obstacles:

      • Lack of teacher training: Many teachers are not trained to spot "weak signals" or to take the account of a very young child.

      • Complexity of reporting: Staff sometimes perceive the legal obligation to report crimes (Article 40 of the Code of Criminal Procedure) as going over the heads of their superiors.

      Prosecutors almost never give feedback after a report is filed.

      • Unsuitable facilities: "Mélanie rooms" (Salles Mélanie, designed for interviewing minors) are unevenly distributed across the country. Their protocol is not adapted to the psychology of 3-year-olds, who often freeze up without a parent present.

      --------------------------------------------------------------------------------

      5. Recommendations and Prospects for Reform

      The debate outlines several ways to end the "pre-MeToo" era of childhood:

      • Implementing the CIIVISE recommendations: Putting into practice the 82 recommendations of the first report of the CIIVISE (the French independent commission on incest and sexual violence against children), including a culture of systematic protection.

      • Legislative reform: Support for the bill aimed at strengthening child protection in school and after-school settings, including tighter oversight of private schools under state contract (with reference to the Bétharram affair).

      • Financial and human resources: A massive budget increase for continuing training of all staff in contact with minors (activity leaders, teachers, local-authority staff).

      • Single national unit: A centralized platform to collect and route reports (family, school, justice) so that perpetrators can be tracked effectively.

      • The Spanish model: Follow the example of Spain, which, through massive investment, has drastically reduced the incidence of domestic and sexual violence.

      Key Quotes

      "It's a double message sent to society. On the one hand, we don't protect your children, and on the other, it's not that serious." (Rebecca Royer, lawyer for the civil parties)

      "We live in a society that constantly forces children to do all sorts of things, that doesn't listen to them, and then we're surprised that children are victims of sexual violence." (Lolit Arrivé, schoolteacher and journalist)

      "It takes a village to rape a child." (Romain Lemire, author, quoted by Nathan Devers)

      "A child's word is worth less than an adult's, and that is absolutely scandalous." (Rebecca Royer)

    1. Briefing on Holiday Camps and Accessibility: Issues, History and Financial Support

      This summary document analyzes current issues around access to holiday camps (colonies de vacances) in France. It draws on contributions from the FCPE (Fédération des conseils de parents d'élèves, a national parents' federation) and the JPA (Jeunesse au Plein Air).

      It covers the historical development of the sector, the strict regulatory framework that keeps minors safe, and the funding mechanisms intended to promote social mixing.

      Executive Summary

      Access to holidays is affirmed not as a luxury but as a fundamental right of the child, essential to their development, their autonomy and social mixing.

      Despite this imperative, about 4.7 million children did not go on holiday in 2023.

      The holiday camp sector (ACM, Accueils Collectifs de Mineurs, i.e. group settings for minors) is one of the most heavily regulated in Europe, which ensures a high level of safety and educational quality.

      To offset economic inequality, new schemes such as the Pass'colo now supplement the traditional aid from the CAF (VACAF) and the JPA. Families whose family quotient (QF) is at or below €1,500 can combine several forms of financial support.

      --------------------------------------------------------------------------------

      1. Philosophy and Missions of Group Stays

      Holiday camps are defined as spaces for popular education (éducation populaire) and citizenship.

      They rest on several fundamental pillars:

      • Right to emancipation: Holidays let children discover a different daily life and "breathe" outside the family and school setting.

      • Social mixing and inclusion: Camps bring together children from diverse backgrounds.

      Particular effort goes into welcoming children with disabilities in mainstream stays.

      • Learning autonomy: A camp is often a child's first trip without their parents, serving as a "springboard toward socialization".

      • Sustainable education: Modern stays increasingly include awareness of ecology and local heritage.

      "Holidays are not a luxury, they are a right.

      A right to grow up, a right to breathe, a right to discover something other than one's daily life." (Adeline Neddet, FCPE)

      --------------------------------------------------------------------------------

      2. Historical Development of Holiday Camps

      The sector will celebrate its 150th anniversary in 2026.

      Its history falls into several key phases:

      | Period | Main characteristics |
      | --- | --- |
      | Late 19th to early 20th century | Emergence driven by twin social and health concerns (offering "clean air" to children from polluted cities). |
      | 1930-1950 | Institutionalization and assertion of an educational mission. |
      | 1950-1970 | The "golden age" during the Trente Glorieuses (post-war boom). Peak of 3 million departures in 1960. Extensive funding from works councils (comités d'entreprise) and municipalities. |
      | 2000-2025 | Innovation, inclusion (disability), fight against discrimination, and eco-responsibility. |

      --------------------------------------------------------------------------------

      3. Jeunesse au Plein Air (JPA): Actor and Expert

      Founded in 1938, the JPA is a popular-education association that complements the public school system.

      It brings together 39 member organizations (teachers' unions, parents' associations such as the FCPE, and stay organizers).

      Its areas of expertise:

      • Legal: Securing group settings for minors through in-house legal specialists and expert lawyers.

      • Advocacy: Raising awareness among policymakers and fundraising for financial aid.

      • Local development: A network of 50 active departmental committees run by volunteers.

      --------------------------------------------------------------------------------

      4. Safety, Quality and Regulation

      France has one of the strictest regulatory frameworks in the world for stays involving minors.

      • Mandatory declaration: All stays must be declared to the ministry's Youth and Sports services (SDJES/DRAJES).

      • Multi-code legal framework: Stays are governed by the Social Action and Families Code (Code de l'action sociale et des familles), the Public Health Code, the Education Code and the Labour Code.

      • Mandatory plans: Every organizer must write an educational plan (projet éducatif, setting out its values), and every centre director a pedagogical plan (projet pédagogique, describing how those values are put into practice).

      • Qualified supervision: Strict staff-to-child ratios and qualified staff (BAFA, BAFD).

      All staff (kitchen, service, volunteers) are declared and vetted.

      • Unannounced inspections: The State carries out regular on-site inspections to check hygiene, safety and compliance with standards.

      --------------------------------------------------------------------------------

      5. Financial Aid and Accessibility

      As the cost of stays rises, several support schemes are available, and they can often be combined.

      A. The Pass'colo (new)

      The Pass'colo targets 11-year-olds (born in 2014 and 2015 for the current campaigns), an age considered a turning point between primary school and collège (lower secondary school).

      • Eligibility: Families with a QF ≤ €1,500.

      • Amount: Decreases as the QF rises, ranging from €200 to €350.

      • How it works: The organizer deducts the aid directly from the invoice (third-party payment).

      B. CAF / MSA aid (VACAF)

      Since 2026, a national baseline has harmonized the Children's Holiday Aid (Aide aux Vacances Enfants, AVE).

      • Criterion: QF < €950 (as of 31 January).

      • Coverage: Up to €50 per day for stays of 5 to 15 days.

      • Label: Stays must comply with the secularism charter (charte de la laïcité).

      C. JPA Solidarity Aid

      • Available to families whose QF is at or below €1,500.

      • Accessible through the JPA's departmental committees.

      D. Other sources of funding

      • Social and Economic Committees (CSE, employee works councils): Discounted rates or direct contributions.

      • Local authorities: Some municipalities or departments co-fund stays.

      • Holiday vouchers (Chèques-Vacances, ANCV): Accepted by most JPA member organizers.

      --------------------------------------------------------------------------------

      6. Practical Guide for Families

      To find a quality stay and access financial aid, the following resources are recommended:

      • Jeunes.gouv.fr/pascolo: A comprehensive directory for filtering stays by age, dates, price and theme (horse riding, science, sports, etc.).

      • VACAF platform: For CAF beneficiaries, to identify approved organizers.

      • JPA.asso.fr website: To contact departmental committees and consult the list of member organizers committed to the ethics of popular education (éducation populaire).

      Note on the "Colos Apprenantes": This scheme, introduced after the COVID crisis, was discontinued by the ministry in 2024. The Pass'colo is now the main public-policy lever for helping children go on holiday.

    1. Claude Code is the most common coding agent tool reported, with 86% of users reporting Claude Code use (31% report using Codex, the next most common tool).

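      Claude Code dominates among coding agent tools (86% usage), far ahead of other tools such as Codex (31%). This suggests that Anthropic's product has a significant market advantage in academic research. Note, however, that the data were collected during a specific period (early 2026), and the market landscape may change over time.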

    2. The vast majority of respondents (81%) have tried using AI chatbots in research, particularly for writing code and editing prose. But only 20% have adopted coding agents—tools like Claude Code that autonomously write and execute analysis code—into their work.

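      The 81% who have used AI chatbots far exceeds the 20% who have adopted coding agents. Although most social scientists have tried AI tools, only a minority have actually adopted more advanced autonomous coding tools. The gap reflects a clear stratification in AI tool adoption, possibly related to technology acceptance and to the difficulty of integrating these tools into existing workflows.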

    1. Mistral Vibe extension for VS Code; the coding agent working across your whole project, inside your IDE.

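      The article mentions a VS Code extension but provides no concrete install counts, user penetration or performance data. For developer tools, such data are essential for assessing market penetration. Compared with competitors such as GitHub Copilot, we cannot judge how well Mistral Vibe has been received. Product claims of this kind need follow-up usage statistics to verify actual adoption.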

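    1. Anthropic has bet almost all of its resources on text reasoning and code execution. This strategy is being validated commercially: Claude Code has annualized revenue of $2.5 billion... but from the perspective of paradigm evolution, it is a choice that accumulates technical debt.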

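      Most people see focusing on text reasoning and code execution as a smart business strategy. The author argues instead that Anthropic's choice is accumulating technical debt, because it may leave the company on the back foot in future competition over unified continuous-space architectures. This view challenges the standard narrative of current AI commercial success.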

    1. SkillSpector checks conventional software risks such as vulnerable dependencies, suspicious scripts, dangerous code patterns, credential access, and data exfiltration paths. SkillSpector also checks agent-specific risks, such as hidden instructions, prompt injection, trigger abuse, excessive agency, tool poisoning, and mismatches between a skill's declared purpose, requested access, and bundled behavior.

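      Recommended action: When developing or using AI agent skills, scan them with SkillSpector. It checks conventional risks such as dependencies, script patterns, credential access and data exfiltration paths, as well as agent-specific risks such as hidden instructions, prompt injection and trigger abuse. This helps identify and mitigate potential security issues before a skill is deployed.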

    1. The Strange Melancholy of Slaying Monsters
      • Video games have traditionally used a "player-versus-environment" model of monster slaying for accomplishment, but many titles subvert this ritual to introduce ethical dilemmas and an elegiac tone.
      • In the Western tradition, the concept of the "tragic monster killer" dates back to J.R.R. Tolkien’s analysis of Beowulf, which rejects the notion of martial heroism as its own end and acknowledges the inevitable ruin of the warrior.
      • Games like Shadow of the Colossus highlight this moral complexity by forcing players to slay peaceful, majestic creatures; the game lacks regular enemies and presents the colossi's deaths with agonizing visual effects and mournful music rather than a celebratory fanfare.
      • Titles such as Dark Souls and Bloodborne reinforce a melancholic atmosphere by designing bosses characterized by deep sorrow and tragic descents into ruin, mirroring Friedrich Nietzsche's warning about becoming a monster when fighting them.
      • Mainstream titles like BioShock, Spec Ops: The Line, and God of War incorporate the "false hero" trope, forcing players to confront their complicity in violence or show resignation toward inescapable gaming conventions.
      • The indie game Undertale subverts RPG norms by humanizing its quirky monsters and allowing players to spare them through non-violent negotiation, ultimately revealing that classic progression mechanics like EXP and LV stand for "execution points" and "level of violence."

      Hacker News Discussion

      • Personal Experiences of Disenchantment: Several commenters shared specific gameplay moments where accidentally humanizing a virtual opponent permanently altered their perception of video game violence, including a player who quit Skyrim after realizing they had slaughtered a homeless bandit family for meaningless loot.
      • The Psychology of Fiction and Reality: A discussion developed around how players reconcile virtual actions; while most understand that video game enemies are just code, the introduction of narrative texture and realistic consequences can pierce the layer of abstraction and invoke genuine guilt or melancholy.
      • Military Shooters and Propaganda: Some users recalled playing tactical shooters like Operation Flashpoint, where the sudden realization of the geopolitical absurdity or human cost behind a simulated conflict broke their immersion and temporarily ruined first-person shooters for them.
      • Intentional Game Design: Participants praised developers who deliberately use ludonarrative resonance—aligning gameplay mechanics with the narrative—to challenge the mindless power fantasies common to the medium.
    1. Tech CEOs are apparently suffering from AI psychosis
      • Box founder Aaron Levie coined the phrase "AI psychosis" to describe tech executives who suffer from delusions of AI grandeur due to being too distant from the actual day-to-day operations where value is generated.
      • Because CEOs only interact with high-level prototypes, they mistakenly leap to the conclusion that AI agents can effortlessly handle full workloads without realizing the heavy human labor required to review code, patch bugs, catch hallucinations, and train models.
      • This executive delusion has real-world consequences, driving severe workforce reductions; in the first five months of 2026, over 115,000 tech workers were laid off—nearly matching the total for all of 2025—with AI cited as a primary justification.
      • High-profile actions, such as ClickUp CEO Zeb Evans laying off 22% of his workforce after deploying 3,000 AI agents, are framed as shifting humans into "manager and verifier" roles for AI outputs.
      • Empirical data from UC Berkeley, NBER, and MIT refutes these massive productivity assumptions, demonstrating no robust link between current AI adoption and aggregate productivity gains, with MIT predicting baseline competence on text tasks will not materialize until 2029.
      • A Harvard Business Review study warns that flooding an organization with unverified AI output merely shifts bottlenecks onto executives, risking widespread structural and operational chaos if human oversight fails to scale.

      Hacker News Discussion

      • Distance from Reality: Commenters strongly agreed with the premise that executives live in a bubble, noting that they deal primarily with administrative assistants, sycophants, and curated, "happy path" demos that look like magic, making them blind to edge cases and errors.
      • The "Yes-Man" Nature of AI: Multiple users pointed out that AI agents behave like the ultimate corporate sycophants—they work 24/7, lack internal moral conflict, and never say no—making them highly attractive to authoritative executives who dislike pushback from human workers.
      • Absence of Self-Preservation: A key distinction raised in the comments is that unlike human employees, AI lacks "self-preservation," a sense of reputation, or a fear of consequences, meaning an agent will confidently delete a production database or kill its own server processes without hesitation.
      • Misuse of the Term: Some participants criticized the article's title as clickbait, arguing that "AI psychosis" should describe literal psychological delusions in individuals interacting with AI rather than standard corporate incompetence or unrealistic executive expectations.
      • Projection of Executive Work: A popular theory suggested that CEOs assume AI can replace everyone's job because it can easily replicate their own daily tasks, such as generating slide decks, sending emails, and attending high-level meetings.
    1. Summary Document: Tackling Absenteeism in Secondary School

      Executive Summary

      This document summarizes an innovative approach to managing absenteeism in a secondary school. The approach was developed and presented by Véronique Sir, school principal and doctoral candidate.

      The project marks a fundamental shift from a punitive model, considered burdensome and ineffective, to a relational model that gives teachers both responsibility and tools.

      This new strategy halved the number of students with more than 15 unexcused absences in a single school year.

      Beyond the numbers, the most significant outcome is a marked improvement in teacher-student relationships. Teachers are no longer seen as "detention police" but as caring adults concerned about every young person's attendance.

      Implementation followed five key stages:
      • a rigorous analysis;
      • the creation of a strategic subcommittee;
      • a "small steps" pilot approach;
      • system-wide integration;
      • knowledge sharing.

      The project highlights three things: the importance of time, the need for staff buy-in, and a focus on the school's collective capacity to act rather than on external factors.

      Context and Initial Problem

      When the new administration arrived three years ago, staff felt and voiced two major irritants:

      1. Inconsistent enforcement of the school's code of conduct (code de vie).

      2. Absence management that was seen as excessively burdensome and ineffective.

      This task weighed so heavily that most teachers wanted to be relieved of it.

      An initial analysis of the data helped "neutralize the negative effect" of perceptions. It showed that the problem, while real, involved only two or three students per group, not the majority it sometimes felt like.

      The Absence Management Project: A Relational Approach

      Philosophy and Paradigm Shift

      At the heart of the project is a radical change in philosophy: a move from a repressive system to a humane, proactive approach.

      From a punitive model to a relational model: The old method sanctioned absences (for example, with a detention). It was abandoned in favor of an approach that seeks to understand the causes of absence and to equip the student.

      As Ms. Sir puts it: "We moved from a punitive model to a relational, well-equipped model supported by facilitators within the school."

      The central role of the teacher: The project relies on the direct involvement of teachers, who become the first responders.

      They are responsible for the first seven interventions with the students in their tutoring group, including two calls to parents to raise their awareness.

      This approach runs counter to the reflex of handing the responsibility to the support team. It recognizes that a handful of staff cannot effectively manage the absences of more than 900 students.

      Teachers' involvement is therefore considered "essential".

      Quantitative Results

      The project was built around results-based management. It had a measurable and significant impact on chronic unexcused absenteeism.

      | Period | Context | Students with >15 unexcused absences |
      | --- | --- | --- |
      | June 2024 | End of the pilot phase (3 months, 3 of 35 groups) | About 120 students |
      | June 2025 | End of the first full year (all groups) | About 60 students |
      | October 31, 2025 | Start of the current school year | 6 students |

      These figures represent a drop of about 50% in chronic absenteeism cases within one year.

      June tends to increase the number of absences, which makes the comparison all the more convincing.

      The main highlight is that every student in the school (a population of about 950 young people) is now known and monitored, so no one can "slip through the cracks".

      The Five Stages of Implementation

      The project's reflective process was structured in five distinct phases, carried out in collaboration with university researchers.

      1. Situation analysis: The first stage involved gathering factual data to objectively assess the two major irritants (code of conduct and absence management).

      2. Creating the subcommittee: This stage is considered the "heart of the process". It involved strategically selecting the subcommittee's members.

      The committee includes people open to change. It also includes more critical teachers and staff members still attached to the punitive model.

      The goal was to create a space for reflection that would confirm the end of the status quo and build a shared vision.

      3. A culture of "small steps": To manage change, the project began with a limited pilot: three groups, three volunteer teachers, for three months.

      Only in the second year was the approach extended to the whole school.

      This phase was marked by "constant back-and-forth" and a "right to make mistakes". This allowed the means to be adjusted while keeping sight of the goal (the relational model).

      4. Monitoring and system-wide integration: This stage was interwoven with the others. It consisted of embedding the project in all of the school's bodies:

      Educational project committee: Adding attendance indicators.

      Plan to combat violence and bullying: Fostering a safe school climate.

      General assemblies: Communicating the project's importance through testimonials from the "facilitator agents".

      Grade-level meetings: Setting a standing agenda item every two weeks to follow up on students with attendance problems.

      5. Sharing with the community: The final stage is to disseminate the project to "save time" for other school teams, so that they do not have to reinvent existing solutions.

      Challenges, Success Factors and Recommendations

      Challenges Encountered

      Managing time and expectations: Results are not immediate.

      Understanding the root causes of absenteeism takes time, which can be a challenge in a culture focused on quick results.

      Team buy-in: The second year, when all staff are involved, is crucial and may see more resistance emerge.

      The subcommittee plays a fundamental role in absorbing this resistance without backing down.

      Managing chronic cases: Some students resist interventions, particularly those struggling with significant mental health issues or a serious lack of motivation at school.

      The involvement of professionals (psychoeducators, guidance counselors) is essential here.

      Staff turnover: Staff who arrive without teacher training can find it harder to build relationships, which requires more support from the school's internal "facilitator agents".

      Main Success: Improved Teacher-Student Relationships

      The project's most positive gain, described as "magnificent", is the improved quality of relationships.

      Teachers are no longer seen as enforcers. One teacher shared a telling anecdote:

      "Students told me several times this year: 'Geez, do you even have a life besides watching us at school?'"

      For the team, this remark is a "victory". It means that every student knows at least one adult cares about their attendance.

      Mistakes to Avoid

      1. Moving too fast: Cultural change and understanding the root causes of absence take time.

      2. Shifting responsibility onto parents: The school has little control over the reasons for absence. Rather than focusing on them, the discussion should be refocused on the school's internal "collective capacity to act".

      3. Misusing data: A tracking tool (Power BI) was developed to provide daily data.

      Vigilance is needed so that this data is used to understand and act. It must not be used to "artificially mask" problems or to create competition between schools.

      Strategic Outcomes and Sustainability of the Project

      Beyond lower absenteeism and better relationships, the project has produced several lasting positive effects:

      Personalized approach: The school moved from generalizations ("all Secondary 4 students skip class") to a detailed, personalized analysis of each student's needs.

      Standardized interventions: A written protocol ensures the quality and continuity of interventions, regardless of which staff are in place.

      Team empowerment and resilience: Teachers have developed autonomy ("empowerment") and resilience in facing the problem, and they are aware of their collective capacity to act.

      Readiness for growth: The structure put in place is likened to "the foundations of a house". It leaves the school ready to accommodate a growing student population.

      Sustainability of the model: The project is designed to last. The ultimate goal is to build enough autonomy that the project can survive the departure of the current administration.

      As Ms. Sir concludes: "If I left as principal tomorrow morning, the project would survive thanks to our facilitator agents, who will ensure its continuity."

    1. How does UST's TeachOnline office align (or not) with the contents of this encyclical?

      Our Catholic university's mission is one of goodness, knowledge and discipline. In keeping with it, we have first worked hard to understand three things: how artificial intelligence works, the best approaches to using it, and what it can and cannot do. As instructional designers, we have an ethical and moral obligation to do no harm to our students. Creating or spreading false information would be a moral and intellectual harm, so, to the best of our abilities, we seek to generate only accurate, factual information with artificial intelligence tools. We do this by using existing documents, meeting transcripts, and other human-generated artifacts as part of the context engineering for the prompts we create.

      On the topic of goodness, we also recognize the ethical quandaries of using artificial intelligence tools. These tools depend on a "long chain of mediation, involving vast networks of natural resources, energy infrastructure, and above all people." Some tools are known to exploit the environment and harm neighboring people, especially marginalized communities (xAI/Grok). Others disregard the subsidiarity of local communities (Meta AI). Still others are known for harming adults and children by convincing them of false and violent information (ChatGPT). Our chosen tools are therefore Anthropic's Claude Sonnet and Opus models. That isn't to say that Anthropic is guiltless. However, it continues to stand above all other companies as the most ethical and conscientious artificial intelligence lab, although that is not saying much. Claude has been used as a hacking tool. It was also used by the Pentagon for weapons and operations planning. That use preceded the company's designation as a national security risk, which came, ironically, because Anthropic sought to enact a "red line" (that is, to disarm) against its AI being used in weapon systems and mass surveillance.

      As educators and instructional designers, we welcome the challenge to rethink "the organization of schools, physical spaces, evaluation methods and the role of teachers themselves... promote an authentically integral education that addresses every dimension of the person." To do this, we follow the scientific and ethical practices of our profession in developing courses. These courses have measurable outcomes; they are accurate, engaging, collaborative and applicable to real life; and we hope they lead to reflection and contemplation. Our role as educators also helps "disarm" AI of its worst possible uses. We can further assist by beating "swords into ploughshares": helping our students understand the ethical and moral boundaries of any use of technology and apply it in ways that aid humanity. We respect that our faculty engage in the work of Nehemiah, helping to build the wall of Jerusalem. They do so through one of the most charitable acts of humanity: giving away and imparting their knowledge to future generations.

      WIP!!!!

    1. a decoding operation, which implies the implementation of a cognitive acquirement, a cultural code.

      Why does art actually mean something to people? Not just through a natural or inherent sense, but through embodied knowledge.

    1. Additional Distinguishing Information

      Virginia-Based Firm
      DAA is headquartered in Vienna, Virginia. We understand the Commonwealth’s regulatory and governance framework for public school divisions, including the Virginia Public Procurement Act and relevant sections of the Code of Virginia governing school operations and closures.

      Specialized Focus
      Unlike general-purpose consulting firms that offer redistricting as one service among many, DAA is built specifically around demographic analysis, enrollment forecasting, and school boundary planning. Our methodologies are purpose-built for K-12 applications. Our team understands the particular data quality challenges of school enrollment records. We are attuned to the community dynamics that surround redistricting decisions. And our deliverables are designed for the people who use them: school boards making policy decisions, administrators managing operations, and families trying to understand where their children will go to school.

      Census Bureau Expertise
      Two of our three principals are former Census Bureau staff who worked directly on the methods used to produce population estimates and projections for the United States. This gives us an unusually deep understanding of the federal demographic data that forms the foundation of any enrollment projection. We know how to use Census, ACS, and other federal data products. We also know how they are made: their assumptions, their error structures, and where they should and should not be trusted. This is particularly important in the post-2020 environment, where differential privacy and COVID-19 disruptions have introduced data quality challenges that many analysts do not fully appreciate.

      Collaborative, Not Black-Box
      We do not disappear for three months and return with a set of recommendations. Our phased methodology is designed so that the district is a partner at every stage: reviewing data, validating assumptions, providing local knowledge, and understanding the tradeoffs before scenarios are finalized. This approach produces better analysis, because local expertise catches things that data alone cannot. It also produces better outcomes, because the district owns the process and its results.

      Clients Who Come Back
      Several of our clients have extended their engagements beyond the original scope. Brunswick County Public Schools extended from one year to four. Others have returned for annual updates or new phases of work. We take this as the strongest possible signal that our work delivers value and that districts find us good partners to work with.

      Data Security and FERPA Compliance
      This project will require access to geocoded student enrollment records protected under the Family Educational Rights and Privacy Act (FERPA), 20 U.S.C. § 1232g. We take these obligations seriously and have established procedures that meet the requirements of every district we work with.

      Data Handling
      All student data will be transmitted via Box.com, an enterprise-grade encrypted file sharing platform that has been approved by multiple school districts for this purpose. Once received, student data is stored on encrypted, access-controlled systems accessible only to project team members. No student data is shared with third parties under any circumstances. All analysis is conducted on secured workstations with current security software and operating system patches.

      FERPA Compliance
      Our team has completed FERPA training and maintains current knowledge of student data privacy requirements. We will execute a data sharing agreement with the district that complies with FERPA’s “studies” exception (34 CFR § 99.31(a)(6)). All published reports and maps present data in aggregate form only; no individual student will be identifiable in any deliverable. Student-level records are used solely for geocoding and boundary analysis.

      Data Destruction
      Upon project completion or contract termination, we will securely destroy all student-level data in our possession and provide written certification of destruction to the district. Aggregate analytical outputs will be retained only as needed for the contracted deliverables.

      Staff Training
      All DAA personnel assigned to this project have received training on student data privacy, including FERPA obligations, data minimization principles, and incident response procedures.

      unless these sections are required, remove them.

    2. The planned closure of Stewartsville Elementary — the district’s oldest operating school, built in 1912 and running at 41% of capacity in 2025-26 — adds both urgency and political sensitivity to this redistricting. The closure is projected to save approximately $1 million per year in operating costs. In March 2026, the Board voted 5-2 to close the school, prompting a community lawsuit that cited procedural requirements under VA Code § 22.1-79(8). Rather than wait for the court’s June ruling, the Board held a new public hearing and voted unanimously to delay closure to the end of 2026-27, launching this division-wide redistricting study at the same time.

      Note that we understand the sensitivity of this project and the community's feelings about how it is going.

    1. Briefing: Education in Emotional and Relational Life (EVAR) in Primary School

      This document summarizes the key points of an interview with Aurélie Gourmelon, an educational advisor and author. The interview concerns how education in emotional and relational life (EVAR) is implemented from the petite section (first year of preschool) through CM2 (final year of primary school).

      Executive Summary

      Education in emotional and relational life (EVAR) has been a legal requirement in France since 2001. Its implementation nonetheless long remained vague for lack of a precise curriculum.

      The publication of new curricula in February 2025 marks a turning point, because it gives teachers a structured framework.

      The recommended approach is based on positive pedagogy. It focuses on developing psychosocial skills, respect for oneself and others, and child protection.

      EVAR is not meant to be an isolated, time-consuming subject. Instead, it is woven across everyday classroom life and existing subjects (literature, arts, science).

      The success of this teaching depends on the teacher's stance: their ability to de-dramatize the subject and to address students' questions from a scientific and secular (laïque) angle.

      --------------------------------------------------------------------------------

      Institutional Framework and Terminology

      Curriculum Developments

      Since 2001, the Code de l'éducation has required at least three sex education sessions per year, from primary school through the final year of lycée (terminale).

      However, the lack of precise guidelines long left teachers without support.

      • February 2025 curricula: They fill this gap by providing concrete guidance on content and method.

      • Different terminology: Before 6e (the first year of collège), the term is EVAR (Éducation à la Vie Affective et Relationnelle).

      The term "Sexualité" (EVARS) is officially reintroduced from collège onward.

      The question of "sexuality" in primary school

      Dropping the "S" for sexualité is meant to reassure parents and teachers.

      However, the source points out that sexuality, in the broad sense, begins at birth.

      In primary school, it does not concern the sexual act. Rather, it covers:

      • Self-knowledge.

      • Recognizing sensations and emotions.

      • Defining one's own limits (what one likes or does not accept).

      --------------------------------------------------------------------------------

      Stakes and Objectives of EVAR

      The central objective is to give children reference points, so that they learn to listen to, respect and protect themselves.

      A Positive, Norm-Based Approach

      EVAR should not be approached through the lens of assault or danger, which is a traumatizing approach. It should be approached through what is normal:

      • Defining appropriate behavior (how an adult or a classmate should behave).

      • Teaching the norm allows children to identify, by contrast, what is abnormal or unacceptable.

      • Encouraging children to speak up: These discussion spaces sometimes reveal situations of incest or assault (10% of children are affected by incest).

      Psychosocial Skills and Reciprocity

      EVAR is intrinsically linked to the development of psychosocial skills.

      It rests on reciprocity:

      • Consent: Learning to give one's own consent, and also to read and respect the consent of others.

      • Privacy: Having one's own space respected and respecting that of others.

      • Communication: Learning to say "no" and to accept others' "no" without taking it as a personal rejection.

      --------------------------------------------------------------------------------

      Overcoming Resistance and Practical Implementation

      Analysis of common reservations

      | Reservation | Pedagogical Response |
      | --- | --- |
      | "It's the parents' role" | Schools have a mission to protect children (notably against incest) and to pass on scientific, secular knowledge. |
      | "Lack of time" | EVAR is cross-curricular. It is woven into everyday moments (toilets, changing rooms, recess) and into subjects (literature, art history). |
      | "Fear of going too far" | Teaching must be age-appropriate. The aim is to meet real needs, not to anticipate complex notions. |

      Cross-Curricular Integration

      A generalist teacher can bring EVAR into class at any time:

      • Conflict management: Working on consent during playground disputes.

      • Children's literature: Questioning gender stereotypes (e.g., why is the hero always a boy?) or representations of families.

      • Arts: Observing how men and women have been depicted over the centuries.

      --------------------------------------------------------------------------------

      Teacher Stance and Strategies

      Handling pupils' questions

      When a child asks a complex or surprising question, the teacher should respond with questions rather than an immediate answer:

      • Question the question: Ask the child what they think or what they already know.

      Often the question reflects a worry rather than a need for technical knowledge.

      • Identify the worry: Look for what lies behind the question (e.g., a worry about puberty or periods) in order to give a reassuring answer.

      • Control the information: Answering the whole group makes it possible to correct misconceptions picked up online or from peers, using scientific, secular language.

      Advice on stance

      • Avoid activism: School must remain a place of teaching, not of personal activism.

      • Examine your own biases: Become aware of your own stereotypes and upbringing.

      • Work as a team: If a teacher feels uncomfortable with a topic, they can arrange to swap that lesson with a colleague.

      Mistakes to avoid (How to botch an EVAR session)

      • The "nursery rhyme": Having children memorize phrases such as "my body belongs to me" without reflection.

      Being able to recite is not the same as being able to protect oneself.

      • Moralizing: Issuing injunctions instead of starting from the pupils' own ideas.

      • The "one-shot": Holding three sessions a year and never mentioning the subject again.

      EVAR requires repetition and a daily presence.

      • The negative approach: Presenting the world as full of potential aggressors, which causes anxiety.

      • Singling out: Putting a child on the spot in front of the group after a confidence or a behavioral misstep.

      --------------------------------------------------------------------------------

      Tools and Resources Mentioned

      The document draws on the book Enseigner l’éducation à la vie affective et relationnelle de la petite section au CM2 (teaching EVAR from the first year of nursery school to the final year of primary school), which offers:

      • Inverted curricula: Organized by concept (consent, children's rights, etc.) rather than by age, to show the progression from cycle 1 to cycle 3.

      • Planning mind maps: To establish prerequisites (e.g., understanding children's rights before tackling consent).

      • Ready-to-use activities: Teaching sequences that can be adapted to the materials the teacher chooses.

      • Reading kit: A selection of children's books for addressing EVAR themes (e.g., books by Baptiste Beaulieu).

      Conclusion of the approach: EVAR should be seen as a "joyful" and "easy" subject to teach, essential for helping children become fulfilled, respectful citizens.

    1. I tracked 430 hours of Claude Code usage. 73% was wasted on these 9 patterns.
      • Data Logged via Proxy: Over a 90-day period, a developer logged all Claude Code activity through an HTTP proxy placed in front of the Anthropic API, capturing full payloads, token counts, and costs.
      • The Scale: The dataset covers 430 hours of actual work, 6 million input tokens, and $1,340 in total API spend.
      • The Waste Discovery: Analysis revealed that only 27% of the total tokens processed did actual "productive work." The remaining 73% were consumed by nine hidden, automated inefficiency patterns.
      • The Solution: By identifying and resolving these nine patterns—each requiring roughly a 30-second fix—productive token efficiency can be increased from 27% to approximately 65% without changing the underlying model or losing functionality.
      • The 9 Major Cost Culprits:
        1. CLAUDE.md Bloat (~14% waste): Large, dense, or unoptimized system instruction files add substantial, unnecessary token overhead to every single interaction. Fix: Compress them, aggressively prune rules, or split instructions into context-specific modular files.
        2. Conversation History Re-read (~13% waste): Long chat sessions multiply costs because every new message re-processes the entire accumulated history; with similar-sized messages, message #30 costs roughly 30 times as much as message #1, and the session total grows quadratically. Fix: Use a structured context-refresh cadence to summarize and discard older, unnecessary messages without losing the current task state.
        3. Hook Injection (~11% waste): Automated UserPromptSubmit hooks inject extra code and data into the prompt context even for tasks that don't need them. Fix: Replace indiscriminate global hooks with conditional triggers that attach context only when specific keywords or file types are targeted.
        4. Cache Misses (~10% waste): Prompt caches expire after a short 5-minute lifespan, so even a brief pause in work forces the codebase context to be re-processed at full price. Fix: Set up an automated, low-cost "keep-alive" ping every 4 minutes to keep the prompt cache warm during active development blocks.
        5. Skill Loading (~7% waste): Inactive or irrelevant skills (such as complex front-end UI design skills loaded during a pure backend task) add up to 13,500 tokens of overhead per command. Fix: Explicitly disable global skill auto-loading and isolate advanced capabilities to dedicated subdirectories or specific active profiles.
        6. Extended Thinking (~5% waste): Leaving the reasoning engine globally enabled forces Claude to burn 3,000+ reasoning tokens on simple commands (like basic camelCase naming changes) where deep logic is completely unnecessary. Fix: Disable extended thinking globally by default and explicitly toggle it on only for complex architectural or bug-hunting queries.
        7. Git Diff Inflation (~5% waste): Unfiltered, oversized git diff output is fed into the context window when reviewing changes, instead of targeting specific file modifications. Fix: Configure the workflow to stream only targeted file diffs or summary statistics rather than pulling the full repository diff text into active prompts.
        8. Directory Map Re-indexing (~4% waste): Redundant and frequent re-scanning of the entire project directory tree structure instead of utilizing cached file maps. Fix: Adjust system configuration to enforce a strict file-map caching policy that limits full directory re-indexing to manual project structural changes.
        9. File Read Overlap (~4% waste): Repeatedly reading the exact same source files multiple times within a short interaction window because the system lacks a localized, short-term memory of recent file states. Fix: Implement a session-level temporary cache structure that prevents the agent from re-fetching un-mutated target files in consecutive turns.
      • Debunked Optimization Myths: Lowering costs by switching to a smaller model (like Claude Haiku) for simple tasks only yields a negligible ~3% cost reduction, while aggressively running the /clear command between every minor task proves to be completely counterproductive.
      • Actionable Optimization Script: To automatically detect and patch these specific inefficiencies within a local workspace, the text recommends running a dedicated optimization script shared by the author.
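      The history re-read arithmetic in pattern 2 can be made concrete with a small model. This is a sketch under stated assumptions, not the author's measurement method: every message adds the same number of tokens, the full history is resent on each turn, and system prompts, caching, and output tokens are ignored. The refresh parameters (`refresh_every`, `summary_tokens`) are a hypothetical stand-in for the summarize-and-discard fix.

```python
def session_input_tokens(n_messages, tokens_per_message,
                         refresh_every=None, summary_tokens=0):
    """Total input tokens for a session that resends its whole history each turn.

    Hypothetical model, not measured data: every message adds
    tokens_per_message tokens, and if refresh_every is set, the history is
    replaced by a summary of summary_tokens after that many messages.
    """
    total = 0
    history = 0
    since_refresh = 0
    for _ in range(n_messages):
        history += tokens_per_message   # the new message joins the history
        total += history                # the whole history is processed this turn
        since_refresh += 1
        if refresh_every and since_refresh == refresh_every:
            history = summary_tokens    # discard old turns, keep a summary
            since_refresh = 0
    return total


if __name__ == "__main__":
    # Message 30 processes 30x the tokens of message 1, so the session total
    # is 1000 * (1 + 2 + ... + 30) = 465,000 tokens.
    print(session_input_tokens(30, 1000))
    # Collapsing the history into a 500-token summary every 10 messages.
    print(session_input_tokens(30, 1000, refresh_every=10, summary_tokens=500))
```

      Under these assumptions the per-message cost grows linearly and the cumulative cost quadratically (465,000 tokens versus 175,000 with the refresh cadence), which is why trimming history pays off more than the per-message figures suggest.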
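      Pattern 9's fix, a session-level cache that stops the agent from re-fetching unchanged files, might look like the sketch below. It is hypothetical and not a Claude Code feature: the class `FileReadCache` and its methods are illustrative names. Reading from disk is cheap while resending tokens is not, so the sketch hashes file contents and answers an unchanged file with a short placeholder instead of its full text.

```python
import hashlib
import os
import tempfile


class FileReadCache:
    """Session-level record of which file versions the agent has already seen.

    Hypothetical sketch: an unchanged file is answered with a short
    placeholder instead of its full text, so it is not resent to the model.
    """

    def __init__(self):
        self._seen = {}          # path -> SHA-256 digest of the last version sent
        self.bytes_skipped = 0   # rough proxy for context that was not resent

    def read_for_context(self, path):
        with open(path, "rb") as fh:
            data = fh.read()
        digest = hashlib.sha256(data).hexdigest()
        if self._seen.get(path) == digest:
            self.bytes_skipped += len(data)
            return f"[unchanged since last read: {path}]"
        self._seen[path] = digest
        return data.decode("utf-8")

    def forget(self):
        """Call whenever the conversation context is cleared or compacted."""
        self._seen.clear()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.py")
        with open(path, "wb") as fh:
            fh.write(b"x = 1\n")
        cache = FileReadCache()
        print(cache.read_for_context(path))   # full text on the first read
        print(cache.read_for_context(path))   # placeholder: nothing changed
        with open(path, "wb") as fh:
            fh.write(b"x = 2\n")
        print(cache.read_for_context(path))   # full text again after an edit
```

      The placeholder is only safe while the earlier copy is still in the model's context, which is why the sketch exposes `forget()` for use after a clear or compaction.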
    1. The worst job interview I ever had
      • The author discusses how cultural fit is incredibly important for early-stage, small startups (fewer than 10 people), but notes that some interview processes take this priority too far.
      • Three years prior, the author applied for a founding engineer role at a mental health startup focused on improving therapy access for at-risk youth.
      • Following an uneventful initial screening with the founder and head of engineering, the author was invited to a 90-minute "culture fit" video call with the head of engineering.
      • Instead of technical evaluations, the interview consisted entirely of invasive, non-technical "trauma-baiting" questions regarding the author's biggest life challenges and hardest days.
      • Encouraged by an environment presented as a "safe space," the author shared deeply personal details about family struggles and failed relationships, while the interviewer shared very little in return.
      • The session left the author completely emotionally drained without ever writing or reviewing code.
      • After receiving a generic rejection email 24 hours later, the author felt intense shame, anger, and embarrassment, feeling as though their core personhood—rather than their technical skills—had been judged and rejected.
      • The author concludes that hiring managers and founders must evaluate cultural fit through methods that respect candidates' boundaries instead of forcing them to share deeply personal trauma to secure employment.

      Hacker News Discussion

      • Absurd and Unqualified Interviewers: Users shared experiences with incompetent interviewers, including one in which a mobile developer was tasked with interviewing Machine Learning Engineers; the interviewer read rigid, ChatGPT-style questions off a list, fired off rapid acronym quizzes, and repeated questions, all over an out-of-focus camera feed.
      • Compliance and Ghost Interviews: Commenters noted that highly dysfunctional or overly aggressive interviews sometimes occur when a company has already chosen an internal or preferred candidate but is legally or contractually mandated to interview a public pool of applicants.
      • Over-indexing on Trivia: A sub-discussion emerged around an engineer who was rejected for not instantly recalling a basic Python string method (.find()). Users debated whether failing to recall minor syntax during high-stress situations is a fair reason to disqualify candidates, noting that poor interviewers focus heavily on specific trivia while good interviewers focus on holistic engineering processes.
      • Power Trips and Red Flags: Many agreed that bizarre or overly intense interview behavior functions as an immediate red flag, saving candidates the trouble of working for micromanaging executives, "zombie companies" that simply coast on VC funding, or toxic environments.
    1. Reviewer #1 (Public review):

      Summary:

      During erythroid differentiation, hematopoietic progenitors relinquish multipotency and activate lineage programs. The switch from GATA2 to GATA1 is particularly important in this process, yet GATA2 chromatin‑binding kinetics remain undefined. The authors investigated GATA2-chromatin interaction dynamics during erythroid differentiation in three different cell systems using single‑molecule live‑cell imaging, and they also used CUT&Tag to profile GATA2 chromatin occupancy.

      By single‑molecule imaging, the authors report two interaction modes for GATA2: short‑lived (<1 s) and long‑lived (>5 s) binding. The proportion of long‑lived molecules, the number of binding events, and the duration of long‑lived binding change (or are maintained) during differentiation. Notably, long‑lived chromatin engagement by GATA2 increases during early erythroid differentiation and decreases at the late stage. CUT&Tag identifies regulatory elements selectively occupied by GATA2 during the early transition stage. Together, these results support a model in which transcription factor kinetics form a dynamic chromatin‑engagement profile that characterizes the GATA2‑to‑GATA1 transition.

      Strengths:

      (1) Characterizing transcription‑factor binding kinetics during the GATA2->GATA1 transition addresses a fundamental mechanism in erythroid differentiation.

      (2) Combining single‑molecule live imaging with CUT&Tag provides both dynamic and locus‑specific perspectives.

      (3) Single-molecule analysis across three different cell systems strengthens the potential generalizability of the findings and highlights biological variability.

      Weaknesses:

      I agree that single‑molecule imaging is a powerful approach for investigating GATA2 kinetics, but the single‑molecule data are the most important part of the paper and need improvement. The analyses focus on three measures: (i) duration of long binding, (ii) proportion of short‑ and long‑binding molecules, and (iii) total binding events. However, several methodological and control issues limit confidence in the kinetic interpretations. The authors should address the following major concerns.

      (1) Two binding states: justification and controls

      The authors propose two states of GATA2 binding. Are there only two states? Studies that separate short‑ and long‑lived binding (e.g., Chen et al., 2014, PMID: 25342811) justify the two binding states of transcription factors very carefully. Some long‑binding duration distributions here are very long‑tailed (e.g., Figure 2D middle), suggesting a possible third state. The authors must explain how they determined that two states provide the "best fit" to the data and how they classified "short" versus "long" binding.

      Controls should be included for long‑lived and short‑lived binding (e.g., histone proteins, HaloTag‑NLS, or a binding‑deficient GATA2 mutant) as in other studies. These controls are essential to exclude alternative explanations (see points below).

      (2) Exclude photophysical and focal‑plane artifacts

      The authors should exclude contributions from (i) photobleaching, (ii) blinking, and (iii) Z‑axis motion (disappearance from the focal plane). Although photobleaching correction is mentioned in the Methods, no details are provided. Describe and quantify the photobleaching correction and demonstrate that it was applied across all cell types and conditions.

      Some spots in the supplementary movies appear to blink or to move substantially between frames. Provide analyses or controls that distinguish true dissociation events from photophysical blinking/bleaching or axial motion.

      (3) HILO illumination and nuclear region sampled

      HILO is powerful but sensitive to illumination angle: slight changes sample different nuclear regions (e.g., nuclear interior versus periphery). The nuclear periphery is enriched in heterochromatin and may bias binding statistics. Explain how the authors controlled the HILO angle and confirmed that comparable nuclear regions were imaged across cells and conditions.

      (4) Quantification of event counts and long‑binding durations

      The number of binding events and measured long‑binding durations are strongly affected by imaging conditions (labeling/staining, bleaching, nucleus size, cell cycle state, focal plane, spot detectability, etc.). Imaging clarity appears to differ among cells/conditions in the supplementary movie. Provide more careful analysis describing how these variables were controlled or corrected for, and assess the sensitivity of results to choices in detection and tracking parameters.

      (5) Evidence that spots are single molecules

      The authors state that spots represent single molecules but do not provide supporting evidence. Spot brightness varies considerably in the movies. Brightness differences may reflect axial position. Provide evidence supporting single‑molecule assignment (e.g., single‑step photobleaching traces, brightness distributions compared to a known single‑molecule control, or photon count analysis).

      (6) Description of spot‑analysis pipeline

      The manuscript lacks a sufficient description of the spot‑analysis method. I reviewed the cited STRAP pipeline paper (Haque and Coleman 2025 bioRxiv) and the GitHub code, but the Methods of the current manuscript should include a detailed description of the STRAP pipeline. This would enable readers to evaluate and reproduce the analyses.

      (7) Differences among cell systems

      The three cell systems yield notably different results (e.g., Figure 2C vs 4C and Figure 2D/3D vs 4D). Provide a more detailed explanation for these differences and discuss how biological variability, technical differences, or imaging biases might account for the discrepancies.

    2. Author response:

      We are writing to provide our provisional response to the public reviews. We note that the reviewers’ comments focus primarily on strengthening technical rigor and quantitative interpretation. We have designed the planned revisions to directly address the reviewers’ major concerns and to strengthen the study’s evidentiary basis. We plan to submit a revised manuscript for the final Version of Record.

      For clarity, we summarize below the major new experiments and analyses that address the reviewers’ primary concerns:

      (1) Validation of Tracking Parameters (Reviewers 1 & 3): We will re-analyze our single molecule tracking data with tighter gap-time allowances (0 seconds) to demonstrate the robustness of our interpretations of short- and long-lived kinetics. We will also generate a supplementary movie with binding trajectories superimposed directly on detected molecules to visually confirm tracking robustness.

      (2) Photobleaching & Two-State Controls (Reviewers 1 & 3): We will report per-cell photobleaching lifetimes derived from our global fluorescence decay. To strengthen this analysis, we will include supplementary measurements using a H2B-HaloTag control under matched imaging conditions and perform single-molecule tracking of GATA2 zinc-finger deletion mutants (N-terminal, C-terminal, and double) as a binding-deficient functional control.

      (3) Protein Expression & Labeling Efficiency (Reviewers 1 & 2): To address concerns about transgene expression and competition with endogenous proteins, we will quantify Halo-GATA2 levels in G1E-ER4 and HPC7 cells and SNAP-GATA2 levels in primary cells using standardized titration methods with established Halo-CTCF and SNAP-RPB1 reference systems.

      (4) Integration of SMT and CUT&Tag (Reviewer 3): We have conducted a quantitative fold-change analysis of our existing CUT&Tag dataset to complement our single-molecule kinetics.

      However, as detailed in our specific response below (R3 point 5), we emphasize that directly integrating population-level genomic occupancy measurements with single-cell kinetic measurements is not straightforward. We will therefore frame the relationship between these datasets as a conceptual consistency check rather than a strict quantitative integration. This quantitative analysis supports and refines the Early-restricted peak set, identifying a high confidence strict subset consistent with the broader presence/absence-defined set described in Figure 5 of the manuscript (see Author response images 1–3 and our response to R3 point 7).

      (5) Characterization of the GATA2-SNAP Mouse (Reviewer 3): We have characterized hematopoietic populations in the homozygous knock-in mouse, including lymphoid (CD3<sup>+</sup>/CD4<sup>+</sup>/CD8<sup>+</sup>/B220<sup>+</sup>/CD19<sup>+</sup>), myeloid (CD11b<sup>+</sup>/Gr1<sup>+</sup>), and erythroid (Ter119<sup>+</sup>) compartments. These data, presented in Author response image 4, indicate that normal mature hematopoietic output is preserved across genotypes. Statistical caveats are described in the corresponding figure legend and in our response to R3 point 8.

      Public Reviews:

      Reviewer 1 (Public review):

      (1) Two binding states: justification and controls

      The authors propose two states of GATA2 binding. Are there only two states? Some long-binding duration distributions here are very long-tailed (e.g., Figure 2D middle), suggesting a possible third state. The authors must explain how they determined that two states provide the best fit and how they classified short versus long binding. Controls should be included for long-lived and short-lived binding (e.g., histone proteins, HaloTag-NLS, or a binding-deficient GATA2 mutant).

      Agreed in part; we will attempt the requested binding-deficient control using existing GATA2 deletion constructs, complemented by GRID and H2B-HaloTag controls.

      We will clarify that the two-state framework is an operational model rather than a claim that GATA2 can occupy only two physical states. This approach is widely used in SMT studies of chromatin-associated transcription factors and transcription machinery (Gebhardt et al., 2013; Liu et al., 2014; Hansen et al., 2017; Kenworthy et al., 2022). In particular, Ling et al. (Science, 2026) recently used two-exponential survival-probability fitting across 58 Halo-tagged transcription-associated proteins to distinguish transient and stable chromatin-binding populations, while explicitly noting that the simplified two-state model provides a tractable framework even when the underlying physical behavior may be more heterogeneous.

      We agree that our current two-state model may under-represent the diversity of GATA2 chromatin-binding populations in single cells. However, even within this simplified framework, the existing analysis already indicates increased upper-tail dispersion of kinetic measurements (e.g., residence time and/or percentage of stable events) at the single-cell level in early erythroid cells. To support the goodness-of-fit metrics from our two-state fitting, as Reviewer 3 recommends, we will provide a supplementary table containing confidence intervals for the rate parameters and an F-test metric describing the differences between one- and two-state fits.
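      The one- versus two-state comparison described above can be sketched as follows. This is an illustrative stand-in, not the authors' fitting code: it grid-searches the rates of S(t) = exp(-k t) and S(t) = f exp(-k_fast t) + (1 - f) exp(-k_slow t) on an empirical survival curve, solving f in closed form for each rate pair, and then computes the extra-sum-of-squares F statistic. Real analyses would typically use nonlinear least squares on photobleach-corrected data, and because survival-curve points are not independent, the F value is best read as a heuristic. The dwell-time parameters in the example are synthetic.

```python
import math
import random


def empirical_survival(dwell_times, time_points):
    """Fraction of binding events that last longer than each time point."""
    n = len(dwell_times)
    return [sum(1 for d in dwell_times if d > t) / n for t in time_points]


def log_grid(lo, hi, num):
    """Log-spaced candidate rates (1/s), in ascending order."""
    step = (math.log(hi) - math.log(lo)) / (num - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(num)]


def fit_one_state(ts, surv, rates):
    """Fit S(t) = exp(-k t) by grid search; returns (rss, k)."""
    best = None
    for k in rates:
        rss = sum((s - math.exp(-k * t)) ** 2 for t, s in zip(ts, surv))
        if best is None or rss < best[0]:
            best = (rss, k)
    return best


def fit_two_state(ts, surv, rates):
    """Fit S(t) = f*exp(-k_fast t) + (1 - f)*exp(-k_slow t).

    Rates come from the grid; for each (k_fast, k_slow) pair the fraction f is
    the closed-form least-squares solution, clipped to [0, 1].
    Returns (rss, k_fast, k_slow, f).
    """
    best = None
    for i, k_fast in enumerate(rates):
        for k_slow in rates[:i]:  # rates ascend, so k_slow < k_fast
            a = [math.exp(-k_fast * t) - math.exp(-k_slow * t) for t in ts]
            b = [s - math.exp(-k_slow * t) for t, s in zip(ts, surv)]
            f = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)
            f = min(max(f, 0.0), 1.0)
            rss = sum((y - f * x) ** 2 for x, y in zip(a, b))
            if best is None or rss < best[0]:
                best = (rss, k_fast, k_slow, f)
    return best


def f_statistic(rss_simple, rss_complex, p_simple, p_complex, n_points):
    """Extra-sum-of-squares F statistic for nested least-squares models."""
    numerator = (rss_simple - rss_complex) / (p_complex - p_simple)
    denominator = rss_complex / (n_points - p_complex)
    return numerator / denominator


if __name__ == "__main__":
    random.seed(0)
    # Synthetic dwell times: 70% short-lived (mean 0.5 s), 30% long-lived (mean 10 s).
    dwells = [random.expovariate(2.0) if random.random() < 0.7
              else random.expovariate(0.1) for _ in range(5000)]
    ts = [0.5 * i for i in range(1, 41)]  # 0.5 s frames over a 20 s window
    surv = empirical_survival(dwells, ts)
    rates = log_grid(0.01, 10.0, 80)
    rss1, k1 = fit_one_state(ts, surv, rates)
    rss2, k_fast, k_slow, f = fit_two_state(ts, surv, rates)
    print(f"one state:  k={k1:.3f}/s  rss={rss1:.4g}")
    print(f"two states: k_fast={k_fast:.2f}/s  k_slow={k_slow:.3f}/s  f={f:.2f}  rss={rss2:.4g}")
    print(f"F = {f_statistic(rss1, rss2, 1, 3, len(ts)):.1f}")
```

      The one-state model has one free parameter and the two-state model three (f, k_fast, k_slow), hence `p_simple=1` and `p_complex=3`; adding a third exponential would extend the same nested comparison.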

      To determine whether additional binding states exist, we will perform GRID (Genuine Rate Identification from Distributions), which does not bias the model toward a particular number of states and, in our experience across multiple proteins, yields fits with 3-5 binding populations. However, we have found that in many cases, GRID requires aggregating binding events from multiple cells to achieve consistently robust fits for the populations of relatively rare, long-lived (>~30 sec) binding events. Therefore, GRID will assess whether additional populations exist, but we will lose the ability to analyze changes in the cell populations at the single-cell level.

      We will include the multi-state analysis as a new supplementary figure. We will additionally clarify in the Results and Methods exactly how short- and long-lived binding events are classified (1-second threshold consistent with prior single-molecule frameworks for transcription-factor chromatin interactions; Gebhardt et al., 2013; Liu et al., 2014; Kenworthy et al., 2022) and direct the reviewer to these passages.
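      As an illustration of the threshold classification described above, the sketch below splits one cell's dwell times at 1 s and summarizes the three quantities the reviewer lists (event count, long-lived fraction, long-binding duration). It is hypothetical: counting an event of exactly 1 s as long-lived is an assumed convention, and the mean raw duration is a crude summary, not the fitted, photobleach-corrected residence time reported in the paper.

```python
def binding_summary(dwell_times, threshold_s=1.0):
    """Classify one cell's binding events and summarize them.

    Events lasting at least threshold_s seconds count as long-lived (assumed
    convention). Returns the total number of events, the long-lived fraction,
    and the mean raw duration of long-lived events (None if there are none).
    """
    n_events = len(dwell_times)
    long_events = [d for d in dwell_times if d >= threshold_s]
    fraction_long = len(long_events) / n_events if n_events else 0.0
    mean_long = sum(long_events) / len(long_events) if long_events else None
    return {
        "n_events": n_events,
        "fraction_long": fraction_long,
        "mean_long_duration_s": mean_long,
    }


if __name__ == "__main__":
    print(binding_summary([0.5, 0.8, 1.0, 2.0, 6.0]))
    # {'n_events': 5, 'fraction_long': 0.6, 'mean_long_duration_s': 3.0}
```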

      For the requested controls, we will include H2B-HaloTag imaging under matched conditions as a long-lived reference for both photobleaching correction and as a positive control for stable chromatin association, addressing R1 point 2 and R3 point 1 simultaneously.

      We will also attempt to address the reviewer’s request for a binding-deficient control. We have lentiviral constructs in hand that encode GATA2 with a C-terminal zinc-finger deletion (which removes the primary DNA-binding domain), an N-terminal zinc-finger deletion, and a double deletion. We will perform single-molecule tracking of these mutants in the engineered cell systems and test whether removing GATA2’s specific DNA-binding capacity produces the predicted reduction in long-lived chromatin engagement, providing a functional perturbation control. The interpretation of these experiments will depend on the mutants expressing and localizing appropriately, which we will validate before drawing kinetic conclusions. We note that an analogous binding-deficient mutant cannot be examined in the physiological context of the Gata2<sup>SNAP</sup> knock-in mouse, and we will frame the cell-line mutant analyses accordingly. Together with GRID and the H2B-HaloTag control, these mutants provide complementary lines of validation for the two-state kinetic framework.

      (2) Photophysical and focal-plane artifacts

      The authors should exclude contributions from (i) photobleaching, (ii) blinking, and (iii) Z-axis motion. Describe and quantify the photobleaching correction. Provide analyses or controls that distinguish true dissociation events from photophysical blinking/bleaching or axial motion.

      Agreed.

      We will substantially expand the methodological description and provide three new pieces of supplementary analysis:

      - Photobleaching: A per-cell photobleaching-rate distribution will be plotted for each cell type and differentiation stage, and photobleach-corrected residence-time values will be reported alongside apparent values in the relevant figures. We will also perform H2B-HaloTag imaging under matched illumination, exposure, and dye conditions in each cell line as a long-lived chromatin-bound reference, establishing per-cell-type bleach lifetimes to which the GATA2 measurements can be referenced. This approach follows recent SMT precedent in which H2B decay was used to correct residence-time measurements for photobleaching, chromatin and nuclear motion, microscope drift, defocalization, and dye photophysics (Ling et al., Science 2026). The right-censoring photobleach-correction model used in our analysis will be described in detail in the revised Methods, including parameter values and per-cell handling.

      - Blinking: The STRAP single-particle tracking pipeline already accommodates fluorophore blinking when linking trajectories across successive frames, following the multiple-target tracing framework of Sergé et al. (Nature Methods, 2008). This use of short gap-frame allowances to avoid artificially splitting trajectories due to fluorophore blinking or transient defocalization is consistent with recent live-cell SMT studies of chromatin-associated factors (Ling et al., Science 2026). We will add an explicit statement to the Methods describing how blinking-tolerant linkage parameters are set, and we will reanalyze representative datasets with stricter maximum off-frame settings to ensure this parameter does not drive our conclusions (also addressing R3 point 6).

      - Z-axis motion: Given our 500-ms exposure and the ~500-nm axial detection range of the HiLo configuration, axial loss is expected to be a minor contributor. We will quantify this indirectly by plotting, as a supplementary analysis, the maximum in-plane 2D spatial exploration of each binding trajectory, defined as the long-axis diameter of the 2D trajectory envelope. Although this does not directly measure z-position, it serves as a control for large apparent displacements that could reflect molecules moving out of the HiLo detection volume and demonstrates that observed dissociation events are not dominated by axial drift.
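      The in-plane exploration metric above can be computed as the diameter of each trajectory's 2D point set, that is, the largest distance between any two of its localizations, which equals the long-axis diameter of the set's convex hull. This is a minimal sketch assuming that reading of "trajectory envelope"; the 0.3 µm flagging cutoff is a hypothetical value for illustration, not one given in the response.

```python
import math
from itertools import combinations


def max_inplane_extent(points):
    """Largest pairwise distance among a trajectory's (x, y) localizations.

    Equals the diameter of the trajectory's convex hull. The O(n^2) scan is
    fine for the tens to hundreds of points in a typical binding trajectory.
    """
    if len(points) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(points, 2))


def flag_large_excursions(trajectories, cutoff_um=0.3):
    """Indices of trajectories whose in-plane extent exceeds a (hypothetical) cutoff."""
    return [i for i, traj in enumerate(trajectories)
            if max_inplane_extent(traj) > cutoff_um]


if __name__ == "__main__":
    bound = [(1.00, 1.00), (1.02, 0.99), (0.98, 1.01), (1.01, 1.03)]  # µm
    drifting = [(2.0, 2.0), (2.1, 2.2), (2.3, 2.5)]
    print(flag_large_excursions([bound, drifting]))  # [1]
```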

      Representative photobleaching traces from individual cells (lowest, highest, and median bleach rates) will be included to support the single-molecule interpretation (also addresses R1 point 5).
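      A common way to fold a per-cell bleach rate into residence times is to treat bleaching and dissociation as competing first-order processes, so that the apparent decay rate is their sum and k_off = 1/tau_apparent - k_bleach. The sketch below estimates k_bleach from a background-subtracted global fluorescence decay by log-linear least squares and then applies that subtraction. It is a simplified stand-in, not the right-censoring model the authors say they will describe, and the function names and example numbers are hypothetical. The same subtraction applies if k_bleach is instead taken as the inverse apparent lifetime of H2B-HaloTag, assuming H2B dissociation is negligible over the imaging window.

```python
import math


def bleach_rate_from_decay(frame_times_s, intensities):
    """Estimate k_bleach (1/s) from a background-subtracted global intensity decay.

    Assumes I(t) = I0 * exp(-k_bleach * t), so log(I) is linear in t with
    slope -k_bleach; the slope is found by ordinary least squares.
    """
    logs = [math.log(i) for i in intensities]
    n = len(frame_times_s)
    mean_t = sum(frame_times_s) / n
    mean_log = sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_log) for t, y in zip(frame_times_s, logs))
    var = sum((t - mean_t) ** 2 for t in frame_times_s)
    return -cov / var


def corrected_residence_time(tau_apparent_s, k_bleach):
    """Remove photobleaching from an apparent residence time.

    Competing first-order processes: 1/tau_apparent = k_off + k_bleach.
    """
    k_off = 1.0 / tau_apparent_s - k_bleach
    if k_off <= 0:
        raise ValueError("bleach rate exceeds the apparent decay rate; "
                         "the correction is undefined for this cell")
    return 1.0 / k_off


if __name__ == "__main__":
    times = [0.5 * i for i in range(200)]                   # 500 ms frames
    trace = [1000.0 * math.exp(-0.025 * t) for t in times]  # noiseless example decay
    k_b = bleach_rate_from_decay(times, trace)
    print(round(k_b, 6))                                    # 0.025
    print(round(corrected_residence_time(8.0, k_b), 3))     # 10.0
```

      The guard against k_off <= 0 matters in practice: when a cell bleaches faster than molecules dissociate, the subtraction becomes unstable and that cell is better excluded or reported separately.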

      (3) HILO illumination and nuclear region sampled

      HiLo is sensitive to illumination angle: slight changes sample different nuclear regions. Explain how the HiLo angle was controlled and confirmed comparable across cells and conditions.

      Agreed.

      We will add a Methods subsection describing our HiLo illumination procedure. In brief, we started at a TIRF-supercritical angle and reduced it toward epifluorescence just enough to achieve high imaging depth while minimizing out-of-focus background signal. Within each biological system (cell line or primary cells), the TIRF angle was held constant across Basal, Early, and Late conditions to ensure direct comparability of kinetic measurements across stages.

      (4) Quantification of event counts and long-binding durations

      The number of binding events and the duration of long-binding events are influenced by imaging conditions. Provide a more detailed analysis of how these variables were controlled and assess the sensitivity of the results to detection and tracking parameters.

      Agreed.

      We will (i) normalize per-cell binding-event counts to nuclear cross-sectional area (extracted from the segmented nuclear masks already in the STRAP pipeline) to control for differences in nuclear size; (ii) report the tracking-parameter sensitivity sweep described above; and (iii) confirm in the revised Methods that all imaging conditions (laser power, exposure, dye concentration, sample preparation) were held constant across stages and cell types, consistent with the existing manuscript text. Per the Reviewing Editor’s guidance, the planned labeling-efficiency and absolute-molecule-quantification experiments will further constrain the interpretation of binding-event counts across conditions.
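      The tracking-parameter sensitivity sweep amounts to re-linking the same detections under different gap-frame allowances and checking whether the dwell-time picture changes. The sketch below is a deliberately simple greedy nearest-neighbour linker, not STRAP or the multiple-target tracing algorithm it builds on (those use statistical criteria); `max_disp` and `max_gap` are hypothetical parameter names, and the toy detections mimic one molecule that blinks off for a single frame.

```python
import math


def link_detections(frames, max_disp, max_gap):
    """Greedy nearest-neighbour linking of per-frame detections into tracks.

    frames: one list of (x, y) detections per frame.
    max_disp: largest jump allowed between linked detections (same units as x, y).
    max_gap: consecutive missing frames a track may bridge (0 = no gaps allowed).
    Returns tracks as lists of (frame_index, (x, y)).
    """
    active, finished = [], []
    for f, detections in enumerate(frames):
        unclaimed = list(detections)
        still_active = []
        for track in active:
            last_f, last_xy = track[-1]
            if f - last_f - 1 > max_gap:
                finished.append(track)   # gap too long: this track cannot resume
                continue
            best, best_dist = None, max_disp
            for d in unclaimed:
                dist = math.dist(last_xy, d)
                if dist <= best_dist:
                    best, best_dist = d, dist
            if best is not None:
                unclaimed.remove(best)
                track.append((f, best))
            still_active.append(track)
        # Detections that no existing track claimed start new tracks.
        still_active.extend([(f, d)] for d in unclaimed)
        active = still_active
    return finished + active


def track_durations(tracks, frame_interval_s):
    """Duration of each track from its first to its last frame, inclusive."""
    return [(t[-1][0] - t[0][0] + 1) * frame_interval_s for t in tracks]


if __name__ == "__main__":
    # One molecule that blinks off in frame 2 (hypothetical coordinates in µm).
    frames = [[(0.0, 0.0)], [(0.1, 0.0)], [], [(0.2, 0.0)], [(0.25, 0.0)]]
    for gap in (0, 1):
        tracks = link_detections(frames, max_disp=0.5, max_gap=gap)
        print(gap, track_durations(tracks, frame_interval_s=0.5))
    # gap 0 splits the event into two 1.0 s tracks; gap 1 recovers one 2.5 s track.
```

      The toy case shows why this parameter deserves a sweep: a zero-gap setting converts one long-lived event into two shorter ones, shifting both the event count and the long-binding duration.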

      (5) Evidence that spots are single molecules

      Provide evidence that spots represent single molecules.

      Agreed.

      We will include a small number of per-event intensity traces from our STRAP tracking output, selected to illustrate the single-step photobleaching behavior characteristic of single-molecule emission (intensity remains approximately constant during the binding event and then drops to background in a single step). The nuclear-fluorescence measurements from the planned labeling-titration experiment will also allow us to confirm that bound-spot densities are consistent with single-molecule occupancy at the labeled fraction used for tracking.

      (6) Description of the spot-analysis pipeline

      The Methods should include a detailed STRAP pipeline description.

      Partially agreed; the existing STRAP reference is appropriate, but the Methods will be expanded.

      STRAP (Haque & Coleman, 2025) is a consolidated, automated implementation of two well-established, previously published frameworks: SLIMfast / multiple-target tracing (Sergé et al., 2008) and evalSPT (Normanno et al., 2015), both of which are cited in the original manuscript. We will expand the Methods to describe the parameter set used in our analysis (detection thresholds, linking radii, gap-frame allowance, photobleaching correction model) so that readers can assess the analysis without referring exclusively to the STRAP manuscript and code repository, while preserving the cited STRAP reference for the full algorithmic description. We respectfully suggest that a complete pipeline description duplicating Haque & Coleman (2025) would not be appropriate in a primary research article.

      (7) Differences among cell systems

      The three cell systems yield notably different results. Provide a more detailed explanation for these differences.

      Agreed.

      We will also explicitly describe the caveats of the engineered systems versus the native GATA2-SNAP primary-cell system, in which endogenous GATA2-SNAP remains under physiological regulation. Specifically, we will discuss how variables such as the GATA1-null background, ectopic forced nuclear import of GATA1-ERT, and ectopic GATA2-Halo in G1E-ER4 cells, as well as ectopic GATA2-Halo, endogenous GATA1, and cytokine signaling in HPC7 cells, likely contribute to the observed differences in signatures.

      Reviewer 2 (Public review):

      (1) Expression levels of the GATA2-HaloTag transgene

      Determine the expression levels of the GATA2-HaloTag transgene over the course of differentiation under the conditions used for single-molecule imaging.

      Agreed.

      This is the central concern flagged by the Reviewing Editor. For each cell line (G1E-ER4 and HPC7), we will (i) measure total nuclear GATA2-Halo fluorescence per cell under matched acquisition conditions and (ii) convert this fluorescence intensity to absolute molecules per cell using a Halo-CTCF/U2OS reference standard (Cattoglio et al., 2019; absolute CTCF abundance quantification applied previously by our group). This will provide per-cell GATA2-Halo molecule counts at each differentiation stage (Basal, Early, Late). For the primary GATA2-SNAP cells, we will perform the analogous comparison against a SNAP-RPB1/U2OS standard.

      (2) Fraction of molecules labeled

      Carry out a titration of the HaloTag ligand and compare the amount of labeled protein under single-molecule imaging conditions to that of saturating labeling.

      Agreed.

      We will perform HaloTag-ligand and SNAP-tag-ligand titrations in each cell type, comparing nuclear fluorescence under the limiting-label conditions used for single-molecule tracking with that under saturating labeling. This will yield a per-cell-type labeled fraction and allow us to confirm that comparisons of binding-event counts across conditions are not confounded by differences in labeling efficiency. The labeled-fraction values will be reported in a new supplementary figure and incorporated into our quantification of binding-event rates.
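
The labeled fraction is the ratio of nuclear signal under the limiting-label condition to that under saturating labeling. A minimal sketch, assuming background-subtracted per-cell intensities from matched acquisitions; the helper names are hypothetical, and dividing an observed event count by the labeled fraction is one simple normalization, not necessarily the one that will be used in the revision:

```python
from statistics import mean


def labeled_fraction(limiting_intensities, saturating_intensities):
    """Fraction of molecules carrying dye under limiting-label conditions.

    Hypothetical sketch: ratio of mean nuclear intensity with limiting ligand
    to mean nuclear intensity with saturating ligand, for one cell type.
    """
    return mean(limiting_intensities) / mean(saturating_intensities)


def events_if_fully_labeled(observed_events, fraction):
    """Rescale an observed binding-event count by the labeled fraction."""
    return observed_events / fraction


f = labeled_fraction([2.0, 2.0], [200.0, 200.0])
print(f, events_if_fully_labeled(30, f))
```

If the labeled fraction differs between cell types or stages, raw event counts are not directly comparable; rescaling them, or reporting the fraction alongside them, makes that difference explicit.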

      (3) Robust single-particle tracking

      Show images of particle trajectories or movies superimposing trajectories on imaging data.

      Agreed.

      We will generate visualizations of selected long-lived binding events with single-particle trajectories overlaid on the imaging data — using a multi-frame color overlay (e.g., five sequential frames in distinct colors superimposed) so that linkage of the spot across frames is visually unambiguous — and include them as a new supplementary figure or movie. Examples will be drawn from each cell system to demonstrate consistent tracking quality.

      Reviewer 3 (Public review):

      (1) Photobleaching correction; per-cell bleach lifetimes

      Report the per-stage (or per-cell) photobleaching lifetimes and the photobleach-corrected residence time values alongside apparent values, ideally with an H2B-Halo control.

      Agreed.

      Addressed by the photobleach-rate distribution and H2B-HaloTag control analyses described under R1 point 2. The supplementary figure will explicitly compare per-cell bleach lifetimes across stages, report photobleach-corrected residence-time values alongside apparent values and include H2B-HaloTag controls under matched conditions in each cell line.
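
A common way to obtain photobleach-corrected values (for example, with an H2B-Halo reference for the bleaching rate) treats dissociation and photobleaching as independent first-order processes, so their rates add: 1/τ_apparent = 1/τ_true + 1/τ_bleach. A minimal per-cell sketch under that assumption; the function name and the lifetimes in the example are hypothetical:

```python
def bleach_corrected_residence_time(tau_apparent_s, tau_bleach_s):
    """Photobleach-corrected residence time in seconds.

    Assumes independent first-order dissociation and bleaching, so the observed
    decay rate is k_obs = k_off + k_bleach. Hypothetical helper.
    """
    k_obs = 1.0 / tau_apparent_s
    k_bleach = 1.0 / tau_bleach_s
    k_off = k_obs - k_bleach
    if k_off <= 0:
        # Apparent dwell time is not shorter than the bleach lifetime,
        # so the data cannot constrain k_off under this model.
        raise ValueError("apparent residence time must be shorter than the bleach lifetime")
    return 1.0 / k_off


# Per-cell correction with hypothetical (apparent, bleach) lifetimes in seconds.
cells = [(10.0, 50.0), (8.0, 40.0)]
print([round(bleach_corrected_residence_time(a, b), 2) for a, b in cells])
```

The correction grows sharply as τ_apparent approaches τ_bleach, which is one reason per-cell bleach lifetimes matter: cell-to-cell variation in bleaching can masquerade as variation in residence time.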

      (2) Mechanistic differences across systems

      The three systems show qualitatively different signatures: residence time change in G1E-ER4, bound fraction expansion in HPC7 and primary cells. Reporting an on-rate proxy alongside k_off would help.

      Agreed.

      Addressed by the cross-system kinetic framing described under R1 point 7 and by the GRID state-spectrum analysis described under R1 point 1. We will explicitly frame the three systems in terms of underlying kinetic mechanism in both Results and Discussion, following the conceptual distinction emphasized by Ling et al. (Science 2026), in which residence time reports binding stability once engaged, whereas changes in bound fraction or event frequency can indicate altered association/recruitment efficiency. In this framework, the G1E-ER4 residence-time signature is consistent with reduced dissociation (a longer-lived bound state), while the long-lived-fraction expansion in HPC7 and primary cells is consistent with increased target-search efficiency or an expanded pool of specific-binding-competent molecules. Alongside the GRID-derived state-spectrum analysis, we will report an apparent engagement-rate proxy calculated as binding events per unit imaging time normalized to detectable molecule number; this proxy is an approximation, not a direct k_on measurement, as accurate determination of k_on from single-molecule tracking requires concentration-dependent on-rate experiments that are outside the scope of the present study. We thank the reviewer for this suggestion, which we agree sharpens rather than alters the central message.
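
As described, the engagement-rate proxy divides an event rate by the number of detectable molecules. A minimal sketch with hypothetical inputs; how the detectable molecule number is estimated (for example, from per-cell counts and the labeled fraction) is an assumption left to the caller, and the output is a relative index, not k_on:

```python
def engagement_rate_proxy(n_binding_events, imaging_time_s, n_detectable_molecules):
    """Binding events per detectable molecule per second (hypothetical helper).

    A relative index for comparing stages or cell systems; it is not a
    concentration-calibrated association rate constant.
    """
    if imaging_time_s <= 0 or n_detectable_molecules <= 0:
        raise ValueError("imaging time and molecule number must be positive")
    return n_binding_events / imaging_time_s / n_detectable_molecules


# Hypothetical comparison of two stages imaged for the same duration.
basal = engagement_rate_proxy(120, 60.0, 40)
early = engagement_rate_proxy(180, 60.0, 40)
print(early / basal)
```

Because the detectable molecule number sits in the denominator, a stage with more GATA2 molecules but unchanged per-molecule engagement yields the same proxy value. This is what separates the index from a raw event count.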

      (3) Per-cell GATA2 concentration and the uncoupling claim

      Quantify total nuclear GATA2-Halo signal per cell across stages; for primary cells, a western blot or quantitative immunofluorescence on flow-sorted populations would make the uncoupling argument more defensible.

      Agreed.

      For the cell lines, the per-cell nuclear GATA2-Halo quantification described in our response to R2 point 1 addresses this point.

      For primary cells, where the biological claim is strongest, we will exploit the endogenous Gata2-SNAP knock-in itself as a quantitative reporter of total GATA2 protein. Specifically, we will label flow-sorted CD71/Ter119 populations from Gata2-SNAP mouse bone marrow with SNAP-Cell 647-SiR at saturating concentration, in an acquisition run in parallel with the limiting-label single-molecule tracking experiment. Total nuclear SNAP-GATA2 fluorescence under saturating labeling provides a measure of endogenous GATA2 abundance per cell at each erythroid stage, in the same chemistry used for our single-molecule measurements, and will be benchmarked against a SNAP-RPB1/U2OS reference standard for absolute molecule counting. This approach (i) measures the protein of interest in the labeling chemistry already established in this study; (ii) avoids reliance on quantitative immunofluorescence, which we have not been able to validate under our flow-sorted-cell conditions; and (iii) extends the same analytical framework (saturating versus limiting labeling, with U2OS reference standards) across cell lines and primary cells. Quantitative western blotting on flow-sorted populations remains an alternative we will consider if specifically requested by the reviewers.

      (4) Single-cell distribution analysis

      Distribution-based statistics (K-S test, mixture model) rather than (or alongside) mean-based ANOVA, particularly for the Early populations, which look potentially bimodal.

      Agreed.

      We will perform Kolmogorov–Smirnov and Gaussian mixture model analyses of the single-cell long-lived fraction and residence-time distributions across stages, reporting these alongside the existing Welch ANOVA results in a new supplementary figure. This analysis is consistent with the conceptual framework cited in the manuscript (Wheat et al., 2020; Palii et al., 2019) for probabilistic hematopoietic transitions and may reveal subpopulation structure underlying the Early-stage signal. The GRID analysis further complements this by formally testing whether multi-state mixture models are statistically preferred at each stage. However, GRID analysis requires aggregating binding events across cells, which limits our ability to monitor changes in population dispersion at the single-cell level.
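
For reference, the two-sample K-S statistic is the largest vertical gap between the empirical CDFs of two per-cell distributions (for example, long-lived fractions at Basal versus Early). Below is a dependency-free sketch of the statistic only. In practice, scipy.stats.ks_2samp returns the same statistic together with a p-value, and a mixture model could be fit with, for example, scikit-learn's GaussianMixture and compared across component numbers by BIC. The function name and example values are hypothetical:

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that only change at observed values,
    so it suffices to evaluate them at every pooled observation.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical per-cell long-lived fractions at two stages.
print(ks_statistic([0.10, 0.12, 0.15, 0.30], [0.11, 0.28, 0.31, 0.33]))
```

Because D responds to differences in shape as well as location, it can detect a bimodal Early population whose mean barely moves, which a mean-based ANOVA would miss.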

      (5) Quantitative integration of CUT&Tag with SMT

      Attempt a back-of-the-envelope calculation of whether the residence-time or fraction changes are quantitatively consistent with the acquisition of the 1,167 Early-restricted sites.

      Partially agreed; will attempt an order-of-magnitude framing.

      We thank the reviewer for this thoughtful suggestion. We agree that more explicit framing of the quantitative relationship between the two datasets will strengthen the integration. We will add a paragraph to the Discussion presenting an order-of-magnitude calculation linking the observed residence-time and long-lived-fraction changes to the steady-state occupancy increase predicted at competent regulatory sites, with explicit caveats regarding (i) the inherently semi-quantitative nature of CUT&Tag signal and (ii) the assumptions required to translate population-averaged occupancy into the genome-wide site count observed. For the G1E-ER4 cells, we observe relatively minor shifts in population-mean behavior as single-cell dispersion increases. Therefore, it may be difficult to directly link population-based measurements (e.g., CUT&Tag) with single-cell kinetic measurements (SPT). This distinction between occupancy and dynamics is consistent with recent systematic SMT analysis of the eukaryotic transcription machinery, in which factors appearing persistently associated in ensemble genomic assays were shown to exchange on second-scale timescales in living cells (Ling et al., Science 2026), emphasizing that population genomic occupancy and single-molecule residence time are complementary but not directly interchangeable measurements. Closing this gap rigorously is a major hurdle for the field and will require substantial technology development for quantitative single-cell CUT&Tag occupancy measurements. We will therefore frame our analysis as a consistency check rather than a strict quantitative integration. The reviewer notes that this analysis “does not change the central message; it sharpens it,” and we agree.
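
One way to set up the order-of-magnitude check is to ask how many additional long-lived-bound molecules the observed fraction change implies, and how many that is per Early-restricted site. A minimal sketch, assuming (hypothetically) that per-cell molecule numbers are known and that the entire increase in the long-lived pool is distributed over the 1,167 presence/absence-defined sites; every input other than the site count is a placeholder:

```python
def extra_bound_per_site(n_molecules, f_long_lived_before, f_long_lived_after, n_sites=1167):
    """Additional long-lived-bound molecules per newly acquired site.

    Hypothetical consistency check: attributes the whole increase in the
    long-lived bound pool to n_sites sites and ignores binding elsewhere.
    """
    extra_bound = n_molecules * (f_long_lived_after - f_long_lived_before)
    return extra_bound / n_sites


# Placeholder inputs: 10,000 molecules per nucleus, long-lived fraction 0.10 -> 0.15.
print(round(extra_bound_per_site(10_000, 0.10, 0.15), 2))
```

A value well above 1 would mean the fraction change cannot be accommodated by these sites at single occupancy under these assumptions, whereas a value well below 1 is compatible with partial, dynamic occupancy. In either case the result is a consistency check, not a fit.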

      (6) Short-lived kinetic interpretation and tracking parameters

      The 1.5 s gap allowance is long relative to the short-lived residence times in primary cells. A sensitivity analysis with tighter gap parameters would help. Also clarify how slowing of search reconciles with increased binding events at Early.

      Agreed.

      Addressed by the tracking-parameter sensitivity analysis described under R1 point 2. We apologize for the lack of clarity in our original description of the gap allowance. Our current maximum off-frame parameter is set to 2 frames, corresponding to a 0.5-s gap allowance. We will rerun the tracking analysis on representative datasets using a maximum off-frame parameter of 1, corresponding to no missed frames, and will report the resulting residence-time distributions alongside the original analysis to demonstrate robustness. We will also clarify in the Results and Discussion how changes in short-lived binding kinetics are reconciled with the increase in detectable binding events at the Early stage, drawing on the apparent engagement-rate proxy interpreted alongside the GRID-derived state-spectrum analysis.
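
The gap-allowance sensitivity analysis can be illustrated by re-segmenting the same detections under two linking rules. A minimal sketch, assuming that the maximum off-frame parameter is the largest allowed step in frame index between consecutive detections (so 1 means no missed frames). It also assumes a 0.5-s frame interval, which is what the stated correspondence between a parameter of 2 and a 0.5-s gap implies under that convention. This is not the STRAP implementation, and the function names are hypothetical:

```python
def split_into_events(frames, max_off_frame):
    """Group frame indices with a detection into binding events.

    A detection joins the current event if its frame index exceeds the previous
    detection's by at most max_off_frame (1 = consecutive frames only).
    """
    events = []
    for frame in sorted(frames):
        if events and frame - events[-1][-1] <= max_off_frame:
            events[-1].append(frame)
        else:
            events.append([frame])
    return events


def event_durations_s(frames, max_off_frame, frame_interval_s=0.5):
    """Duration of each event, counting first through last frame inclusive."""
    return [(ev[-1] - ev[0] + 1) * frame_interval_s
            for ev in split_into_events(frames, max_off_frame)]


detections = [0, 1, 2, 4, 5, 9]  # hypothetical frames in which one spot was found
print(event_durations_s(detections, max_off_frame=2))
print(event_durations_s(detections, max_off_frame=1))
```

Tightening the rule splits bridged events into shorter ones. If the residence-time distributions barely change between the two settings, gap bridging is unlikely to be what shapes the short-lived component.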

      (7) CUT&Tag peak definition and quantitative analysis

      Report (a) signal intensity distribution at the 1,167 sites across stages (scatter or density plot beyond the heatmap) or (b) differential binding analysis (e.g., DESeq2). State replicate count and overlap of Early-restricted sets across replicates.

      Agreed; normalized fold-change analysis completed, with replicate-aware differential binding analysis planned if additional replicates are generated.

      We have performed a normalized count-based fold-change analysis of the union peak set from the existing GATA2 CUT&Tag dataset (14,468 peaks) using the goodpeaks framework previously used by our group, yielding per-peak log2 fold-change values and discrete dynamic-status calls (Gained / Lost / Unchanged at |log2FC| ≥ 2) for each of the two transitions (Basal → Early at 0 vs 2 h, and Early → Late at 2 vs 24 h). This provides a conservative quantitative complement to the presence/absence peak-calling analysis presented in Figure 5; if additional replicate data are generated, we will perform replicate-aware differential binding analysis (DiffBind/DESeq2; Love et al., 2014; Stark & Brown, 2011) and report replicate overlap. This analysis addresses option (b) of the reviewer’s request and also enables the visualization requested in option (a) as a cross-stage scatter (Author response image 1). We present the quantitative analysis as a supplement to the presence/absence-defined Early-restricted set in Figure 5 of the manuscript, providing two orthogonal lines of evidence for the same biology. We note that the CUT&Tag experiments were initially performed as a validation step to confirm that the tagged GATA2-Halo constructs recapitulate endogenous chromatin-binding behavior, including appropriate genomic localization and expected GATA switch dynamics. This validation supports the conclusion that the observed single-molecule kinetics reflect physiologically relevant GATA2 engagement. Having established this, we subsequently extended the dataset to perform the quantitative analyses presented here.
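
The per-peak calls can be sketched as a log2 fold change on normalized counts followed by a |log2FC| ≥ 2 threshold for each transition. This is an illustrative reconstruction, not the goodpeaks framework. The pseudocount, the assumption that counts are already library-size normalized, and the mapping of status pairs onto the named categories are all hypothetical choices:

```python
import math


def log2_fold_change(before, after, pseudocount=1.0):
    """log2((after + c) / (before + c)); positive = higher signal later."""
    return math.log2((after + pseudocount) / (before + pseudocount))


def status(lfc, threshold=2.0):
    if lfc >= threshold:
        return "Gained"
    if lfc <= -threshold:
        return "Lost"
    return "Unchanged"


def classify_peak(basal, early, late, threshold=2.0):
    """Assign a dynamic-status category from normalized counts at 0, 2 and 24 h."""
    first = status(log2_fold_change(basal, early), threshold)   # Basal -> Early
    second = status(log2_fold_change(early, late), threshold)   # Early -> Late
    if first == "Gained":
        return "Early-restricted" if second == "Lost" else "Gained, retained"
    if first == "Lost":
        return "Lost at Early"
    if second == "Gained":
        return "Late-only"
    return "other"


print(classify_peak(10, 100, 5))
```

Because a peak must clear the threshold in both transitions to be called Early-restricted, this criterion is stricter than presence/absence peak calling, consistent with the gap between the 1,167-peak and 178-peak sets.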

      Quantitative findings.

      - 384 peaks were Gained (|log2FC| ≥ 2) at the Basal → Early transition.

      - 1,006 peaks were Lost over the same transition.

      - 178 peaks were Gained at Basal → Early and subsequently Lost at Early → Late, defining the strict differentially-restricted Early set (Author response image 1, red points). This set represents the higher-confidence subset of the manuscript’s broader presence/absence-defined Early-restricted set (n = 1,167; defined as MACS2 peaks at q < 0.01 present at Early but absent at Basal and Late).

      - 200 peaks were Gained at Early and retained at Late, indicating stable acquisition.

      - 49 peaks were acquired only at the Late stage.

      The discrepancy between the broader presence/absence set (1,167) and the strict differential set (178) reflects the analytical choice the reviewer raised: presence/absence calls based on a peak-significance threshold are sensitive to near-threshold peaks, whereas differential analysis with a fold-change cutoff captures only sites with quantitatively pronounced stage-restricted enrichment. We interpret these as two complementary definitions: the broader set captures all peaks meeting a stage-specific peak-calling criterion, and the strict subset isolates the most quantitatively dynamic core of that population.

      Importantly, the three named example loci shown in Figure 5D of the manuscript all survive the strict differential criterion: Nono (promoter-proximal), Nr3c1 (intron 2), and Gata3 (distal intergenic) each show |log<sub>2</sub>FC| ≥ 2 in both transitions, consistent with a clean Gained-then-Lost signature. The published example panel therefore represents the high-confidence intersection of both definitions, supporting the robustness of the manuscript’s selected illustrative cases.

      We will explicitly state the number of CUT&Tag replicates per stage in the revised Methods and figure legends. Where the differential analysis is currently based on a single replicate per stage, we will explicitly note this and treat the strict subset as a conservative confirmatory analysis. An additional replicate is under consideration for the full revision, and if performed, overlap of Early-restricted calls across replicates will be reported.

      Motif cross-validation against a matched-GC background using HOMER and/or MEME-ChIP is planned for the strict differential subset and will be reported alongside the original SeqPos analysis in the revised Figure 5F or its supplement.

      Author response image 1.

      Cross-stage log<sub>2</sub> fold-change scatter for GATA2 CUT&Tag peaks. Each point represents a single peak in the union peak set (n = 14,468). The x-axis shows the log2 fold change from Basal (0 h) to Early (2 h); the y-axis shows the log2 fold change from Early (2 h) to Late (24 h). The sign convention follows the field-standard direction (positive log2FC = increased signal at the later time point). Peaks are colored by dynamic-status classification: unchanged/other (gray; n = 9,794); Lost at Early (blue; n = 109); Gained at Early and retained at Late (orange; n = 200); acquired only at Late (teal; n = 49); and Early-restricted, defined as Gained at Early and Lost at Late with |log2FC| ≥ 2 in both transitions (red; n = 178). The Early-restricted population occupies the lower-right quadrant, consistent with a transient kinetic peak of GATA2 binding.

      Author response image 2.

      Density representation of GATA2 CUT&Tag peak dynamics with Early-restricted peaks highlighted.

      Author response image 2 is shown for illustrative reference and is not annotated with a separate legend; it presents the same data as Author response image 1 in a hexbin density format to emphasize the bulk of unchanged peaks at the origin and the spatial separation of the Early-restricted set.

      Author response image 3.

      Genomic-annotation comparison of newly acquired GATA2 binding at Early. Stacked-bar comparison of genomic annotations (ChIPseeker classification) for two definitions of the newly acquired GATA2 peaks at the Early erythroid stage: all peaks Gained at Basal → Early (orange; n = 384) and the strict Early-restricted subset (Gained then Lost; red; n = 178). Annotation categories shown: Promoter (≤1 kb of TSS), Intron, Distal Intergenic, and Other (Exon, 5′/3′ UTR, Downstream). Both peak sets contain substantial promoter-proximal and distal/intronic components, consistent with the two-subclass model described in Figure 5E–G of the manuscript (GATA2-only promoter-proximal peaks with GATA/RUNX motifs, and GATA2/GATA1 cobound distal peaks with composite GATA/E-box motifs). The strict subset shows a higher proportion of intronic and distal-intergenic sites and a lower proportion of promoter-proximal sites than the full Gained set; this difference will be discussed transparently in the revised Results. Motif analysis (HOMER/MEME-ChIP, planned for the full revision) will be performed on both peak sets to confirm that the GATA/RUNX and GATA/E-box subclass signatures are preserved.

      (8) Knock-in mouse hematopoietic validation

      A brief characterization of basic hematopoietic parameters in homozygotes (CBC, LSK/HSPC frequencies, or colony assays) would confirm the tagged allele is physiological.

      Agreed; data acquired and analyzed.

      We have characterized mature trilineage hematopoietic populations in whole bone marrow from wild-type, heterozygous (Gata2Het), and homozygous (Gata2Homo) Gata2-SNAP knock-in mice (n = 5 per genotype). Bone marrow cells were stained for myeloid (CD11b<sup>+</sup> Gr1<sup>+</sup>), lymphoid (CD3<sup>+</sup>/CD4<sup>+</sup>/CD8<sup>+</sup>/B220<sup>+</sup>/CD19<sup>+</sup>), and erythroid (Ter119<sup>+</sup>) markers and analyzed by flow cytometry. Lineage frequencies are shown as percentages of live bone marrow cells in a new Figure Supplement in the revised manuscript.

      For myeloid and erythroid populations, omnibus one-way ANOVA detected no significant differences across genotypes (Myeloid: F(2,12) = 2.616, P = 0.1140; Erythroid: F(2,12) = 0.4943, P = 0.6219). Dunnett’s multiple-comparisons test against the WT control did not detect significant pairwise differences for either knock-in genotype (Myeloid: WT vs Het P = 0.1351, WT vs Homo P = 0.9926; Erythroid: WT vs Het P = 0.7017, WT vs Homo P = 0.9602).

      For the lymphoid compartment, although the omnibus ANOVA reached significance (F(2,12) = 6.690, P = 0.0112), no pairwise comparison against WT remained significant after multiple-comparisons correction (Dunnett’s adjusted P values: WT vs Het = 0.1217; WT vs Homo = 0.2078). We therefore interpret this result conservatively. Brown-Forsythe and Bartlett’s tests showed no significant differences in variance across genotypes (P = 0.1423 and P = 0.0908), so there is no evidence that the omnibus result is driven by unequal variances. We do not interpret these data as indicating an unambiguous lymphoid phenotype in either heterozygous or homozygous Gata2-SNAP mice; this interpretation is consistent with the broader pattern across all three lineages, in which no pairwise comparison against WT survives multiple-comparisons correction. We will note in the figure legend and in the Results text that more granular HSPC-compartment analysis (LSK, MPP, lineage-restricted progenitor frequencies) and a complete blood count (CBC) remain valuable directions for future characterization of this resource and will be considered for the full revision if specifically requested.
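
For readers checking the reported degrees of freedom: with three genotypes and five mice each, a one-way ANOVA has k − 1 = 2 between-group and N − k = 15 − 3 = 12 within-group degrees of freedom, hence F(2,12). A dependency-free sketch of the F statistic follows. The P values and Dunnett's adjusted comparisons reported above would in practice come from a statistics package (for example, scipy.stats.f_oneway and, in recent SciPy versions, scipy.stats.dunnett). The toy data are hypothetical:

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three groups of five values each, mirroring the n = 5 per genotype design.
wt = [10.0, 11.0, 12.0, 11.0, 10.0]
het = [13.0, 12.0, 14.0, 13.0, 12.0]
homo = [11.0, 12.0, 11.0, 13.0, 12.0]
print(one_way_anova_f(wt, het, homo))
```

A significant omnibus F says only that the group means are not all equal; it does not identify which genotype differs. This is why the pairwise Dunnett comparisons against WT carry the interpretation here.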

      Author response image 4.

      Bone marrow trilineage frequencies in Gata2-SNAP knock-in mice. Bone marrow was harvested from the femurs and tibias of wild-type (WT), heterozygous (Gata2Het), and homozygous (Gata2Homo) Gata2-SNAP knock-in mice (n = 5 per genotype; mixed sex; 12–14 weeks). After ACK lysis, cells were stained for myeloid (CD11b<sup>+</sup> Gr1<sup>+</sup>), lymphoid (CD3<sup>+</sup>/CD4<sup>+</sup>/CD8<sup>+</sup>/B220<sup>+</sup>/CD19<sup>+</sup>), and erythroid (Ter119<sup>+</sup>) markers and analyzed by flow cytometry. Each dot represents one mouse, and horizontal bars indicate genotype means. Statistical results: Myeloid: ANOVA F(2,12) = 2.616, P = 0.1140; Dunnett’s adjusted P values WT vs Het = 0.1351, WT vs Homo = 0.9926. Lymphoid: ANOVA F(2,12) = 6.690, P = 0.0112 (omnibus); Dunnett’s adjusted P values WT vs Het = 0.1217, WT vs Homo = 0.2078. Erythroid: ANOVA F(2,12) = 0.4943, P = 0.6219; Dunnett’s adjusted P values WT vs Het = 0.7017, WT vs Homo = 0.9602. Brown-Forsythe and Bartlett’s tests for unequal variance were non-significant in all three lineages. Although the lymphoid omnibus ANOVA reached nominal significance, no pairwise comparison with WT remained significant after multiple-comparison correction; we therefore interpret this result conservatively (see response to R3 point 8).

      Summary

      We thank the editors and the three reviewers for the constructive and detailed assessment. The planned revisions consist of:

      - Four new experiments [planned] (HaloTag/SNAP labeling efficiency and absolute molecule counts via U2OS reference standards; H2B-HaloTag photobleaching reference; per-cell quantification of total endogenous GATA2 in flow-sorted primary Gata2-SNAP populations via saturating SNAP-tag labeling, benchmarked against a SNAP-RPB1/U2OS reference standard; single-molecule tracking of GATA2 N-terminal, C-terminal, and double zinc-finger deletion mutants in the engineered cell systems as a binding-deficient functional control).

      - Six analyses of existing data (GRID multi-state fitting [planned]; per-cell bleach-rate distributions and photobleach-corrected residence times [planned]; tracking-parameter sensitivity [planned]; nuclear-area normalization and total-displacement controls [planned]; normalized fold-change CUT&Tag analysis [completed; motif cross-validation planned], presented in Author response images 1–3; distribution-based single-cell statistics [planned]).

      - One previously acquired dataset [completed] (trilineage hematopoietic flow cytometry of heterozygous and homozygous Gata2-SNAP knock-in mice; presented in Author response image 4 with full statistical detail).

      - Substantial revisions to text and figures [planned] to address statistical reporting, methodological description, mechanistic framing of cross-system differences, and refinement of the Figure 6 schematic.

      With respect to the requested binding-deficient single-molecule control, we will attempt to address this directly using sequence-validated lentiviral constructs in hand encoding GATA2 mutants lacking the C-terminal zinc finger, the N-terminal zinc finger, or both. These mutant analyses will be complemented by GRID multi-state analysis and H2B-HaloTag controls, providing converging lines of validation for the two-state kinetic framework. We note that an analogous mutant cannot be examined in the physiological context of the Gata2-SNAP knock-in mouse, and we will frame the cell-line mutant analyses accordingly.

      We believe these revisions directly address the editors’ specific guidance regarding labeling efficiency and methodological clarification. We thank the editors and reviewers for their time and look forward to submitting the revised manuscript.

      References cited in this response:

      References listed below are cited in this provisional response in support of the planned analyses and methodology.

      Cattoglio, C., Pustova, I., Walther, N., Ho, J. J., Hantsche-Grininger, M., Inouye, C. J., Hossain, M. J., Dailey, G. M., Ellenberg, J., Darzacq, X., Tjian, R., & Hansen, A. S. (2019). Determining cellular CTCF and cohesin abundances to constrain 3D genome models. eLife, 8, e40164. https://doi.org/10.7554/eLife.40164

      Gebhardt, J. C. M., Suter, D. M., Roy, R., Zhao, Z. W., Chapman, A. R., Basu, S., Maniatis, T., & Xie, X. S. (2013). Single-molecule imaging of transcription factor binding to DNA in live mammalian cells. Nature Methods, 10(5), 421–426. https://doi.org/10.1038/nmeth.2411

      Hansen, A. S., Pustova, I., Cattoglio, C., Tjian, R., & Darzacq, X. (2017). CTCF and cohesin regulate chromatin loop stability with distinct dynamics. eLife, 6, e25776. https://doi.org/10.7554/eLife.25776

      Haque, N., & Coleman, R. A. (2025). Dynamic transcription pre-initiation complex assembly governs initiation efficiency. bioRxiv. https://doi.org/10.1101/2025.05.07.652662

      Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., Cheng, J. X., Murre, C., Singh, H., & Glass, C. K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular Cell, 38(4), 576–589. https://doi.org/10.1016/j.molcel.2010.05.004

      Kaya-Okur, H. S., Wu, S. J., Codomo, C. A., Pledger, E. S., Bryson, T. D., Henikoff, J. G., Ahmad, K., & Henikoff, S. (2019). CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nature Communications, 10(1), 1930. https://doi.org/10.1038/s41467-019-09982-5

      Kenworthy, C. A., Haque, N., Liou, S.-H., Chandris, P., Wong, V., Dziuba, P., Lavis, L. D., Liu, W.-L., Singer, R. H., & Coleman, R. A. (2022). Bromodomains regulate dynamic targeting of the PBAF chromatin-remodeling complex to chromatin hubs. Biophysical Journal, 121(9), 1738–1752. https://doi.org/10.1016/j.bpj.2022.03.027

      Ling, Y. H., Liang, C., Wang, S., & Wu, C. (2026). Live-cell single-molecule dynamics of eukaryotic RNA polymerase machineries. Science, 391, eads0960. https://doi.org/10.1126/science.ads0960

      Liu, Z., Legant, W. R., Chen, B.-C., Li, L., Grimm, J. B., Lavis, L. D., Betzig, E., & Tjian, R. (2014). 3D imaging of Sox2 enhancer clusters in embryonic stem cells. eLife, 3, e04236. https://doi.org/10.7554/eLife.04236

      Loeffler, D., Wang, W., Hopf, A., Hilsenbeck, O., Bourgine, P. E., Rudolf, F., Martin, I., & Schroeder, T. (2018). Mouse and human HSPC immobilization in liquid culture by CD43- or CD44-antibody coating. Blood, 131(13), 1425–1429. https://doi.org/10.1182/blood-2017-07-794131

      Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology, 15(12), 550. https://doi.org/10.1186/s13059-014-0550-8

      Machanick, P., & Bailey, T. L. (2011). MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics, 27(12), 1696–1697. https://doi.org/10.1093/bioinformatics/btr189

      Normanno, D., Boudarène, L., Dugast-Darzacq, C., Chen, J., Richter, C., Proux, F., Bénichou, O., Voituriez, R., Darzacq, X., & Dahan, M. (2015). Probing the target search of DNA-binding proteins in mammalian cells using TetR as model searcher. Nature Communications, 6, 7357. https://doi.org/10.1038/ncomms8357

      Palii, C. G., Cheng, Q., Gillespie, M. A., Shannon, P., Mazurczyk, M., Napolitani, G., Price, N. D., Ranish, J. A., Morrissey, E., Higgs, D. R., & Brand, M. (2019). Single-cell proteomics reveal that quantitative changes in co-expressed lineage-specific transcription factors determine cell fate. Cell Stem Cell, 24(5), 812–825.e5. https://doi.org/10.1016/j.stem.2019.02.016

      Sergé, A., Bertaux, N., Rigneault, H., & Marguet, D. (2008). Dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. Nature Methods, 5(8), 687–694. https://doi.org/10.1038/nmeth.1233

      Stark, R., & Brown, G. D. (2011). DiffBind: Differential binding analysis of ChIP-Seq peak data. Bioconductor. http://bioconductor.org/packages/release/bioc/html/DiffBind.html

      Taylor, S. J., Stauber, J., Bohorquez, O., Tatsumi, G., Kumari, R., Chakraborty, J., Bartholdy, B. A., Schwenger, E., Sundaravel, S., Farahat, A. A., Dutta, A., Koche, R. P., Steidl, U., & Wheat, J. C. (2024). Pharmacological restriction of genomic binding sites redirects PU.1 pioneer transcription factor activity. Nature Genetics, 56(10), 2213–2227. https://doi.org/10.1038/s41588-024-01911-7

      Wheat, J. C., Salsman, J., Reekie, I., Mathhwala, A., Black, K. L., Tiedt, R., Shroff, H., & Steidl, U. (2020). Single-molecule imaging of transcription dynamics in somatic stem cells. Nature, 583(7816), 431–436. https://doi.org/10.1038/s41586-020-2432-4

    1. Reviewer #2 (Public review):

      Summary:

      Hann and colleagues introduce a gaze-based analytical framework designed to capture, on a trial-by-trial basis, how people form and revise their predictions during implicit probabilistic sequence learning. Using an eye-tracking adaptation of an alternating sequence task, they record the first anticipatory saccade during the response-stimulus interval and classify each such saccade along two dimensions: whether it was directed toward a high- or low-probability upcoming stimulus (the learning-dependent vs. not-learning-dependent distinction), and whether the anticipated location coincided with the stimulus that actually appeared. A complementary iterative-updating metric codes whether a participant's prediction for a given three-element context is repeated or revised on successive encounters of that context.
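
The iterative-updating metric, as summarized here, can be sketched as follows: for each context, compare the current prediction with the prediction made at that context's previous occurrence. The data layout (a time-ordered list of context/prediction pairs, with the context as a hashable key) and the function name are assumptions for illustration, not the authors' code:

```python
def code_updates(trials):
    """Label each re-encounter of a context as 'repeat' or 'revise'.

    trials: time-ordered (context, predicted_location) pairs. The first
    encounter of a context has no previous prediction and is not coded.
    """
    last_prediction = {}
    codes = []
    for context, prediction in trials:
        if context in last_prediction:
            same = prediction == last_prediction[context]
            codes.append((context, "repeat" if same else "revise"))
        last_prediction[context] = prediction
    return codes


trials = [(("A", "B"), 1), (("A", "B"), 1), (("C", "D"), 2), (("A", "B"), 3)]
print(code_updates(trials))
```

Conditioning these codes on whether the previous prediction was confirmed, or on whether it matched the high-probability continuation, gives the two contrasts described in the next paragraph.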

      On the basis of these measures, the authors report that errors congruent with the inferred regularity - which they interpret as reflecting environmental noise - become progressively more frequent than errors reflecting an inaccurate internal model; that participants show a pronounced tendency to repeat their previous prediction rather than revise it; and that updates depend more on whether a prior belief is congruent with the task's statistical structure than on whether the previous prediction was confirmed. They interpret these results as evidence that statistical learning is less error-driven and more repetition-based (Hebbian in character) than is typically assumed.

      Strengths:

      The methodological ambition of the work is considerable, and the paper makes several contributions that are likely to be useful to the implicit-learning and predictive-processing communities. Using the first anticipatory saccade as a pre-response behavioral readout of prediction is conceptually well-motivated: it provides a trial-by-trial index of predictive orienting at a temporal resolution that manual reaction times cannot deliver, and it does so before the outcome of the trial is known. The explicit distinction between errors arising because the task's outcome is stochastic - that is, predictions congruent with the statistical structure but unconfirmed by the stochastic sample - and errors arising because the internal model is inaccurate is a theoretically meaningful move: predictive-coding and Bayesian accounts have long argued that these two sources of surprise should carry different weight for model revision, and the authors offer a behavioral operationalization of that distinction. The analytical pipeline is not tied to the specific paradigm used here and could be applied to other probabilistic sequence-learning tasks, which gives it broader methodological utility than a single-paradigm report. Finally, the demonstration that learners maintain their prior across successive occurrences of the same context, even when it has been disconfirmed by the most recent outcome, is a robust behavioral observation that speaks directly to an unresolved debate about whether statistical learning is dominantly error-driven.

      Weaknesses:

      The framework and the core behavioral observations are valuable, but several inferential steps - from the gaze signal to the cognitive constructs the authors invoke - are not fully supported by the present design, and these gaps affect how readers should interpret the stronger theoretical conclusions.

      The "process-pure" framing conflates sensitivity with construct purity. The authors repeatedly describe the eye-tracking measure as providing a more process-pure index of statistical learning than manual-response paradigms. Anticipatory saccades are themselves a learned motor behavior - the oculomotor system is among the most plastic motor outputs the primate brain generates, and sequence learning in the saccadic system is well-documented. The present design does not dissociate learning of the statistical structure from learning of the oculomotor sequence that expresses it, so the measure is not, on its face, free from the motor-learning confound that the authors criticize in button-press paradigms. The framing should be read as aspirational rather than as demonstrated by the present data.

      The oculomotor reaction-time data do not show the canonical signature of statistical learning. Reaction times for low-probability trials rise across epochs while those for high-probability trials remain approximately flat (Figure 5). The emerging difference between the two trial types, therefore, appears to be driven by a slowing of responses to low-probability stimuli rather than by a facilitation of responses to high-probability ones, and the authors do not rule out the alternative interpretations that this pattern reflects fatigue, a motor floor effect, or inhibition of unexpected locations. Because no fixation constraint is imposed during the response-stimulus interval, pre-stimulus gaze drift toward the anticipated location will artifactually reduce reaction time on precisely those trials the authors wish to treat as learning-driven; the fact that measured reaction times remain well above zero even on trials classified as correct anticipations is itself evidence that this contamination is present. The oculomotor reaction-time data, therefore, do not provide as clean a verification of learning as the manuscript implies.

      The correct/error labeling of anticipatory saccades incorporates information that the participant did not have. Because the first saccade occurs during the response-stimulus interval - that is, before the upcoming stimulus is revealed - the participant's internal predictive state is identical whether the trial is subsequently classified as a learning-dependent correct response or a learning-dependent error. Any difference in the epochwise frequency of these two categories must therefore be driven, at least in part, by the external stochastic structure of the task rather than by a difference in the predictive process itself. In particular, the observation that learning-dependent errors are the most frequent saccade type (Figure 7) is predicted by the prior probabilities of the outcomes alone, given a high-probability prediction, without appeal to any difference in predictive state. Readers should recognize that the theoretically meaningful contrast is between learning-dependent and not-learning-dependent anticipations (two categories), and that the four-way split risks confounding predictive state with outcome stochasticity.

      The iterative-updating metric does not distinguish prior revision from alternative processes. The binary update / no-update code, computed across non-contiguous occurrences of the same three-element context, does not discriminate between a genuine update of the internal model, simple episodic retrieval of a previously encountered triplet, and oculomotor perseveration. Without a formal generative model to anchor the interpretation, the central theoretical claim - that statistical learning is less error-driven than commonly assumed - is underdetermined by the data. The repetition pattern the authors observe is equally consistent with an error-driven model equipped with a low learning rate in a stable environment, an interpretation the authors themselves acknowledge in the Discussion. Adjudicating between these possibilities requires comparison against explicit computational models, which the present manuscript does not provide.

      Data loss and the absence of fixation control. An interpretable saccade is detected on fewer than half of all trials (48.76%; line 889), and the manuscript does not report the distribution of saccade counts per interval, the per-condition trial counts after all exclusions, or the decomposition of the 20% missing-data threshold into its underlying causes. Given that the entire inferential apparatus rests on this subset of trials, the degree of data loss is relevant context for the reader. Separately, no fixation constraint is imposed between trials: the participant's starting gaze position at the onset of each response-stimulus interval is whatever position was reached at the end of the preceding response, and this starting position carries trial-history information correlated with the upcoming stimulus. This leaves open the possibility that what is classified as predictive orienting partly reflects the mechanical consequences of where the eye happened to be at the end of the previous trial. The authors defend the absence of a fixation cross on the grounds that it would transform the transitional structure of the task, but this is an empirical claim presented without a supporting citation.

      Heterogeneity within the high-probability condition is not addressed. The two routes to a high-probability triplet in the design - pattern-random-pattern (50% of trials) and random-pattern-random (12.5%) - differ both in their base rate and in the reliability of the contextual cue they provide. Collapsing across these subtypes is an analytical choice that may conceal heterogeneity in the underlying learning process.
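      The 50% and 12.5% shares follow directly from the design, which a short simulation can confirm. This is a minimal sketch under our own assumptions, not the authors' code: an ASRT-style alternating sequence over four locations, with pattern elements on every other trial, and a triplet counted as high-probability when its last element is the pattern successor of its first. These assumptions reproduce the quoted figures but are not stated in the manuscript, and all names below are hypothetical.

```python
import numpy as np


def simulate_alternating_sequence(pattern, n_trials, n_locations=4, seed=0):
    """Hypothetical ASRT-style sequence: even trials follow `pattern`, odd trials are random."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_trials)
    is_pattern = t % 2 == 0
    seq = rng.integers(0, n_locations, size=n_trials)
    pattern = np.asarray(pattern)
    # Pattern trials cycle through the fixed pattern; random trials keep their uniform draw.
    seq[is_pattern] = pattern[(t[is_pattern] // 2) % len(pattern)]
    return seq, is_pattern


def high_probability_breakdown(seq, is_pattern, pattern, n_locations=4):
    """Share of all triplets that are high-probability, split by the type of their final trial."""
    # Assumes `pattern` is a permutation of 0..n_locations-1, so every entry is filled.
    successor = np.empty(n_locations, dtype=int)
    for i, loc in enumerate(pattern):
        successor[loc] = pattern[(i + 1) % len(pattern)]
    # Triplet (t-2, t-1, t) is high-probability if trial t is the pattern successor of trial t-2.
    high = seq[2:] == successor[seq[:-2]]
    ends_on_pattern = is_pattern[2:]
    return {
        "pattern-random-pattern": float(np.mean(high & ends_on_pattern)),
        "random-pattern-random": float(np.mean(high & ~ends_on_pattern)),
    }


pattern = (0, 2, 1, 3)  # hypothetical permutation of the four locations
seq, is_pattern = simulate_alternating_sequence(pattern, n_trials=200_002)
print(high_probability_breakdown(seq, is_pattern, pattern))
```

      Under these assumptions the first share is fixed by the sequence structure, whereas the second arises only from chance matches between two random trials, which is why the two routes differ in how reliably the preceding context predicts the next element.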

      Appraisal: Do the results support the authors' conclusions?

      The framework succeeds in providing a trial-by-trial behavioral readout of predictive orienting that is more fine-grained than conventional reaction-time measures, and the behavioral dissociation between errors congruent with the regularity and errors reflecting an inaccurate internal model is a genuine empirical contribution. The conclusions about the mechanistic nature of statistical learning should be read as motivating hypotheses for future modeling work rather than as settled empirical claims.

      Impact and utility:

      The analytical framework introduced here is likely to be useful to researchers working on implicit learning, predictive processing, and Bayesian models of perception and cognition. The measure of predictive orienting and the iterative-updating code could be adapted to a range of probabilistic learning paradigms, and the behavioral dissociation between noise-driven and model-mismatch errors fills a methodological gap that the field has long acknowledged. The authors share their data and code openly, which will facilitate reuse. The most durable contribution of the paper is methodological; the theoretical claims about the nature of statistical learning will require additional computational modeling before they can be regarded as established.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors conducted a comprehensive benchmarking and evaluation of co-folding platforms, including AlphaFold3, Boltz-2, and Chai-1, against the docking algorithm DOCK3.7, which employs a physics-based scoring function that incorporates van der Waals interactions, electrostatics, and ligand desolvation energies. The system of interest was the SARS-CoV-2 NSP3 macrodomain (Mac1), an increasingly popular antiviral target, and the ligand sets comprised 557 ligand poses not seen by these co-folding platforms during training. Additionally, the authors investigated whether the co-folding models could distinguish true ligands from non-binding small molecules. The study is thorough, with extensive statistical support and consensus across multiple metrics (cheminformatics metrics for quantifying ligand similarity, and efficacy). The questions that the authors aim to address are whether the co-folding models struggle with memorization, whether they can distinguish between a true and a false binder, whether they replicate experimental binding affinities and efficacy, and how they compare to the physics-based docking algorithm (DOCK3.7).

      We thank Reviewer 1 for this thoughtful summary of our work.

      Strengths:

      Overall, this is a scientifically solid paper. The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment.

      Weaknesses:

      My main concern is that the study's aim is a bit unclear. Modern benchmarking studies comparing physics-based docking with deep learning-based co-folding approaches (e.g., AF3, Boltz-2, Chai-1, and others) are increasingly expected to go beyond aggregate performance metrics.

      Indeed, we have examined several examples of failures and successes for each of these methods. As we are not developing these methods ourselves, we also think this dataset will be a valuable resource for improving them further.

      In addition to rigorous dataset construction, transparent methodology, and appropriate statistical evaluation, high-impact benchmarks typically provide actionable guidance on when each method class is most appropriate, reflecting their distinct inductive biases and practical constraints. Failure-mode analyses that link performance differences to protein flexibility, ligand chemistry, or binding-site characteristics are particularly valuable, as they move comparisons beyond "scoreboard" assessments toward mechanistic understanding.

      Right now, we do not observe meaningful trends that separate the failure modes for any individual method. This is covered in Supplementary Figures 6 and 7.

      While full biological validation is not expected, qualitative interpretation grounded in physical and biological principles strengthens conclusions. Providing reproducible workflows or reference pipelines is not mandatory, but it is increasingly viewed as a best practice because it facilitates adoption and helps contextualize results for practitioners.

      We note that our code is available (https://github.com/jongbin99/Cofolding/) and all structural data will be publicly accessible in the PDB upon publication (we held it back only for “blinding” during peer review, to avoid contamination with any new deep learning methods).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kim et al. evaluates the performance of three modern AI-based methods in predicting complex structures and binding affinities between proteins and chemical compounds. An honest 'prospective' evaluation is achieved by studying benchmark structures and chemical compounds that did not exist in the PDB at the time the AI structure prediction models (AlphaFold3, Chai-1, Boltz-2) were trained.

      Strengths:

      (1) The study addresses an important question in modern computational biology and drug discovery, and establishes the strengths and limitations of the three tools in solving various computational chemistry tasks, including compound pose prediction, active-inactive discrimination, and potency ranking.

      (2) The conclusions are based on examination of four separate targets and respective compound datasets, where for one of the targets, the authors also obtained numerous X-ray structures to serve as experimental answers for the binding pose prediction task.

      (3) The study reports relationships between structure prediction confidence, predicted energies (DOCK3.7), and affinity predictions (Boltz-2) with the geometric accuracy of compound pose prediction as well as the experimentally measured potency.

      (4) One of the key findings is the limited ability of co-folding methods to predict conformational rearrangements, which does not correlate with their ability to predict binding poses of the compounds inducing these rearrangements.

      (5) The findings could serve as useful guidelines for computational chemists in selecting appropriate software and scoring schemes for each task.

      We appreciate Reviewer 2’s summary of the novelty of the dataset and analysis.

      Weaknesses:

      While I consider this a solid study, several aspects would need to be addressed to make it really strong:

      (1) DOCK3.7 docking and scoring experiments were performed using one experimental structure of Mac1, selected from dozens of structures based on a criterion that is not sufficiently well justified. For the sigma2 receptor, dopamine D4 receptor, and AmpC β-lactamase, it is not clear which structures or models were selected for docking at all. It is well known that geometry predictions, scoring, and active-inactive ROC AUCs are all strongly influenced by the selected structure. It would be important to attempt Mac1 docking using all available experimental Mac1 structures, or at least against representative structures in various conformations; it would also be quite insightful to compare the results with docking of the same compound sets to AF3-, Boltz-2-, and Chai-1-predicted structures of Mac1. The same applies to the docking studies of sigma2, D4, and AmpC β-lactamase.

      In any docking campaign, a decision has to be made as to which template to use; we justified our choice in the Methods:

      “We used this structure because the inhibitor (Z5014193706) was the most potent molecule with a structure determined around the same time as the ligands in this dataset were tested.”

      We stand by this as a reasonable assumption. Similarly, for sigma2, D4, and AmpC β-lactamase, the template was chosen in the respective papers:

      a) The σ2 receptor bound to cholesterol (PDB ID: 7MFI) was used in the docking calculations.

      - This structure was determined in that paper; as the first structure of sigma2, it was a worthy template

      b) The D4 receptor campaign used PDB 5WIU

      - This was one of the two available D4 structures, chosen because it was not bound to sodium

      c) For AmpC, the campaign used the structure in the Protein Data Bank (PDB) 1L2S

      - This maximizes comparisons to other docking studies that used the same receptor template.

      The major goal of this study is to compare different methods under reasonable (but perhaps as the reviewer points out, not optimal) conditions, not to optimize docking score.

      (2) For binding affinity predictions, as a control, authors should consider compound co-folding with an unrelated protein, or even with a pseudo-peptide that consists of a few random single amino acids - this would provide an honest baseline for such predictions.

      This suggestion would be valuable for understanding the performance of these methods from the perspective of ligand specificity (a valuable, but separate, goal). Surely this will generate some number or some prediction - but what would this baseline mean, and how would it be relevant for drug discovery? Therefore, we do not think this suggestion is relevant to the issues investigated in this manuscript.

      (3) ROC curves in Figure 3 and elsewhere should be shown, and AUCs quantified and reported, on a log- or square-root-scaled x-axis to emphasize early enrichment, which is the area of practical significance for these predictions. For example, Figure 3A currently suggests that the pose prediction performance of AF3 exceeds that of Boltz-2, whereas the early enrichment is clearly better for Boltz-2.

      We agree and have added a semi-logAUC plot to Figure 3A. For Figure 5, we also generated a semi-logAUC plot to show early ligand enrichment more clearly, added as Supplementary Figure 11. We added the text:

      “Considering its early enrichment performance, Boltz-2 Ligand ipTM was the strongest predictor of pose accuracy based on normalized logAUC (20.5% above random, Fig. 3a). In contrast, although Boltz-2 pIC50 showed poor overall discrimination, it overestimated its ability to enrich true positive poses at low false positive rates, despite having a weak early enrichment behavior”
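      For readers less familiar with the metric, a semi-log enrichment score of this kind can be sketched as follows. This minimal illustration assumes the adjusted logAUC convention commonly used with DOCK benchmarks (TPR integrated over log10 FPR from 0.1% to 100%, normalized to [0, 1], with the random-classifier area of about 0.145 subtracted); the manuscript's exact FPR range and normalization are not given here, so its reported values may use a different variant. Function names are hypothetical.

```python
import numpy as np

FPR_MIN = 1e-3  # assumption: log-scaled FPR axis spans 0.1% to 100%


def roc_points(scores, labels):
    """ROC curve (FPR, TPR); higher score = more likely positive. Ties are not specially handled."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    ranked = labels[np.argsort(-scores, kind="stable")]
    tp = np.concatenate([[0], np.cumsum(ranked)])
    fp = np.concatenate([[0], np.cumsum(1 - ranked)])
    tpr, fpr = tp / tp[-1], fp / fp[-1]
    # Keep the highest TPR at each FPR so the x-coordinates are strictly increasing.
    keep = np.append(fpr[1:] != fpr[:-1], True)
    return fpr[keep], tpr[keep]


def log_auc(fpr, tpr, n_grid=10_000):
    """Area under TPR vs log10(FPR) between FPR_MIN and 1, scaled to [0, 1]."""
    x = np.logspace(np.log10(FPR_MIN), 0.0, n_grid)
    y = np.interp(x, fpr, tpr)
    logx = np.log10(x)
    area = np.sum((y[1:] + y[:-1]) / 2 * np.diff(logx))  # trapezoidal rule
    return area / (logx[-1] - logx[0])


# A random classifier (TPR = FPR) sets the baseline, about 0.145 on this axis.
RANDOM_LOG_AUC = log_auc(np.array([0.0, 1.0]), np.array([0.0, 1.0]))


def adjusted_log_auc(scores, labels):
    """logAUC above random; positive values indicate better-than-random early enrichment."""
    fpr, tpr = roc_points(scores, labels)
    return log_auc(fpr, tpr) - RANDOM_LOG_AUC
```

      Because each decade of FPR receives equal weight, a method that ranks a few true positives at the very top can outscore one with a higher linear AUC, which is the situation the reviewer describes for Figure 3A.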

      (4) 'Trained set' in figures and text should probably be 'training set'? Or otherwise explain this new term the first time it is introduced.

      Thank you for pointing this out. ‘Training set’ is the correct term, and we have corrected it throughout the figures and text.

      (5) Figure 1 illustrates a projection onto the first two principal components of a space that apparently had only one (scalar) metric for each compound pair (% maximum common substructure or Tanimoto coefficient); the authors need to better explain the principle behind this analysis and visualization.

      This suggestion is valuable, since we typically use PCA to reduce the dimensionality of more complex features. To clarify, we actually have a full pairwise similarity matrix for all tested Mac1 compounds, based on each of Tc and MCS%. The PCA for each metric (MCS% and Tc) is a projection of the corresponding pairwise similarity matrix, in which each compound is represented by its vector of similarities to every compound in the set. We also revised the Figure 1 caption to make this point clearer:

      “projection of compounds represented by their full pairwise similarity vectors (by ECFP-4 Tc and MCS%)”
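      The projection described in this response can be sketched as follows: each compound is represented by its row of the pairwise similarity matrix, and PCA is applied to those rows. This minimal sketch assumes precomputed binary fingerprints (for example ECFP-4 bit vectors from a cheminformatics toolkit) stored as a NumPy array, uses Tanimoto similarity, and computes PCA by SVD; the function names and toy data are hypothetical. An MCS%-based matrix (for example one built with RDKit's rdFMCS module) would enter the same `project_2d` step unchanged.

```python
import numpy as np


def tanimoto_matrix(fps):
    """Pairwise Tanimoto similarity for binary fingerprints (rows = compounds)."""
    fps = np.asarray(fps, dtype=np.int64)
    common = fps @ fps.T  # bits set in both fingerprints
    counts = fps.sum(axis=1)
    union = counts[:, None] + counts[None, :] - common
    # Define similarity as 1.0 for two all-zero fingerprints to avoid 0/0.
    return np.divide(common, union, out=np.ones(common.shape), where=union > 0)


def project_2d(similarity):
    """Project each compound's similarity vector onto the first two principal components."""
    centered = similarity - similarity.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :2] * s[:2], s**2 / np.sum(s**2)  # coordinates, explained-variance ratio


rng = np.random.default_rng(0)
toy_fps = rng.random((50, 2048)) < 0.05  # hypothetical stand-in for ECFP-4 bit vectors
coords, explained = project_2d(tanimoto_matrix(toy_fps))
print(coords.shape, explained[:2])
```

      Treating each row as a feature vector is what lets a single scalar metric per pair produce a multi-dimensional space: compounds land close together when they have similar similarity profiles across the whole set.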

      Reviewer #3 (Public review):

      Summary:

      This study's core conclusions are well-supported by data. It is shown that co-folding outperforms docking in known ligand pose/affinity prediction (validated by RMSD and IC₅₀ correlation), struggles with false-positive discrimination in virtual screens (lower AUC values), and is complementary to docking (non-correlated errors, distinct strengths in drug discovery stages).

      Strengths:

      (1) Unprecedented prospective design with 557 novel Mac1-ligand complexes ensures rigorous, independent evaluation of co-folding methods.

      (2) Comprehensive comparison of 3 co-folding tools (AlphaFold3, Chai-1, Boltz-2) with DOCK3.7 across diverse targets and metrics enables nuanced performance assessment.

      (3) The study clearly demonstrates complementary roles of co-folding (superior pose/affinity prediction for known ligands) and docking (better hit prioritization), and addresses deep learning memorization concerns via ligand similarity analysis.

      We thank Reviewer 3 for pointing out the unprecedented and comprehensive nature of our study.

      Weaknesses:

      (1) Limited generalization to diverse protein families (e.g., no ion channels/transporters).

      We agree - we have not explored the entire proteome, and these are important target classes that will surely be investigated by future studies. We focused here on targets for which we had a large number of X-ray crystal structures (Mac1) and affinity/inhibition measurements from docking campaigns (the other three targets).

      (2) Ambiguity in the mechanism underlying co-folding's failure to predict rare conformational changes.

      Again, we agree. We are not the developers of these methods. We observe that these methods do not predict conformational changes with high fidelity and this weakness is an area that co-folding methods will surely prioritize in the future.

      (3) Virtual screen comparison is unbalanced (docking-prioritized hit lists bias results).

      We acknowledge this in the results: “An important caveat is that the hit-lists were composed of molecules prioritized by docking in the first place, giving it an advantage on these particular sets.” and discussion: “Finally, comparing co-folding to docking based on hit-lists themselves selected by docking is arguably unfair to co-folding. Counter-balancing this is the inclusion, in each of the three hit lists, of molecules that had mediocre and poor docking scores intentionally selected to test the correlation between docking score and hit-rate. Here too, the correlation between co-folding score and likelihood to bind, what we sometimes call a “dock-response-curve” was no better than docking’s, often worse (SFig.11).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are suggestions for revisions:

      (1) The writing is at times obtuse and hard to follow.

      This happens sometimes when multiple authors are writing together. We apologize and are happy to respond to specific areas that can be streamlined to be easier to follow.

      (2) In the Results subsection "A set of 557 previously unreported Mac1 ligand complexes", the authors compared the ligands using two metrics, Tc (Tanimoto coefficient, a standard and highly effective measure in cheminformatics) and MCS (maximum common substructure); both are standard metrics for quantifying the structural similarity between pairs of small molecules. This part of the analysis tests for memorization; comparing the two metrics is critical, but it is not sufficient to draw a conclusion.

      Thank you for raising the structural similarity of the co-folded molecules to those present in the training set (resolved as Mac1 complexes and deposited in the PDB before the training cutoff dates). We have conducted an analysis in which we compute pairwise similarities for all ligands present in the PDB (regardless of target), by both Tc and MCS, and overlay the clusters of ligands we tested (Mac1, AmpC, sigma2, D4). This should show where our benchmark datasets lie in the chemical space covered by the entire PDB. Each cluster (around 500 to 1,300 compounds per target system) is overlaid on the cluster of all ligands deposited in the PDB (over 50,000 compounds), and each cluster was relatively diverse by both Tc and MCS.

      (3) In the "Co-folding can accurately reproduce poses of ligands dissimilar to those trained." subsection under Results, the authors' conclusions are hard to follow; they state that the co-folding models often mispredict or miss the alternative conformation, but they also predict poses that are distinct from the training set. What does that imply?

      Our interpretation is actually a somewhat unsettling one: co-folding gets the ligand pose right even when it gets the protein wrong, and even when the ligand is novel. This suggests the models may be anchoring on conserved pharmacophoric interactions (like the adenosine-mimicking purine scaffold) rather than truly modeling the physics of the full complex. We added to the results section:

      This result suggests that co-folding reliably recapitulates dominant ligand-binding interactions even in the absence of accurate protein conformational modeling, providing further support to the idea that they are learning specific interaction patterns rather than a deeper physics-based representation (Masters et al. 2025).

      (4) The Discussion section connects the results and conclusions, but it can be challenging to grasp the study's overall message.

      We think the final paragraph hits on three major points:

      - Co-folding accurately predicts ligand poses for known binders, but fails to capture conformational changes

      - Co-folding does not reliably distinguish true binders from false positives in virtual screening hit lists

      - Docking and co-folding are complementary rather than competing tools

      (5) The work is highly detailed and well executed, featuring thorough data analysis and statistical assessment. The value of the paper would be further enhanced by explaining how it differs from seemingly similar results reported in other studies, including the one cited in this manuscript (see https://www.biorxiv.org/content/10.64898/2025.12.04.692352v1).

      The Mac1 results are completely unique. However, the docking datasets are exactly the same as those analyzed in the Menon et al. manuscript. We do not think our results differ from the conclusions of Menon et al.; as we wrote: “These observations are supported by a fascinating study on some of the same ligand sets as investigated here, using AlphaFold3, reaching similar conclusions (Menon et al. 2025).”

      Reviewer #3 (Recommendations for the authors):

      (1) Expand target diversity to include ion channels, transporters, etc., beyond enzymes and GPCRs.

      (2) Investigate the cause of co-folding's failure in predicting rare conformational changes (e.g., adjust sampling, MSA inputs, or add experimental constraints).

      (3) Mitigate docking bias in virtual screens (e.g., re-analyze unbiased compound libraries).

      We addressed these three points in the public review above.

      (4) Test Boltz-2's affinity predictions without linear calibration and compare with FEP.

      The data without linear calibration are included in the manuscript. Comparing such a large number of compounds with FEP is currently beyond our capabilities.

      (5) Conduct proof-of-concept to test co-folding-docking integration for better hit rates.

      We think this is well beyond the scope of this manuscript - but look forward to testing this idea in the future.

      We also received one community review, to which we respond below:

      Summary

      This manuscript evaluates the performance of co-folding models when tasked with 1) the recapitulation of a large number of experimentally determined co-crystal structures of Mac1 with a series of Mac1 ligands and 2) the rescoring of hits to identify false positives originally derived from a set of large docking-based virtual screens. The evaluation leverages a dataset of crystal structures and affinity data from high-throughput crystallographic and biophysical screens, respectively. These data uniquely enable this report to focus on the ability of co-folding models to handle ligands, resulting in an analysis that is particularly timely given the wide adoption of co-folding models and the relative scarcity of such ligand-focused benchmarks among existing evaluations, which have primarily focused on protein structure prediction or binder design.

      Thank you for this thoughtful summary of our work.

      Feedback

      The experiments and analyses in the manuscript are well thought-out and do not have any significant issues. There are a few high-level points that may improve the clarity and completeness of the results. Importantly, none of the suggested additional experiments will affect the conclusions of the paper, but rather help provide additional context for the results:

      The first section presents an exciting opportunity to frame the Mac1 ligands against ligands in the PDB more broadly. It would be informative to assess whether chemotypes that are easier or harder to predict accurately and confidently are over- or under-represented in the PDB as a whole. Note that this is not a recommendation that new scaffold similarity metrics be incorporated into the analysis, but rather that analyses similar to those already performed in the manuscript are performed using all ligands in the PDB. For example, PCA-based analyses similar to those in Fig. 1c could be used to examine Mac1 ligands in the context of all PDB ligands enabling questions such as whether similarity to a nearest PDB neighbor, cluster size in a Tc/MCS PCA space, or other frequency-based measures show any relationship with prediction vs. crystal structure RMSD. Such analyses could provide additional insight into how effectively models leverage ligand information present in the PDB overall, as opposed to biases arising specifically from scaffolds represented in Mac1 structures in the PDB, which are already well covered in the manuscript. The conclusion that Tc/MCS do not correlate with the ligand RMSDs for the ligands already associated with the Mac1 is well supported, and presumably suggests that a correlation would not exist against the backdrop of the PDB, but it would be interesting to see the data using analyses similar to those already done in the manuscript nonetheless.

      We are adding new figures to SFig.1 that show how the ligand sets tested in our co-folding analysis are distributed across the chemical space of the PDB. This is done by computing Tc and MCS% similarities between every ligand in the PDB and those tested in our analysis, then plotting them in PCA space for each metric. We are excited to see that each dataset covers a broad region of the PCA space, while, at the same time, some areas of the PDB's chemical space remain unexplored by co-folding.

      Similarly, even though the four proteins used in this manuscript are not themselves the primary focus of the analysis, it would be valuable to perform a high-level assessment of the precedent for each protein in the PDB (beyond the count of liganded structures in Table S6), either in protein sequence space (e.g., MSAs) or structural space (e.g., FoldSeek). An analysis like this would provide important context about whether any of the proteins in the study have close homologs with liganded structures in the PDB, or are generally overrepresented in the PDB. The fact that the AUC for L-pLDDT for AmpC is higher than σ2 and D4, for example, is notable given the relative abundance of liganded AmpC structures in the PDB (this raises potentially interesting questions related to where DOCK3.7 and AF3 actually place the ligands, given the orthosteric β-lactam binding pocket in AmpC, although this is outside of the scope of this manuscript).

      High-level assessment of the precedent for each protein in the PDB will definitely help to understand if proteins we used have close homologs with liganded structures in the PDB. Our Supplementary Table 6 covers the extent to which these liganded structures were available by cutoff dates for AF3, Chai-1 and Boltz-2. AmpC had more homologs than sigma2 and D4, and this may explain a better AUC for AF3 L-pLDDT specifically for this target.

      A discussion of the affinity probability results (`affinity_probability_binary`) from Boltz-2 is likely warranted in the second section in addition to the pIC50s that are already reported (`affinity_pred_value`). The former seems like it would be more applicable for section 2 of the manuscript, but both warrant inclusion—they should both be calculated by default when the affinity pipeline in Boltz-2 is turned on, so it wouldn't involve any more inference.

      Because the Boltz-2 affinity module outputs both a binary affinity probability (`affinity_probability_binary`) and a predicted affinity value (`affinity_pred_value`), we kept track of both metrics and tried re-ranking the hit lists using each. Where Boltz-2 performed better (sigma2, D4), the binary probability values were more representative as a metric for differentiating true actives from non-binders; this was clearer in semi-logarithmic ROC plots. In AmpC, however, both Boltz-2 scoring metrics performed similarly. This inconsistency in trend made it difficult to draw conclusions.

      Minor points

      A more detailed description of the experimental methods used to generate the ground-truth data in the introduction (even though these have been explained in prior works) would help orient the reader early on, and ground the benchmarking aspect of the story. In general, the abstract and introduction would benefit from a more cohesive through-line to tie the two complementary but orthogonal sections of the paper together.

      We will include a more thorough description alongside the PDB depositions. As for the two sections, we have tried to tie them together from the perspective of drug discovery workflows…

      The cutoffs in the "Co-folding can accurately reproduce..." section shift between 2.5 Å (from the ligand center of mass) and 2.0 Å. Is there a reason for this? Along similar lines, mentioning cutoffs for true positives/negatives when introducing the ROC analyses later on in the Mac1 section seems unnecessary since no cutoff should be necessary here.

      We used a 2.5 Å distance between ligand centers of mass (COM) as a fast filter for “broadly the correct binding site”, and a 2.0 Å RMSD because that is the widely accepted standard in the field for a “relatively correct binding pose”.
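      The difference between the two criteria can be illustrated with a minimal sketch that applies the 2.5 Å center-of-mass filter and the 2.0 Å RMSD threshold to a predicted ligand pose. It assumes the predicted and crystal ligand coordinates are already in a common frame after superposing the proteins, with atoms listed in matching order, and it does not correct for symmetry-equivalent atoms, which dedicated RMSD tools usually handle. The unweighted centroid is the default; passing atomic masses as `weights` gives a true center of mass. Names are hypothetical and not taken from the authors' repository.

```python
import numpy as np

COM_CUTOFF = 2.5   # Å, "broadly the correct binding site"
RMSD_CUTOFF = 2.0  # Å, "relatively correct binding pose"


def pose_checks(pred, ref, weights=None):
    """Compare a predicted ligand pose with the crystal pose.

    Assumes both (n_atoms, 3) arrays share a protein-superposed frame and atom order;
    molecular symmetry is ignored, and inclusive thresholds are our assumption.
    """
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    w = np.ones(len(ref)) if weights is None else np.asarray(weights, float)
    com_shift = np.linalg.norm(
        np.average(pred, axis=0, weights=w) - np.average(ref, axis=0, weights=w)
    )
    rmsd = np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1)))
    return {
        "com_distance": com_shift,
        "rmsd": rmsd,
        "correct_site": com_shift <= COM_CUTOFF,
        "correct_pose": rmsd <= RMSD_CUTOFF,
    }


ref = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
print(pose_checks(ref + [0.0, 0.0, 2.2], ref))
```

      A rigid 2.2 Å shift passes the binding-site filter but fails the pose criterion, which is why the looser COM cutoff serves only as a first-pass filter.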

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Manuscript number: RC-2026-03407

      Corresponding author(s): Laura Cantini, Julio Saez-Rodriguez

      [The "revision plan" should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.


      The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the reprint and help them to obtain a balanced view of the paper.

      • *

      If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements

      We thank both reviewers for their thorough and constructive evaluation of our manuscript.

      Reviewer 1 highlighted that the manuscript would benefit from 1) a stronger positioning of ReCoN within the existing literature on multicellular modelling and network exploration, 2) a justification of our methodological choices, including the use of Random Walk with Restart (RWR), 3) a clarification of the choice of input datasets for GRN inference and an assessment of the robustness of ReCoN's predictions to noise in these networks, and 4) a more systematic exploration of ReCoN's parameter space (restart probability, layer transition probabilities, filtering thresholds).

      Reviewer 2 raised concerns about 1) the generalisability of the default α value (0.8) across independent datasets, 2) the expected contribution of the indirect effect to prediction performance, 3) the robustness of GRNs across datasets and systems, and 4) the need for more quantitative validation in the spatial/microenvironment showcase. They also pointed out an unsupported claim regarding gene knockout prediction in the abstract.

      Several clarifications on figures, methods, and writing were also requested by both reviewers.

      As the main addition to the manuscript, we propose a new showcase based on the recently published Human Cytokine Dictionary (Oesinghaus et al., 2025). This showcase will address several reviewer concerns at once by allowing us to 1) test the robustness and performance of α = 0.8 on an independent dataset, and 2) evaluate the impact of different GRN inference methods (HuMMuS, SCENIC+, CellOracle, GRNBoost2) and of noise on ReCoN's predictions.

      We will conduct a systematic parameter exploration on the Heart Atlas showcase, covering the restart probability and the inter-layer transition probabilities. We will also strengthen the validation of the microenvironment showcase with an additional comparison to matched single-cell fibroblast data.

      Regarding the manuscript, we will substantially expand the discussion to better contextualise ReCoN within existing multicellular modelling approaches. We will also expand the Methods to justify our methodological choices (RWR/MultiXrank, dataset selection). We will remove the unsupported gene knockout claim from the abstract and reframe it as a future direction. In addition, we will clarify the distinction between ReCoN variants, rename them for clarity in Results section 1.2, and improve the figure legends. Finally, we will improve the tool's documentation, including new tutorials on using spatial data and on running ReCoN with scRNA-seq-only GRN inference.

      We believe these revisions will substantially strengthen the manuscript and address the reviewers' concerns regarding the method's robustness, generalisation, and contextualisation.

      2. Description of the planned revisions

      Reviewers' comments are in blue

      Authors' answers are in black

      Proposed text modifications are in green

      Reviewer #1

      R1.1. This is a very well-written paper; the methods used are adequate, and the use cases are relevant and broad, exploiting state-of-the-art datasets and tools.

      The authors' claims are mostly justified. The authors could make an effort to cite other efforts in similar directions more explicitly. The claim 'We envision ReCoN as an extension to prior multicellular modelling, offering an interesting compromise between prediction of cell type responses and understanding of their molecular coordination.' is very general and could be better substantiated. In fact, the authors do not really give examples of alternative approaches to studying systems of interacting cells, other than mechanistic agent-based models, which are clearly very different.

      Response:

      We thank the reviewer for pointing out the lack of contextualisation for ReCoN in this closing discussion.

      We wish to stress that ReCoN builds notably on multicellular factor decomposition methods. We also want to emphasise its value in complementing cell communication methods, which describe the big picture of multicellular interactions.

      We propose to state these two points explicitly with the following rephrasing:

      Network-based representations of multicellular systems have been an active field for many years, from early conceptual cytokine networks (Frankenstein, Alon, and Cohen 2006) to curated ligand-receptor cascades of hematopoietic tissue (Kirouac et al. 2010, Qiao et al. 2014). In parallel, accounting for tissue specificity in GRN inference from bulk RNA-seq has offered another way to capture the importance of context when reconstructing molecular mechanisms (Sonawane et al. 2017). Single-cell analysis made it possible to decompose tissue composition and quantify gene expression. This opened the possibility of scaling the inference of these networks, and of multicellular mechanisms in general, to large sets of molecules. Several methods have since been developed to capture multicellularity. A first direction extends ligand-receptor interaction inference into the receiver cell response through curated signalling cascades, yielding ligand-to-target cascades (Browaeys, Saelens, and Saeys 2020, Jin et al. 2021, Zhang et al. 2021, Yan et al. 2025). A second direction leverages spatial context through explainable multi-view models that decompose marker variation into intra- and intercellular contributions (Arnol et al. 2019, Tanevski et al. 2022), without considering the mediating cascades. Finally, the more recent family of multicellular factor decomposition methods focuses on the coordination of cellular programs rather than on the underlying mechanisms. ReCoN proposes a network-based approach grounded in single-cell data and in the philosophy of this last group of methods. Indeed, ReCoN aims to link molecular drivers to such coordinated multicellular programs. It does so by bridging CCC inference and GRN modelling (Badia-i-Mompel et al. 2023) within a large, coherent heterogeneous multilayer network, and exploring that network.

      Arnol D, Schapiro D, Bodenmiller B et al. Modeling Cell-Cell Interactions from Spatial Molecular Data with Spatial Variance Component Analysis. Cell Rep 2019;29(1):202-211.e6. https://doi.org/10.1016/j.celrep.2019.08.077.

      Badia-i-Mompel P, Casals-Franch R, Wessels L et al. Comparison and evaluation of methods to infer gene regulatory networks from multimodal single-cell data. Preprint, bioRxiv, 21 Dec. 2024, 2024.12.20.629764. https://doi.org/10.1101/2024.12.20.629764.

      Badia-i-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24(11):739-54. https://doi.org/10.1038/s41576-023-00618-5.

      Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 2020;17(2):159-62. https://doi.org/10.1038/s41592-019-0667-5.

      Frankenstein Z, Alon U, Cohen IR. The immune-body cytokine network defines a social architecture of cell interactions. Biol Direct 2006;1(1):32. https://doi.org/10.1186/1745-6150-1-32.

      Jin S, Guerrero-Juarez CF, Zhang L et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12(1):1088. https://doi.org/10.1038/s41467-021-21246-9.

      Kirouac DC, Ito C, Csaszar E et al. Dynamic interaction networks in a hierarchically organized tissue. Mol Syst Biol 2010;6(1):MSB201071. https://doi.org/10.1038/msb.2010.71.

      Oesinghaus L, Becker S, Vornholz L et al. A single-cell cytokine dictionary of human peripheral blood. Preprint, bioRxiv, 15 Dec. 2025, 2025.12.12.693897. https://doi.org/10.64898/2025.12.12.693897.

      Qiao W, Wang W, Laurenti E et al. Intercellular network structure and regulatory motifs in the human hematopoietic system. Mol Syst Biol 2014;10(7):MSB145141. https://doi.org/10.15252/msb.20145141.

      Radig J, Droit R, Doncevic D et al. Tracking biological hallucinations in single-cell perturbation predictions using scArchon, a comprehensive benchmarking platform. Preprint, bioRxiv, 27 June 2025, 2025.06.23.661046. https://doi.org/10.1101/2025.06.23.661046.

      Sonawane AR, Platig J, Fagny M et al. Understanding Tissue-Specific Gene Regulation. Cell Rep 2017;21(4):1077-88. https://doi.org/10.1016/j.celrep.2017.10.001.

      Tanevski J, Flores ROR, Gabor A et al. Explainable multiview framework for dissecting spatial relationships from highly multiplexed data. Genome Biol 2022;23(1):97. https://doi.org/10.1186/s13059-022-02663-5.

      Yan L, Cheng J, Nie Q et al. Dissecting multilayer cell-cell communications with signaling feedback loops from spatial transcriptomics data. Genome Res published online 12 May 2025. https://doi.org/10.1101/gr.279857.124.

      Zhang Y, Liu T, Hu X et al. CellCall: integrating paired ligand-receptor and transcription factor activities for cell-cell communication. Nucleic Acids Res 2021;49(15):8520-34. https://doi.org/10.1093/nar/gkab638.

      R1.2. Moreover, the exploration of the multilayer networks with RWR is a very reasonable choice but could there be other approaches? I think the authors could discuss this issue to briefly support their choice of this method.

      Response:

      This is a very relevant comment, as this choice was not discussed in the paper. We propose extending the Methods section on ReCoN's network exploration with a justification of this choice.

      Only a limited set of network exploration methods is currently implemented for multilayer networks. These notably include pymnet (Nurmi et al., 2024), which is natively adapted to heterogeneous multilayer networks. They also include multinet (Bagavathi et al., 2019) and MuxViz (De Domenico et al., 2015), which were initially developed for multiplex networks (e.g., social networks where the same set of nodes is present in each layer) but are adaptable to more complex multilayer networks. However, to our knowledge, only MultiXrank provides a robust measure of proximity between every pair of nodes.

      Indeed, neither pymnet nor MuxViz, which focuses on community and motif detection, implements pairwise distances. multinet does compute pairwise distances based on shortest paths, but only between nodes of the same multiplex (e.g., in our network, only between two genes or between two receptors): https://www.rdocumentation.org/packages/multinet/versions/4.3.2/topics/multinet.distance

      We provide the following additional justification for choosing RWR and MultiXrank over reimplementing or extending another method:

      • The total complexity of RWR is O(δm), with m the number of edges and δ the number of iterations of the walk, when the number of nodes is negligible compared with the number of edges (Baptista et al., 2022, Supp. Notes 2.A; Jin W. et al., 2019). This linear scaling with the number of edges is particularly advantageous for large networks, such as ReCoN's, which can contain several million edges. The number of iterations δ, and hence the computational time, increases as the restart probability decreases. This is an important reason to keep the restart probability high.

      • MultiXrank is particularly attractive for its flexibility, as it makes it easy to assign different weights to the different layers and to specify the direction of exploration.

      • It also produces deterministic results, as the exploration is iterated until convergence.

      • Additionally, in ReCoN, the indirect effect of each cell is computed independently. In previous work (Trimbour et al., 2024), we extended the MultiXrank implementation to run RWRs in parallel, which makes it already well suited to optimising ReCoN's explorations.

      For all these reasons, the MultiXrank implementation appeared to be the best choice for robust and efficient exploration of ReCoN's HMLN.
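      The O(δm) cost and the dependence of δ on the restart probability can be illustrated with a plain power-iteration RWR. This is a minimal single-layer sketch, not MultiXrank's multilayer algorithm or its API. It assumes an undirected graph with positive edge weights, a single seed node, and an L1 convergence tolerance chosen for illustration. Each step contracts the L1 difference by at least a factor (1 − r). The number of iterations therefore grows roughly like log(tol)/log(1 − r) as r decreases, and each iteration touches every edge once in each direction.

```python
from collections import defaultdict


def rwr(edges, seed, restart=0.7, tol=1e-10, max_iter=10_000):
    """Random walk with restart by power iteration on a single-layer,
    undirected, positively weighted graph.

    edges: iterable of (u, v, weight). Returns (scores, n_iterations).
    """
    nbrs = defaultdict(list)
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    strength = {u: sum(w for _, w in adj) for u, adj in nbrs.items()}

    p = {u: 0.0 for u in nbrs}
    p[seed] = 1.0
    for it in range(1, max_iter + 1):
        new = {u: 0.0 for u in nbrs}
        new[seed] = restart  # restart mass always returns to the seed
        # One sweep visits each edge twice (once per direction): O(m) per iteration.
        for u, adj in nbrs.items():
            share = (1.0 - restart) * p[u] / strength[u]
            for v, w in adj:
                new[v] += share * w
        diff = sum(abs(new[u] - p[u]) for u in nbrs)
        p = new
        if diff < tol:  # deterministic stopping rule: iterate until convergence
            return p, it
    return p, max_iter


# Usage: a 10-node ring; a lower restart probability needs more iterations.
ring = [(i, (i + 1) % 10, 1.0) for i in range(10)]
for r in (0.7, 0.3):
    scores, n_iter = rwr(ring, seed=0, restart=r)
    print(f"restart={r}: {n_iter} iterations, total mass={sum(scores.values()):.6f}")
```

      Running the same call twice returns identical scores, which matches the deterministic behaviour described above.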

      Bagavathi A, Krishnan S. Multi-Net: A Scalable Multiplex Network Embedding Framework. In: Aiello L, Cherifi C, Cherifi H, Lambiotte R, Lió P, Rocha L (eds). Complex Networks and Their Applications VII (COMPLEX NETWORKS 2018). Studies in Computational Intelligence, vol 813. Cham: Springer, 2019. https://doi.org/10.1007/978-3-030-05414-4_10.

      De Domenico M, Porter MA, Arenas A. MuxViz: a tool for multilayer analysis and visualization of networks. J Complex Netw 2015;3(2):159-76. https://doi.org/10.1093/comnet/cnu038.

      Jin W, Jung J, Kang U. Supervised and extended restart in random walks for ranking and link prediction in networks. PLoS One 2019;14(3):e0213857.

      Nurmi et al. pymnet: A Python Library for Multilayer Networks. J Open Source Softw 2024;9(99):6930. https://doi.org/10.21105/joss.06930.

      R1.3. Generally the discussion should provide the reader the context in the existing literature in which the work can be set, detailing its impact. I think this could be improved.

      Response:

      We hope that the contextualisation proposed in response to comment R1.1 already clarifies how the work fits within the existing literature.

      We also propose to extend the description of ReCoN's impact with the following sentences in the discussion: "Unlike purely data-driven approaches, ReCoN contextualizes prior knowledge, balancing robustness, through literature data, with specificity, through new measurements. This mechanistic approach opens new possibilities for understanding how cellular coordination shapes tissue-level responses and for designing targeted molecular interventions."

      R1.4. Regarding the choice of datasets, it is clear that the method is quite demanding, requiring single cell and different omics to build the model, in addition to the expression dataset that is used as a use case. This inevitably leads to using a mix of datasets.

      For example in the mouse experiments the gene regulatory network was inferred from both a lymph node scRNA-seq dataset and a splenic scATAC-seq dataset, presumably due to the lack of multiome data in this setting. However the cell-cell communication network was inferred from the control case of the Immune Dictionary. Why can't the authors use the control data also for inferring GRNs?

      Is ATAC-seq really necessary for GRN inference? What is the impact of the lymph node and spleen samples potentially being different?

      Response:

      This is a very interesting comment. We propose to add 1) an explanation of our choice of datasets for GRN inference, as Supplementary text, and 2) a new experiment on the effect of GRNs built from multi-omics data versus scRNA-seq alone.

      • Dataset choice

      We decided to infer the GRN from multi-omics data, as these methods seem to perform better and are becoming the state of the art (Badia-i-Mompel et al. 2023, Trimbour, Deutschmann, and Cantini 2024, Yuan and Duren 2025).

      As scATAC-seq data were not produced for the Mouse Immune Dictionary, we searched for an external dataset. We used HuMMuS, the method we previously developed, as it is also based on RWR and performs well on unpaired data.

      scATAC-seq

      Our first criterion was to match the mouse model used in the Immune Dictionary dataset, which considerably reduced the number of available multicellular immune datasets. We therefore extended our search to a splenic dataset. The spleen is itself classified as a highly specialised lymphatic structure and notably contains the same cell types as classical lymph nodes.

      scRNA-seq

      We could technically combine the scRNA-seq data of the Immune Dictionary control mice with the spleen scATAC-seq data. However, the Immune Dictionary provides only 100 or fewer cells per cell type and stimulation, which would result in a low number of cells. As GRN quality seems to depend strongly on the number of cells used, we favoured a larger dataset.

      Our choice of single-cell multi-omics methods was driven by three factors: their novelty relative to scRNA-seq-based methods, the performance improvement they appeared to offer in several benchmarks, and our aim to develop a pipeline that integrates the most complete data available for contextualisation (Badia-i-Mompel et al. 2024).

      • GRN impact on the Human Cytokine Dictionary

      Although it does not relate directly to this showcase, we will also add a new dataset analysis, detailed in our response to comment R1.12. In the Human Cytokine Dictionary showcase, we propose to explore the effect of choosing different GRNs, built either from external multi-omics data or from the control scRNA-seq data of the dataset itself. We hope this will help users decide, in general, whether to use higher-quality external datasets or sample-specific ones.

      Finally, we propose to add a section to the tool's documentation showing how to use ReCoN with scRNA-seq-only GRN inference. We also propose to report the performance of different GRNs on the Human Cytokine Dictionary dataset directly in the paper.

      R1.5. The code is very clear, we were able to install and run it and it is quite well-documented. However, a few more details should be given in the text regarding how the evaluation of the performance is carried out.

      For example, if I understand correctly, when predicting the impact of cytokine perturbations, ReCoN's predictions of impacted genes are compared to differentially expressed genes identified through traditional DEG analysis. What is compared is the ranking of these genes from ReCoN with the ranking provided by DESeq2. There is no description of how this comparison of rankings gives rise to AUROC values. Also, is it just the ranking that is predicted, or can they also estimate how well they can predict the effect size?

      Response:

      We thank the reviewer for pointing out these unclear technical details. DEG results were binarised to obtain the list of differentially expressed genes, using the thresholds indicated in section 4.4.4. We considered a gene as perturbed by a given cytokine treatment if the comparison of control and treated cells had a t-test p-value below 0.1 and a log-fold change above 1.

      Regarding the reviewer's second, more general point: ReCoN scores should be read as a ranking of possible regulations and cannot be considered proportional to the effect size. As they represent a likelihood rather than an effect score, binarisation is the most appropriate transformation for validation.

      Moreover, the scores can be seen as the probability of the exploration ending on each node, so they always sum to one. This also prevents interpreting the scores as the amplitude of change. As an illustrative example, suppose a receptor regulates three genes identically. They would (ideally) all have a score of (1 − R)/3, R being the restart probability in ReCoN, whether their expression doubles or is multiplied by 10.
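      The sum-to-one property and the insensitivity to effect size can be checked on a toy graph. This sketch uses a single undirected layer with unit weights rather than ReCoN's multilayer walk. The exact per-gene value therefore depends on edge direction and on how dead-end nodes are handled. In this undirected star, each gene receives (1 − r)/(3(2 − r)), i.e. 1/9 for r = 0.5. The function name is hypothetical.

```python
def rwr_scores(adj, seed, restart=0.5, n_iter=200):
    """RWR scores by a fixed number of power iterations on an undirected graph
    given as {node: {neighbour: weight}}. Scores always sum to one."""
    p = {u: (1.0 if u == seed else 0.0) for u in adj}
    for _ in range(n_iter):
        new = {u: (restart if u == seed else 0.0) for u in adj}
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                new[v] += (1.0 - restart) * p[u] * w / total
        p = new
    return p


# Toy case: one receptor "R" linked identically to three target genes.
# Expression fold changes of G1..G3 appear nowhere in the walk, so whether a
# gene doubles or changes tenfold, its score is unchanged.
star = {
    "R": {"G1": 1.0, "G2": 1.0, "G3": 1.0},
    "G1": {"R": 1.0},
    "G2": {"R": 1.0},
    "G3": {"R": 1.0},
}
scores = rwr_scores(star, seed="R", restart=0.5)
print({g: round(s, 4) for g, s in scores.items()})
```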

      This can legitimately be seen as a limitation. However, we believe it is similar to most GRN inference methods in practice, for which predicting the true amplitude of gene perturbations usually yields very low performance (Badia-i-Mompel et al. 2024).

      We propose the following changes in response to this comment:

      • We would modify section 4.4.4 of the Methods with the following paragraph, to make explicit that it consists of a binary selection: "For each cytokine-cell type pair, differentially expressed genes were binarised: genes passing the significance thresholds (FDR P-val < 0.1 and log-fold change > 1) were labelled as positives, and all remaining genes as negatives. ReCoN scores were then used to rank all genes, and AUROC values were computed from this ranking against the binary labels."

      • We will also include a section "ReCoN scores interpretation" on the documentation website, as guidance on score interpretation will be particularly useful for users.
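      The binarise-then-rank procedure can be sketched as follows. Here AUROC is computed through its Mann-Whitney interpretation: the probability that a randomly chosen positive gene is ranked above a randomly chosen negative one, with ties counting one half. This equals the area under the ROC curve built from the same ranking. The thresholds follow the values given above. The DE table format and gene names are hypothetical. The response does not say whether the log-fold change cutoff applies to absolute values, so the sketch uses the literal reading (log-fold change > 1).

```python
def binarise(de_table, p_max=0.1, lfc_min=1.0):
    """Genes labelled positive if p-value < p_max and log-fold change > lfc_min.
    de_table maps gene -> (p_value, log_fold_change)."""
    return {g for g, (p, lfc) in de_table.items() if p < p_max and lfc > lfc_min}


def auroc(scores, positives):
    """AUROC of a ranking (higher score = more likely perturbed) against binary
    labels, via the Mann-Whitney statistic; ties count one half."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores and DE results for one cytokine-cell type pair.
recon_scores = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}
de_table = {"A": (0.01, 2.0), "B": (0.50, 3.0), "C": (0.05, 1.5), "D": (0.20, 0.1)}
labels = binarise(de_table)
print(sorted(labels), auroc(recon_scores, labels))
```

      Gene B has the larger fold change but fails the p-value threshold, so it counts as a negative. Only the ranking of scores, not their magnitude, enters the AUROC.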

      R1.6. When describing the use cases, I think a bit more detail would help.

      For example: 'To identify the cell-type-specific genes associated with HF, we used the MOFAcell scores of the multicellular factor 1 (MCP1) reported in ReHeat2 (ref. 36)'. I suppose the explanation is in the dataset paper, but for the sake of clarity it would be good to expand this sentence to give at least an idea of the approach.

      Response:

      We completely agree that more explanation should be provided, so that readers do not have to switch between articles to understand the concepts behind this showcase. As suggested by the reviewer, we propose a general description of the approach in the following short paragraph, and propose to remove the term "loading":

      "In the ReHeat2 study, the first multicellular factor (MCP1) was associated with heart failure. We used the gene loadings of MCP1 as a proxy for the cell-type-specific transcriptomic changes associated with heart failure, ranking genes by their absolute loading values."

      We also propose to complete the Methods section: "MOFAcell is a multicellular factor analysis method that decomposes multi-sample single-cell data into latent factors representing coordinated gene expression patterns across cell types. Each factor is characterised by cell-type-specific gene scores, reflecting their individual contribution to the coordinated program. In this showcase, we use the first multicellular program (MCP1), as it was associated with heart failure."

      R1.7. Regarding the calculation of the R matrix from the NicheNet matrices L and G, I gather that the R matrix is calculated once and is thus fully data-independent and available just like the L and G matrices from NicheNet. This was not very clear in the tutorials.

      Response:

      We are very thankful for the reviewer's involvement in testing the tool itself and its documentation. First, we propose a new website page explaining the pre-computed resources available for receptor-gene links, and we added a descriptive paragraph to the tutorials themselves.

      Second, we noticed a typo in the equation, which should actually read L = R * G with the current definition. We corrected it in the next version, and specified that R is fully data-independent and inferred solely from prior knowledge.

      R1.8. Also, this might just be a typo in the tutorial: 'The default α = 0.8 gives more weight to direct effects, which has been empirically validated. You can adjust this based on your biological question.' I believe the manuscript says α > 0.5 corresponds to indirect effects dominating.

      Response:

      We corrected the wording in the tutorials. Indeed, a high α corresponds to a stronger indirect effect. A similar typo was present in the first equation of the paper, and we are correcting it as well.
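      The following sketch makes the direction of α concrete with a weighted combination in which a larger α increases the weight of the indirect effect. This response does not reproduce the exact form of ReCoN's equation. The convex combination (1 − α)·direct + α·indirect below is an assumption for illustration, and the function and gene names are hypothetical.

```python
def combine_effects(direct, indirect, alpha=0.8):
    """Hypothetical convex combination of per-gene direct and indirect scores.

    alpha weights the indirect effect and (1 - alpha) the direct effect, so
    alpha > 0.5 lets the indirect effect dominate (assumed form, for illustration).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    genes = set(direct) | set(indirect)
    return {g: (1.0 - alpha) * direct.get(g, 0.0) + alpha * indirect.get(g, 0.0)
            for g in genes}


# gene_a is reached only directly, gene_b only through other cell types.
combined = combine_effects({"gene_a": 1.0}, {"gene_b": 1.0}, alpha=0.8)
print(combined)
```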

      R1.9. Same for the pre-processing of the spatial data for the third use case, a little more details on how this was done would help the users and readers.

      Response:

      We propose adding a specific section on spatial pre-processing and analysis to the Methods.

      We are also adding a tutorial on spatial data. Spatial data processing is computationally intensive without GPUs, so we will also provide the pre-processed data so that anyone can run this tutorial.

      R1.10. I don't see issues with the statistical power of the analysis.

      Rather, I think the authors should provide some examination of the parameter space of their model. An analysis of the impact of the α parameter is provided, but I believe several more parameters have a crucial impact, and the choice of their values should be discussed.

      For example: 'In the GRN reconstruction, only the links with a score above 1.5e-7 were retained in ReCoN's gene regulatory layer.' How was this chosen?

      We have identified the following parameters that are somewhat justified but could be explored to get a better feel for how they impact the results:

      Restart probability: How often the walker goes back to the starting seed/molecule

      Layer transition probability: how often the walker stays in the same layer, moves to a different cell type, or moves to a different layer (gamma)

      Node transition within a layer: How often one jumps to a different layer

      Response:

      This is a very valid point raised by the reviewer about parameters explorations.

      We focused on exploring the α (direct/indirect effect) parameter, as its value was the main uncertainty when designing the model.

      We would like to address this comment by adding new explorations of the restart probability and of the transition probabilities between layers. The probability of transitioning between specific nodes within a layer itself depends directly on 1) the restart probability, 2) the transition probabilities, and 3) the edge weights, which are determined beforehand and independently of ReCoN's exploration.

      The Heart Atlas showcase allows each parameter set to be evaluated in around 10 min, compared with 10 h for the Immune Dictionary. We thus propose to evaluate the restart probability and the layer transition probabilities on the data of this showcase.

      • We would explore restart probabilities of 0.1 × N, with N from 1 to 9.

      • For transition probabilities, we propose varying the importance of the GRN, receptor, and cell communication layers with the following configurations:
        - probability of staying in the CCC layer (i.e., not jumping to the receptor layer): 0.1, 0.3, 0.5, 0.7, or 0.9;
        - probability of staying in the receptor layer (i.e., not jumping to the GRN layer): 0.25, 0.5, or 0.75;
        - probability of staying in the GRN layer (i.e., not jumping to the CCC layer): 0.25, 0.5, or 0.75.
        This would result in 9 intracellular variations combined with 5 intercellular variations.

      We plan to evaluate these configurations by measuring the correlation between their results. We will also measure the time to convergence, which could increase drastically as the restart probability decreases. If correlations below 0.9 are observed between some results, we will compare their absolute performance.
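      The planned design can be enumerated and screened with a few lines of code. The sketch assumes Spearman correlation between per-gene score vectors, since the response does not specify the correlation type, and uses hypothetical toy results. Only the parameter values and the 0.9 threshold come from the plan above. The plan does not state whether the 9 restart values are crossed with the 45 transition configurations, so the two grids are kept separate.

```python
from itertools import combinations, product

# Values taken from the planned exploration described above.
RESTART_PROBS = [round(0.1 * n, 1) for n in range(1, 10)]  # 0.1, 0.2, ..., 0.9
STAY_CCC = [0.1, 0.3, 0.5, 0.7, 0.9]  # 5 intercellular variations
STAY_RECEPTOR = [0.25, 0.5, 0.75]     # x 3
STAY_GRN = [0.25, 0.5, 0.75]          # x 3 = 9 intracellular variations


def transition_grid():
    """All (stay_ccc, stay_receptor, stay_grn) combinations: 5 x 9 = 45."""
    return list(product(STAY_CCC, STAY_RECEPTOR, STAY_GRN))


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def low_agreement_pairs(results, threshold=0.9):
    """results maps a configuration to its per-gene scores (same gene order).
    Returns configuration pairs whose Spearman correlation is below threshold."""
    flagged = []
    for a, b in combinations(results, 2):
        rho = spearman(results[a], results[b])
        if rho < threshold:
            flagged.append((a, b, rho))
    return flagged


print(len(transition_grid()), "transition configurations;", len(RESTART_PROBS), "restart values")
```

      Pairs returned by `low_agreement_pairs` are the ones whose absolute performance would then be compared.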

      We would include the corresponding figures in the Supplementary Data and highlight the main findings in the Methods section dedicated to the random walk with restart. We would also briefly describe the parameter exploration design in the first Results section, for curious readers who would like to verify the parameter choices before reading the showcases.

      R1.11. Weighting parameters: How much weight for direct or indirect effect to account for the combined effect - alpha - this is the only one that is explicitly explored.

      Response:

      We are very thankful for this comment, and we decided to modify our tutorial guidelines to make this choice more intuitive and general.

      Indeed, a threshold of 1.5e-7 would hardly make sense for most methods, which do not produce such low scores. We now propose to select the top 2 million GRN connections. This keeps the complete network, or a large portion of it, when methods other than HuMMuS are applied.

      In our case, 1.5e-7 was determined empirically from the distribution of HuMMuS scores to keep the top 2 million connections. HuMMuS networks are generally almost fully connected, which is unusual among classical GRN inference methods, and keeping them entirely would make exploration much longer.
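      The difference between a fixed score cutoff and a top-k selection can be illustrated with two hypothetical GRNs that share the same edge ranking but use different score scales. The sketch assumes edges stored as (tf, target, score) tuples. The default k = 2,000,000 mirrors the proposed selection, while the toy calls use a small k.

```python
import heapq


def threshold_edges(edges, min_score=1.5e-7):
    """Fixed score cutoff: only meaningful on one method's score scale."""
    return [e for e in edges if e[2] > min_score]


def top_k_edges(edges, k=2_000_000):
    """Scale-free alternative: keep the k highest-scoring (tf, target, score) edges."""
    return heapq.nlargest(k, edges, key=lambda e: e[2])


# Two hypothetical GRNs with the same ranking but different score scales.
grn_small_scores = [("TF1", "G1", 3e-9), ("TF1", "G2", 2e-9), ("TF2", "G1", 1e-9)]
grn_unit_scores = [("TF1", "G1", 0.9), ("TF1", "G2", 0.6), ("TF2", "G1", 0.3)]

for name, grn in [("small", grn_small_scores), ("unit", grn_unit_scores)]:
    print(name, len(threshold_edges(grn)), [e[:2] for e in top_k_edges(grn, k=2)])
```

      Depending on the score scale, the fixed cutoff keeps either nothing or everything. The top-k rule selects the same edges in both cases.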

      R1.12. Finally, this might be considered OPTIONAL but would greatly improve the work in our opinion:

      The method crucially depends on the networks that are used in the different layers and to connect layers and cell types. As we know, biological data is noisy and incomplete (FP and FN) at each level and in each datatype. It would be really useful to estimate what is the robustness of the results to this noise. Particularly, from personal experience, we think the GRNs reconstructed from data are often almost fully connected and it is exceedingly difficult to validate them in specific contexts. This means that some 'errors' are likely to be present.

      Since several methods exist for inferring GRNs one could simply compare the results using different methods for this part of the network.

      A related point involves the characteristics of the RWR algorithm, that will be quite impacted by the presence of hubs in these networks (either in single layers or across several) that is likely to impact the exploration. If proteins that are hub are effectively important, that is not a problem, but in some layers, for example, the receptor-receptor layer that presumably will contain PPIs, there might be biases in hubs being just better studied proteins, and these hubs might have an 'unjustified' weight in the walks.

      One potential approach to assess the robustness of the method to these issues could be an empirical one that just randomly perturbs the networks in ReCoN to see to what extent similar predictions are achieved.

      Response:

      We are thankful for this relevant comment on GRN and prediction stability, and would like to take it as an opportunity to support the hypothesis that different GRN methods can be used in ReCoN.

      When developing our previous HMLN-based tool, HuMMuS (Trimbour et al. 2024, Supp. Figure 6), we observed that its multilayer structure provided more robust results than individual layers. We would like to reproduce such an analysis, verifying that ReCoN results vary less than the individual GRN layers.

      We propose to integrate a new showcase on the Human Cytokine Dictionary (Oesinghaus et al. 2025), predicting cytokine downstream effects similarly to the Mouse Immune Dictionary showcase.

      This showcase would be useful to confirm the contribution of the indirect effect and to test the impact of different GRNs on the results.

      We would generate alternative GRNs with several other GRN inference methods: SCENIC+, CellOracle, and GRNBoost2. GRNBoost2 would use only the scRNA-seq data of the control samples in the Human Cytokine Dictionary.

      GRN inference methods generally produce outputs with very low overlap (Badia-i-Mompel et al. 2024). High correlations between the ReCoN predictions obtained with the different GRNs would therefore already validate ReCoN's robustness to GRN noise.

      If lower correlations between ReCoN's predictions are obtained, we will add a dedicated permutation experiment on the HuMMuS GRN. This experiment would introduce different levels of artificial noise to assess more precisely the robustness of ReCoN to GRN stochasticity.
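      One possible way to generate graded artificial noise is edge rewiring, sketched below. The rewiring scheme, the Jaccard overlap measure, and the toy GRN are assumptions for illustration. The scheme redraws the target gene of a chosen fraction of edges while keeping the TF and the score. The response does not specify how the permutations would be performed.

```python
import random


def rewire_edges(edges, fraction, seed=0):
    """Return a copy of a (tf, target, score) edge list in which a given
    fraction of edges get a new, uniformly drawn target gene.

    The TF and the score are kept, so each TF's out-degree and the score
    distribution are preserved; a redrawn target may coincide with the old
    one or duplicate an existing edge.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed -> reproducible noise level
    targets = sorted({t for _, t, _ in edges})
    noisy = list(edges)
    for i in rng.sample(range(len(noisy)), round(fraction * len(noisy))):
        tf, _, score = noisy[i]
        noisy[i] = (tf, rng.choice(targets), score)
    return noisy


def edge_jaccard(a, b):
    """Overlap of two GRNs as the Jaccard index of their (tf, target) pairs."""
    ea = {(tf, t) for tf, t, _ in a}
    eb = {(tf, t) for tf, t, _ in b}
    return len(ea & eb) / len(ea | eb)


# Hypothetical sparse GRN: 5 TFs, each regulating 4 of 20 target genes.
grn = [(f"TF{i}", f"G{(4 * i + j) % 20}", 1.0 / (1 + j)) for i in range(5) for j in range(4)]
for f in (0.0, 0.1, 0.5):
    print(f, round(edge_jaccard(grn, rewire_edges(grn, f)), 3))
```

      Re-running ReCoN on each noisy copy and correlating its predictions with the original run would give a robustness curve as a function of the noise fraction.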

      • *

      Regarding PPI hub justification, our *applications did not use receptor PPI and are not affected by bias at this level in the showcases. This bias could specifically be present in the receptor-gene links, as we derive it from the ligand-gene connections of Nichenet which was itself partially based on prior knowledge. It is thus possible that some receptor are reached more often due to this bias and not a stronger effect. It seems however, hard to control in this context, as ReCoN currently relies on this prior knowledge. Currently, we hope that the combination of personalised, literature-agnostic GRN with literature-based receptor - gene can provide an interesting trade-off. In future development, we could imagine a receptor-gene network based solely on perturbations, but it would require controlling also the bias of ligand - receptor binding couples, which limits even the use of ligand-based experience. *

      We propose adding a short point in the discussion about hub effects from RWR-based methods.

      R1.13. Please add page numbers.

      Response:

      We will add the page numbers.

      R1.14. Figures are nice and clear.

      Some specific minor points are listed here below.

      Define hMLN on first appearance, Fig. 1 caption (no page numbers).

      2nd appearance heterogeneous multilayer structure (HMLN) ...

      Response:

      We updated the figure legend to include the definition of the acronym, as the legend appears before its first occurrence in the text. (Or define it at both positions?)

      R1.15. Bi_j not so clear to what it refers when first mentioned

      Response:

      B_{i,j} represents a weight that can be assigned to favour some cell-to-cell transitions. In most cases, it is not necessary to use it.

      It is notably of interest to model 1) known spatial patterns in situ and 2) hypotheses or experimental designs in which some cell types favour certain connections.

      For example, when modelling the skin, a user might want to increase connections between epidermal and dermal cells, and between dermal and hypodermal cells.

      We propose a new explanation of B_{i,j} that both explains its meaning in the model and illustrates situations in which to use it: "The coefficient B_{i,j} modulates the influence of cell type i on cell type j in the indirect effect computation. By default, all B_{i,j} are set to one, weighting each cell type's contribution equally per cell. However, it can be adjusted to encode additional biological knowledge, such as spatial proximity between cell types or known cooperation patterns. For instance, when modelling the skin, a user might increase B_{i,j} between epidermal and dermal cells, and between dermal and hypodermal cells, to reflect their spatial organisation."
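      To make the role of B_{i,j} concrete, here is a minimal sketch of building such a matrix for the skin example, assuming senders in rows and receivers in columns. The aggregation in weighted_indirect (receiver j receives the sum over senders i of B[i, j] times the sender's cell fraction times its signal) is a hypothetical simplification for illustration, not ReCoN's actual indirect-effect computation; both function names are illustrative.

```python
import numpy as np


def make_beta(cell_types, boosts=None, default=1.0, symmetric=True):
    """Build a B matrix: rows are sender cell types i, columns are receivers j.

    All entries default to 1 (equal weighting); `boosts` maps
    (sender, receiver) pairs to a custom weight.
    """
    index = {name: k for k, name in enumerate(cell_types)}
    B = np.full((len(cell_types), len(cell_types)), default, dtype=float)
    for (i, j), value in (boosts or {}).items():
        B[index[i], index[j]] = value
        if symmetric:  # apply the same weight in both directions
            B[index[j], index[i]] = value
    return B


def weighted_indirect(B, sender_signal, cell_fractions):
    """Hypothetical aggregation: receiver j gets sum_i B[i, j] * fraction_i * signal_i."""
    contributions = np.asarray(cell_fractions, float) * np.asarray(sender_signal, float)
    return contributions @ B


skin = ["epidermal", "dermal", "hypodermal"]
B = make_beta(skin, boosts={("epidermal", "dermal"): 2.0, ("dermal", "hypodermal"): 2.0})
indirect = weighted_indirect(B, sender_signal=[1.0, 1.0, 1.0], cell_fractions=[0.5, 0.3, 0.2])
# indirect is approximately [1.3, 1.7, 1.3]: the dermis, adjacent to both layers, receives most
```

      With the default matrix of ones, every receiver gets the same total, which corresponds to the "equal weighting per cell" default described in the proposed text.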

      R1.16. personalized interaction specificity. - maybe better word than personalised (contextualised?)

      Response:

      We agree that "contextualised" better conveys the meaning behind this model. "Personalised" might lead readers to expect patient-specific data, which is not the case here.

      We propose to rename all models as follows: Receptor-matrix, ReCoN-no-CCC, ReCoN-no-context, ReCoN-complete.

      R1.17. ReCoN-genetic and ReCoN, ( generic?)

      Response:

      We will correct this typo.

      R1.18. responses. It is expected to observe common behaviors in-between cell-type, that the GRN and the generic CCC network already contribute captures.

      • not very clear

      Response:

      We aimed here to explain the already good performance of "ReCoN-no-context" (as renamed in response to comment R1.16), which could be surprising as no cell-type-specific information is used. Our proposed explanation is that this model correctly predicts several properties shared by all immune cell types, such as similar metabolic pathways, despite their specific roles. When taking a quantitative view of their transcriptome, as in this showcase, the cell-type responses can be expected to be relatively well predicted from these common properties alone.

      As this is a very relevant comment, and as several comments we received before submission also concerned this result, we would like to keep an explanatory sentence.

      R1.19. Figure 2b the icon of cells with double arrows might suggest phenotype shift when instead this is just communication

      Response:

      (left side) We are very thankful to the reviewer for paying attention to the details of the paper and fully agree with this analysis. We propose to represent ligand emission instead of arrows, reusing the convention of Figure 1.

      R1.20. eTACs explain acronym and what they are

      Response:

      We updated the first occurrence of eTACs to "extrathymic Aire-expressing cells (eTACs)".

      R1.21. Due to very few genes being differentially expressed, only cDC1 was conserved and evaluated for IL22,

      Not so clear

      Response:

      As we are commenting on the IL22 stimulation results, we reorganised the sentence to make it less convoluted: "For IL22 stimulation, only cDC1 had enough differentially expressed genes to be evaluated."

      R1.22. In this showcase (not very clear, use case?)

      Response:

      We understand "use case" as describing a type of use of the method, while a showcase is a specific example of a use case. We therefore find "showcase" more appropriate here. We will, however, check every use of the word to ensure it only refers to the specific examples we provide and not to "use cases".

      R1.23. different fibroblast specializations - maybe phenotypes?

      Response:

      • It is a very good suggestion, as "specialisation" would imply functional aspects (which we cannot really be sure of) and a chronological evolution.
      • However, "phenotype" generally includes numerous properties, such as morphology, that we cannot validate here, so it might be an even stronger term than "specialisation". For simplicity, "phenotype" could work; to be more precise, "transcriptomic specialisation"? I am honestly not sure of the best change here.

      R1.24. Figure 4b

      b) Schematic view of the deconvolution process and cell type-specific count inference from the spatial niches.

      Not so clear what the heatmap shows, rows and columns

      Spots heatmap : label niche on rectangles in cols

      And each col is a spot

      Rows are cell types or cells?

      In the cell types x spot

      Response:

      This figure can indeed benefit strongly from legend modifications. In both matrices, rows represent genes, while columns represent spots or the individual cells deconvolved per spot. We propose to:

      • annotate the niches (currently indicated by the coloured surroundings) with a symbolic drawing instead of writing them on the matrix;
      • add the label "genes" to the first matrix;
      • write "deconvolution" directly on the figure.

      R1.25. Cell2location. Add reference, maybe explain basic functionality?

      Response:

      Cell2location was not referenced in the Results section and was only cited in Section 4.6.2 of the Methods (reference 72). We corrected this oversight and propose adding 1) a brief explanation of deconvolution just before Cell2location is mentioned, and 2) a brief explanation of Cell2location's particularity of inferring individual cell profiles, which is uncommon in spatial deconvolution.

      R1.26. reconstructing different patients, tissues, and microenvironments to predict context-specific molecular treatments.

      Unclear

      fibrosis in different - at molecular levels

      Response:

      We will modify this section title according to the passages quoted by the reviewer and the suggested reformulations.

      R1.27. Figure 5d myeloid and endothelial colour code inversed from 5 BC

      Response:

      The legends are individually correct, but there is no reason not to make them consistent across panels. We will update the legend of panel 5d.

      R1.28. 5d indicate important pathways in organe should not change the colour of the nodes (purple=common, blue or green specific). Use border colour maybe?

      Response:

      We had forgotten to specify the colour code of this panel, in which orange highlighted the gene sets related to molecular pathways rather than functional annotations. As the names already indicate pathways, we now think that the orange background is redundant and may create confusion. We would thus like to update the Wnt and TNFA pathway backgrounds to ___ (more enriched in a cell type) and purple (significantly enriched in all cell types).

      R1.29. 5e is not a venn diagram

      e) Venn diagram showing the overlap between transcription factors (TFs) predicted by ReCoN (green) and those previously implicated in fibrosis (orange) or cardiac diseases (violet). Only the top 10 TFs were annotated from literature sources; full sizes of fibrosis- and cardiac disease-related receptor sets can therefore not be represented.

      f) also not a Venn diagram; e/f now in supp

      the "NABA ECM collagens" gene set. Nodes are grouped by molecular type (e.g., transcription factors, receptors, ligands), and links represent the weighted, direct regulatory interactions present in the ReCoN-constructed

      Response:

      As the diagrams do not indicate the total number of receptors/TFs reported in the literature, they cannot be Venn diagrams. We updated the legend to: "Diagram showing the overlap between [...]"

      As we reorganised the paper, these plots are now only in the supplementary material; we removed the duplicate occurrence from the Figure 5 legend.

      R1.30. Why Sankey plot? Normally sankey plot represents flow (of regions changing from 1 state to another) but here this is just a weighted network?

      No communication from fibros back to other cell types? No communication between ventricular/myeloid/lymphoid?

      Response:

      We thank the reviewer for this useful feedback, which helped us realise that important details were missing from the paragraph.

      This plot is only intended to visualise a regulatory cascade, so users have to choose one receiving cell type, a set of target genes, and the sending cell types. It includes only a specific subset of regulatory cells and their interactions with the target cells. Here, we illustrated the regulation of some ECM genes produced by fibroblasts.

      A Sankey diagram might indeed not be the clearest representation, as we are not modelling the whole diffusion, nor a flow per se. We propose to replace it with another representation that we hope will be more intuitive for biologists (and more aesthetic), as illustrated below:

      R1.31. as a extension to - an

      underrepresented in the current. - current framework?

      Response:

      We agree: "framework" fills the missing word in the sentence.

      R1.32. However, it can't represent more - cannot

      Borrowing representation from hypergraphs, which introduces

      The network exploration implementation of ReCoN also present some limitations.

      limitations. While random walks with restarts offer a stable and fast exploration workflow for multilayer networks, it currently only considers positive weights to predict regulation strengths. It involves that the nature of the regulation, as activation or inhibition, has to be identified a posteriori.

      • check concordance/grammar

      Response:

      We will correct the grammatical errors raised by the reviewer.
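      To illustrate why the current exploration only handles positive weights, here is a minimal random walk with restart on a single weighted layer. It is a textbook formulation with a column-normalised transition matrix, written only for illustration: ReCoN's multilayer exploration is not reproduced, and restart=0.7 is an arbitrary value, not ReCoN's setting. Negative weights would break the probabilistic interpretation of the transition matrix, which is why activation or inhibition has to be assigned after the exploration.

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Stationary scores of a random walk with restart on one weighted layer."""
    A = np.asarray(adjacency, dtype=float)
    if (A < 0).any():
        # Transition probabilities cannot be negative: the sign of a regulation
        # (activation vs inhibition) must be determined a posteriori.
        raise ValueError("random walk with restart requires non-negative weights")
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # isolated nodes: avoid division by zero
    W = A / col_sums  # column-stochastic transition matrix (for non-isolated nodes)
    p0 = np.asarray(seeds, dtype=float)
    p0 = p0 / p0.sum()  # restart distribution over the seed nodes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p


# Toy usage: a three-node path 0 - 1 - 2, seeded on node 0.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = random_walk_with_restart(path, seeds=[1, 0, 0])
# Scores decrease with distance from the seed and sum to one.
```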

      R1.33. Only the nodes that are included in one of the layers are present in the final results, ignoring the ones present only in bipartites.

      Unclear

      Response:

      Layers and bipartites are treated differently by the algorithm: a node must be present in a layer to appear in the results.

      In practice, this simply means that receptors or ligands not paired in the CCC network, and genes not regulated by any TF in the GRN, will not appear.

      We propose clarifying the sentence with this second explanation:

      "In practice, a node must have at least one connection in its layer to appear in the final results. This means that receptors or ligands absent from the CCC network and genes not targeted by any transcription factor in the GRN will not receive a score from the random walk exploration."
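      The filtering rule can be sketched with plain Python sets. This assumes, for illustration only, that each layer is given as a list of intra-layer edges and that bipartites are a list of inter-layer edges; the node names and the function scorable_nodes are hypothetical and do not reflect ReCoN's data structures.

```python
def scorable_nodes(layers, bipartites):
    """Split nodes into those that can receive a score and those that cannot.

    layers: dict layer_name -> list of (u, v) intra-layer edges
    bipartites: list of (u, v) edges linking two layers
    A node must appear in at least one intra-layer edge to be scored;
    nodes that only appear in bipartite edges are dropped.
    """
    in_layers = {node for edges in layers.values() for edge in edges for node in edge}
    in_bipartites = {node for edge in bipartites for node in edge}
    return in_layers, in_bipartites - in_layers


layers = {
    "ccc": [("LIGAND_1", "RECEPTOR_1")],
    "grn": [("TF_1", "GENE_A")],
}
bipartites = [("RECEPTOR_1", "TF_1"), ("RECEPTOR_2", "TF_1")]
scored, dropped = scorable_nodes(layers, bipartites)
# RECEPTOR_2 is linked to a TF but has no partner in the CCC layer, so it is dropped.
```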

      R1.34. a scATAC - an

      Barsi et al is published https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013188

      Response:

      We updated the reference with the published article.

      R1.35. effects, allowing for modulating in a second time their contribution. - word order

      Response:

      We propose the reformulation "effects, allowing their contribution to be modulated subsequently".

      R1.36. others. However, it is possible to adjust the Beta coefficient to represent it based on the available information for each dataset.

      Represent- adjust?

      Response:

      We agree with the reviewer's suggestion to use "adjust".

      R1.37. We use the latter to compare the different models. - what is the latter?

      Response:

      "The latter" referred to the 25 cytokines of the Immune Dictionary that had at least one connection in the cell communication network inferred with CellPhoneDB. We propose clarifying this formulation to "..."

      R1.38. It resulted in the scRNA-seq in 1,789 cells with 13,167 genes, and for the scATAC-seq in 3,759 cells with 254,545 regions.

      Check english

      Response:

      We propose replacing this sentence with the following: "It resulted in a scRNA-seq dataset of 1,789 cells with 13,167 genes, and a scATAC-seq dataset of 3,759 cells with 254,545 regions."

      R1.39. GRETA pipeline.- reference

      Response:

      We added the citation of the GRETA pipeline paper in Section 4.5 of the Methods: "Badia-i-Mompel et al., 2026".

      R1.40. We kept all the cells whose annotations through unsupervised clustering, followed by marker gene annotations, through scANVI were coherent.

      Word order

      Response:

      We propose the following reformulation to correct the sentence: "We kept all cells whose annotations were coherent between unsupervised clustering with marker-gene labelling and scANVI-based label transfer."

      R1.41. In parallel, pairs of ligands and receptors with both associated with scores above an absolute gene loading of 0.1 were considered potential driver interactions in HF.

      Unclear

      Response:

      In the MOFAcell results, factors correspond to linear combinations of genes that explain a large part of the data variance; the contribution of each gene to a factor is called its loading. We chose the factor that best separated patients with and without fibrosis, and kept all top genes, i.e. all genes with an absolute score above 0.1.

      We propose reformulating this sentence, as the word "loading" could be overcomplicated for most readers: "To identify the ligands and receptors driving heart failure, we considered all of those with an absolute contribution to the multicellular factor above 0.1."
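      The criterion in the original manuscript sentence (keep a ligand-receptor pair only if both partners exceed an absolute loading of 0.1 on the chosen factor) amounts to a short filter. The loadings dictionary and gene names below are hypothetical placeholders, not values from the MOFAcell analysis, and treating genes without a loading as 0 is an assumption of this sketch.

```python
def driver_pairs(lr_pairs, loadings, threshold=0.1):
    """Keep (ligand, receptor) pairs whose two partners both have |loading| > threshold.

    lr_pairs: iterable of (ligand, receptor) tuples
    loadings: dict gene -> loading on the selected multicellular factor;
              genes missing from the dict are treated as loading 0.
    """
    return [
        (ligand, receptor)
        for ligand, receptor in lr_pairs
        if abs(loadings.get(ligand, 0.0)) > threshold
        and abs(loadings.get(receptor, 0.0)) > threshold
    ]


# Toy values: the sign of a loading does not matter, only its magnitude.
loadings = {"LIG_A": 0.30, "REC_A": -0.20, "LIG_B": 0.05, "REC_B": 0.50, "LIG_C": -0.12}
pairs = [("LIG_A", "REC_A"), ("LIG_B", "REC_B"), ("LIG_C", "REC_A")]
selected = driver_pairs(pairs, loadings)
# ("LIG_B", "REC_B") is excluded because LIG_B is below the threshold.
```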

      R1.42. gseapy Python - reference?

      Response:

      The gseapy package was indeed not cited; we now include the citation: "Fang Z, Liu X, Peltz G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 2022;btac757. https://doi.org/10.1093/bioinformatics/btac757"

      R1.43. and to calculate average for each spatial context the average cell type expression.

      Unclear

      Response:

      We propose to reformulate the sentence as: "These cell-type-spot profiles were then used, for each spatial context, to build a specific cell-cell communication network and to calculate average cell-type expression."

      R1.44. We only used the loadings of all cell types but the fibroblasts to consider the effect of the sole environment.

      Unclear

      Response:

      We propose to use "apart from the fibroblasts" and "to consider only the effect of the environment" to clarify the sentence: "We only used the loadings of all cell types apart from the fibroblasts, to consider only the effect of the environment."

      R1.45. We realised a downstream - performed

      Response:

      We fully agree with the reviewer's suggestion.

      R1.46. The profiles inferred by ReCoN were first very correlated in all three contexts. - unclear

      Response:

      The sentence lacked clarity and needed rephrasing. We propose: "The absolute ReCoN scores were highly correlated across all three contexts. To focus on context-specific differences, enrichment analyses were performed on the log-ratio of each context profile over the mean of the other profiles."
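      The log-ratio in the proposed sentence can be sketched as follows, assuming the ReCoN scores of the three contexts are stacked in an array of shape (n_contexts, n_genes). The natural logarithm and the small pseudocount guarding against zero scores are assumptions of this sketch; each resulting row could then be ranked and passed to an enrichment tool.

```python
import numpy as np


def context_log_ratios(profiles, pseudocount=1e-12):
    """Log-ratio of each context profile over the mean of the other contexts.

    profiles: array of shape (n_contexts, n_genes) with non-negative scores.
    Row c of the output is log(profile_c / mean of the other profiles).
    """
    P = np.asarray(profiles, dtype=float) + pseudocount
    n_contexts = P.shape[0]
    if n_contexts < 2:
        raise ValueError("at least two contexts are needed")
    others_mean = (P.sum(axis=0, keepdims=True) - P) / (n_contexts - 1)
    return np.log(P / others_mean)


scores = np.array([
    [2.0, 1.0, 0.5],  # context 1
    [1.0, 1.0, 0.5],  # context 2
    [1.0, 1.0, 0.5],  # context 3
])
ratios = context_log_ratios(scores)
# Gene 0 stands out in context 1 (ratio log 2); genes shared by all contexts get 0.
```

      Dividing by the mean of the other contexts removes the component that the highly correlated profiles have in common, so only context-specific deviations drive the enrichment.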

      R1.47. Potentially the closest results are models that can predict the effect of perturbations on cell line cultures. Several approaches in the literature employ either transformers or optimal transport to predict the effect of perturbations in single cell datasets. One of the main issues is an underlying necessary assumption that the perturbation effect will be larger than the heterogeneity (in cell lines for example), which becomes increasingly difficult when considering in-vivo experiments. ReCoN obviously goes beyond this by considering explicitly the presence of different cell types but distinctions of cell types are sometimes quite arbitrary and potentially application of ReCoN to some of the in-vitro culture datasets, even on cell lines, could be a way to test its performance and benchmark it against other methods.

      The main bottleneck in the application of this framework to 'personalisation' of therapies, mentioned even in the abstract as a potential future goal for such an approach, will be the lack of data. This approach requires single cell level descriptions of the system at hand, plus additional datasets to build the model structure. To a certain extent, public data of related tissues/contexts can be used, but it will be necessary to test the dependence of performance on coherence of the input data to develop sufficient trust to use it for new predictions, especially in a medical field.

      We thank the reviewer for these reflections, which raise several distinct points that we would like to add to the discussion.

      Cell line perturbation prediction is indeed a closely related and active field of research, with numerous models based notably on optimal transport and VAEs, and relevant benchmarks (Radig et al. 2026). In our view, ReCoN takes a complementary angle, both by focusing on the environment effect and by using a network-driven approach that provides explainability.

      These perturbation methods are typically benchmarked on single cell line screens, where cell-cell communication is highly limited or absent by design, whereas ReCoN is specifically designed to exploit interactions between multiple cell types. Furthermore, ReCoN relies on a network that aims to provide only explainable hypotheses and molecular cascades. The two families of methods also typically learn from different data: ReCoN only uses single-cell data, while the best perturbation prediction methods learn from a subset of perturbation experiments.

      Evaluating the performance of ReCoN in perturbation prediction would require designing extensive comparisons with the state of the art that take all these nuances into account, which we believe is beyond the scope of the present study. The comment nevertheless raises a fundamental question for the development of future methods, namely the need to assess whether the perturbation effect is actually larger than the heterogeneity, and we propose to extend the discussion to cover these aspects.

      Secondly, this comment raises the issue of cell type definition, which can be difficult and sometimes gives an inaccurate description of cell heterogeneity. We note that although ReCoN relies on grouping cells in some way, it does not impose any particular cell type ontology: users can define their own cell types or cell states, since the CCC layer is typically inferred from single-cell RNA-seq alone and does not require canonical cell-type annotations. This flexibility allows ReCoN to accommodate finer or coarser groupings depending on the biological question. We do not propose a framework that accounts for diversity other than through homogeneous clusters of cells, but we think that this constitutes an interesting future development for ReCoN or new multicellular modelling methods.

      Lastly, we fully agree that an important limitation to ReCoN's use is data availability and generation, which was also a limitation when identifying datasets for the manuscript's applications. We hope that the development of open-source atlases will make it easier to leverage tissue-specific prior knowledge and will increase the potential applications, prediction performance, and trust in ReCoN results.

      In conclusion, we propose to add two new points to the discussion:

      1) extending multicellular perturbations (including gene knockouts) to conditions where cell types cannot be defined prior to the analysis, or are better considered along a spectrum, will be an interesting future direction;

      2) there is now a need for broad benchmarks covering both multicellular and single cell line tasks, to evaluate the trade-off between accounting for cell heterogeneity and overall prediction accuracy.

      Radig, J., Droit, R., Doncevic, D. et al. scArchon: a scalable benchmarking framework for assessing single-cell perturbation models. Genome Biol 27, 162 (2026). https://doi.org/10.1186/s13059-026-04104-z

      R1.48. The authors could comment on how their method compares to others that do not require single cell level information. Despite clear differences, it might be important to show the advantage of using this more complex approach that requires data that is less available. Given the ease with which bulk profiles can be constructed from single cell data, it might be possible to compare the approaches directly. For example, see

      K. Wang, S. Patkar, J.S. Lee, E.M. Gertz, W. Robinson, F. Schischlik, D.R. Crawford, A.A. Schäffer, E. Ruppin. Deconvolving Clinically Relevant Cellular Immune Cross-talk from Bulk Gene Expression Using CODEFACS and LIRICS Stratifies Patients with Melanoma to Anti-PD-1 Therapy.

      M. van Santvoort, Ó. Lapuente-Santana, M. Zopoglou, C. Zackl, F. Finotello, P. van der Hoorn, F. Eduati. Mathematically mapping the network of cells in the tumor microenvironment. Cell Reports Methods 2025.

      Response:

      We propose to extend the discussion with additional methods, notably those developed before single-cell technologies. We did not plan to include these two specific methods in a comparison since, to our knowledge, they do not provide outputs directly comparable to ReCoN's.

      • The first work proposes to deconvolve bulk RNA-seq profiles into cell-type-specific expression profiles. It is an interesting reference, as it could allow applying ReCoN even to bulk RNA-seq, but it does not provide comparable results, as its final task is to infer ligand-receptor interactions, without providing downstream molecular mechanisms.
      • The second method, RaCInG, builds cell-to-cell networks for individual patients. It does not explore the molecular interactions inside the cells themselves; its networks could be used to build personalised ReCoN models, but the method seems closer to recent CCC methods than to ReCoN itself.

      Reviewer #2

      R2.1. It is not clear how well it performs in independent validations. Authors showed that it can predict the effect of cytokine perturbations in the immune dictionary by selecting an optimal alpha. Authors should validate that using the same alpha value of 0.8, it is possible to accurately predict the effect of cytokine perturbations in independent datasets. This is particularly concerning for cytokine-cell type pairs where the optimal alpha is not known. Therefore, the potential utility of Recon to estimate the effect of multicellular perturbations is not well established.

      Response:

      The reviewer raised a very relevant point by noting that the alpha coefficient might vary between datasets.

      The value of 0.8 was chosen because it produced the best results in two independent datasets, the Immune Dictionary and the heart failure showcases, which suggests some reproducibility across datasets. To complete these findings, we will also verify whether 0.8 provides the best performance in a new showcase, the Human Cytokine Dictionary (Oesinghaus et al. 2025).

      We had tried to nuance this choice by mentioning the need to confirm the importance of the indirect effect. We propose to add a sentence explicitly commenting on the impact of these new findings on the value of the alpha coefficient and its robustness.

      It is also accurate to say that ReCoN cannot currently estimate the alpha parameter autonomously. We proposed this default value as it worked on both datasets, but it is possible that no single default value fits all datasets. In the current implementation of ReCoN, alpha is only a default, and users are completely free to modify its value depending on their needs.

      If the default value did not hold, one option would be to fit alpha using similar prior perturbations when such data are available. For example, after perturbing one or a few cytokines, a user could choose the value that best explains the gene expression responses.
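      Such a calibration could look like the grid search below. It rests on two assumptions that go beyond this response: that the final score is the convex combination (1 - alpha) * direct + alpha * indirect (consistent with alpha = 0 removing the indirect effect, but not necessarily ReCoN's exact formula), and that agreement is measured by the rank correlation with the observed expression response. fit_alpha and its inputs are hypothetical.

```python
import numpy as np


def rank_corr(x, y):
    """Rank correlation (ties are not averaged in this sketch)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def fit_alpha(direct, indirect, observed, grid=None):
    """Pick the alpha that best explains observed responses to prior perturbations.

    direct, indirect, observed: lists of 1D arrays, one entry per perturbation
    (e.g. one per cytokine), all over the same genes.
    Returns (best alpha, mean rank correlation at that alpha).
    """
    grid = np.linspace(0.0, 1.0, 11) if grid is None else grid
    best_alpha, best_score = None, -np.inf
    for alpha in grid:
        scores = [
            rank_corr((1 - alpha) * d + alpha * i, o)
            for d, i, o in zip(direct, indirect, observed)
        ]
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_alpha, best_score = float(alpha), mean_score
    return best_alpha, best_score


# Synthetic check: responses generated with alpha = 0.8 should be recovered.
rng = np.random.default_rng(0)
direct = [rng.random(200) for _ in range(3)]
indirect = [rng.random(200) for _ in range(3)]
observed = [0.2 * d + 0.8 * i for d, i in zip(direct, indirect)]
alpha, score = fit_alpha(direct, indirect, observed)
```

      Averaging over several prior perturbations, rather than fitting each one separately, is what would make the chosen value usable for cytokine-cell type pairs without their own calibration data.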

      R2.2. Authors claimed that optimal alpha value of 0.8 implies the dominance of indirect effect. But in contrast to this claim, the performance across cytokine-celltype pair only increased from 0.72 to 0.76, which seem to imply that indirect effects do not add much.

      Response:

      The magnitude of the performance improvement is an interesting point to discuss, as including the indirect effect roughly doubles the computational time, which creates a trade-off between resource usage and this improvement.

      While the average improvement from combining the direct and indirect effects in the first showcase was around 5%, it exceeded 10% in some cell types. We consider that this still represents a meaningful improvement for the current task. Indeed, the indirect effect here "only" incorporates the coordinated response of immune cells to a cytokine stimulation, which should not necessarily change their profiles drastically compared to isolated exposure.
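      For reference, the two values quoted by the reviewer correspond to an absolute gain of 0.04 and to a relative gain of:

```latex
\frac{0.76 - 0.72}{0.72} = \frac{0.04}{0.72} \approx 0.056
```

      i.e. about 5.6%, which appears consistent with the average improvement of around 5% reported for the first showcase.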

      R2.3. How does the cell-type specific effects prediction perform by just considering the intracellular layers? The authors constructed multiple variants of ReCoN to estimate unicellular and multicellular effects. How is the variant ReCoN-grn different from full ReCoN where gamma is set to zero.

      Response:

      We thank the reviewer for this comment, which will help us restructure Section 2.2.

      ReCoN-GRN differs from the full ReCoN model, even with a gamma value of 0, as the latter includes ligand-to-receptor weights. However, ReCoN-GRN corresponds to ReCoN-generic with an alpha of 0, which does not weight ligand-to-receptor links.

      We propose to clarify this detail in Section 2.2.2 by adding, after the introduction of the ReCoN-generic model, the sentence: "Note that ReCoN-grn corresponds to the ReCoN-generic model with alpha set to zero, where no indirect effects are considered. It differs from the full ReCoN model with alpha set to zero, which still includes ligand-to-receptor weights through the receptor-gene bipartite network."

      R2.4. In section 2.2, authors assert that if matching datasets are not available, GRN layer can be extracted from other datasets. How well does the GRN layer from one system generalizes to the other system in terms of perturbation prediction?

      Response:

      This is, of course, a complex question, as the answer probably depends strongly on the studied system. However, we believe that while it is important to consider similar systems, using the same samples for the cell communication and GRN layers is not necessary.

      The first showcase explores exactly this case: we built the GRN from two unpaired datasets and the cell communication network from a third one. It provided convincing performance, supporting our claim. Moreover, this is common practice in most methods that contextualise prior knowledge, which usually comes from other samples and sometimes even other organs (Browaeys, Saelens, and Saeys 2020, Jin et al. 2021, Badia-i-Mompel et al. 2023).

      To provide additional insights, we will run the new Human Cytokine Dictionary showcase using both 1) multiomics methods on external PBMC datasets and 2) a scRNA-seq-only method applied directly to the Human Cytokine Dictionary. We will then be able to compare the performance obtained with both types of data and their corresponding methods.

      Following the reviewer's comment, and to support our claim more clearly, we propose highlighting this justification in the showcase itself: "... this showcase highlights the possibility of combining networks obtained from distinct datasets ...".

      Regarding the combination of datasets, we also propose to clarify the reasons behind our choices for the Immune Dictionary showcase with the additional supplementary text proposed in response to comment R1.4.

      Badia-i-Mompel P, Wessels L, Müller-Dott S et al. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24(11):739-54. https://doi.org/10.1038/s41576-023-00618-5.

      Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 2020;17(2):159-62. https://doi.org/10.1038/s41592-019-0667-5.

      Jin S, Guerrero-Juarez CF, Zhang L et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 2021;12(1):1088. https://doi.org/10.1038/s41467-021-21246-9.

      R2.5. In the abstract, authors claimed that ReCoN can predict the effect of gene knockouts. But authors did not show any application or validation to support this claim.

      Response:

      Indeed, we had no showcase that directly measured the performance of ReCoN for gene knockouts, although this possible application was mentioned in the abstract.

      We believe that ReCoN could be used in the future to infer such perturbations, but we fully agree that this claim cannot be presented without justification.

      We propose to remove the mention of gene knockouts from the abstract and to introduce it in the outlook at the end of the discussion instead, specifying that it will require dedicated experiments and constitutes a possible future extension of this work.

      R2.6. The communication between cells might be dependent on their spatial proximity. Is it possible to construct the CCC layer by incorporating the context-matched spatial data? How would that affect the performance of multicellular response prediction?

      Response:

      This is a very interesting comment, as numerous methods using spatial transcriptomics data have been published recently.

      In the current formulation, the coefficient B_{i,j} modulates the impact of cell type i on cell type j. If spatial transcriptomics data inform on the proximity between cell types and its overall impact on their communication, users can increase the communication between the corresponding cell types.
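      As a hypothetical illustration of that idea, B could be derived from how strongly cell types co-occur across spots. The sketch below assumes a matrix of deconvolved cell-type abundances (spots in rows, cell types in columns, e.g. as produced by a deconvolution tool such as Cell2location) and maps positive co-occurrence correlations to weights above the default of 1. This mapping, the strength parameter and the choice to leave the diagonal at 1 are illustrative assumptions, not part of ReCoN.

```python
import numpy as np


def beta_from_spatial(abundances, strength=1.0):
    """Hypothetical B matrix from spot-level cell-type co-occurrence.

    abundances: array (n_spots, n_cell_types) of deconvolved abundances.
    B[i, j] = 1 + strength * max(0, corr(abundance_i, abundance_j)), diagonal kept at 1.
    """
    A = np.asarray(abundances, dtype=float)
    corr = np.corrcoef(A, rowvar=False)  # cell types as variables
    corr = np.nan_to_num(corr, nan=0.0)  # constant columns would give NaN correlations
    B = 1.0 + strength * np.clip(corr, 0.0, None)  # only co-occurrence increases B
    np.fill_diagonal(B, 1.0)  # keep the default weight of a cell type on itself
    return B


# Toy abundances: cell types 0 and 1 share spots, cell type 2 occupies other spots.
spots = np.array([
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 4.0],
])
B = beta_from_spatial(spots)
# B[0, 1] is about 2 (co-localised); pairs involving cell type 2 keep the default of 1.
```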

      However, as ReCoN is a cell-type-centric model, spatial information can only be added at a general scale, or by modelling spatial regions independently, as presented in the heart infarction microenvironment showcase. ReCoN therefore cannot benefit from the potential of spatial transcriptomics as much as models that represent tissue structure.

      R2.7. In the fibroblast application in Fig 4d, based on the cardiac cell types expression in region type, they are predicting fibroblast gene expression. Wouldn't the most direct benchmarking be comparison with observed fibroblast expression from the ST (after deconvolution perhaps)?

      Response:

      This comment helped us restructure the myocardial infarction microenvironment showcase, as we believe its overall objective was not formulated clearly enough.

      We aim to model the impact of the environment on the transcriptome. As the complete transcriptome of a cell results from numerous interacting variables, we believe that correlating ReCoN's scores with the transcriptome would not evaluate how well the impact of the environment is predicted.

      For this reason, we compared the results to the differences specifically attributable to the microenvironment. We focused on gene set enrichment, which seemed less noisy for such a comparison, in particular with 10x Visium data, which has a particularly high dropout rate.

      We propose to strengthen the validation by providing molecular insights into the three groups of cells studied.

      The spatial data themselves are bulk at the spot level, adding a layer of noise on top of the small number of genes captured by Visium. Instead of correlating with the deconvolved spots, we can use equivalent single-cell RNA-seq fibroblast data annotated in the same study, which match the three modelled niches. We propose to perform a differential expression analysis on these data and to compute the correlation between these groups and ReCoN scores, providing a quantitative analysis.

      If the correlation is low because of noise in the data (notably noise that permutes the order of individual genes even when the overall biological signals and the order of gene sets are conserved), we will additionally perform a pathway enrichment on these data, which will also strengthen the qualitative validation.
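      The gene-level versus gene-set-level argument can be made concrete with a toy sketch. All gene names, scores and fold changes below are hypothetical. Noise that swaps the order of genes within a set lowers the gene-level Spearman correlation, while the correlation of gene-set means is unaffected. Spearman's coefficient is computed here as the Pearson correlation of ranks, assuming no ties. `scipy.stats.spearmanr` would also handle ties.

```python
import numpy as np


def ranks(x):
    """Rank values 0..n-1 (assumes no ties, for simplicity)."""
    x = np.asarray(x, dtype=float)
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks (no ties)."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])


# Hypothetical ReCoN scores and observed log fold changes: the set order is
# identical, but the two genes inside each set are swapped by "noise".
recon = {"g1": 6, "g2": 5, "g3": 4, "g4": 3, "g5": 2, "g6": 1}
lfc = {"g1": 5, "g2": 6, "g3": 3, "g4": 4, "g5": 1, "g6": 2}
gene_sets = {"A": ["g1", "g2"], "B": ["g3", "g4"], "C": ["g5", "g6"]}

genes = sorted(recon)
gene_level = spearman([recon[g] for g in genes], [lfc[g] for g in genes])
set_level = spearman(
    [np.mean([recon[g] for g in s]) for s in gene_sets.values()],
    [np.mean([lfc[g] for g in s]) for s in gene_sets.values()],
)
print(round(gene_level, 3), round(set_level, 3))  # prints 0.829 1.0
```

      With only three sets, the set-level coefficient carries little statistical weight. The example only shows the direction of the effect described in the response.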

      R2.8. Section 2.6: besides the cytokine section, it is difficult to assess the added value of this approach. There are likely many valuable findings here, but this is difficult to judge because the assessment is very qualitative.

      Response:

      One of the challenges of this work was finding relevant datasets to evaluate ReCoN. We complemented the direct quantitative evaluation on the Immune Dictionary with another quantitative evaluation based on the heart atlas multicellular programs, although this validation is much less direct.

      We hope that new perturbation experiments on multicellular datasets, especially cell-type-targeted perturbations, will provide more opportunities to validate the findings and claims of our current manuscript.

      Similarly, we found no existing method making comparable predictions. This led us to use the NicheNet score and the decomposition of the ReCoN model presented in section 2.2.1 to evaluate the contribution of the model.

      R2.9. The article is dense and writing should be reorganized for better readability.

      Minor issues -

      No p-values in figures.

      Response:

      We agree that showing the values directly in the panels would make the figures easier to read. We will add the p-values to panels 2d, 2e, 2f and 2g. We had also forgotten to indicate in the legend of panel 4d that all bold scores were associated with a p-value.

      R2.10. Typo - ReCoN-genetic should be - ReCoN-generic.

      Response:

      We thank the reviewer for noticing the typo, which we have corrected in the new version.

      R2.11. Authors may consider adding figures to describe their results on the balance between direct and indirect effects in section 2.2.2.

      Response:

      Depending on the new findings on the iterations of the indirect effect, we propose adding a panel on their combination or a supplementary figure.

      R2.12. Redundancy in the following two lines -

      o While these approaches effectively describe what tissue-wide programs are coordinated, they generally offer limited insight into the molecular mechanisms that establish or regulate these programs.

      o Despite their ability to identify coordinated tissue-wide programs, multicellular program analyses typically offer limited insight into the underlying molecular mechanisms that orchestrate these programs.

      Response:

      We propose to remove the first sentence in the revised manuscript. In our opinion, starting the next paragraph with this clarification guides the reader better than having it at the end of the previous one.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.


      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      R2.13. The direct and indirect effects are treated in two separate steps. In reality of course these effects are operating simultaneously. I wonder if this could be better modelled by iterating through the two steps. It might be worthwhile trying to see if that improves the performance.

      We thank the reviewer for this interesting idea, and propose to add a supplementary text presenting the outcome of this discussion to readers.

      The direct effect is meant to be captured by the first iteration only, as it represents the effect of direct receptor binding. For the indirect effect, iterations could be used to model effects that are more distant in time.

      On an algorithmic note, the indirect effect already allows several "iterations", as each random walk can loop between all cell types until it restarts. However, it does not allow controlling the weight of the successive transitions. In practice, with a high restart probability, the first "iteration" receives far more weight than the second, as the walker must cross three layers to reach the next cell type.
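      The down-weighting of later "iterations" follows from the standard RWR formulation (as in Tong, Faloutsos, and Pan 2006). We assume ReCoN's multilayer RWR behaves the same way in this respect. Let W be the column-normalised transition matrix, s the seed vector and r the restart probability. The stationary RWR vector p then expands as a geometric series over walk lengths t:

$$
\mathbf{p} = (1-r)\,W\mathbf{p} + r\,\mathbf{s}
\quad\Longrightarrow\quad
\mathbf{p} = r\,\bigl(I-(1-r)W\bigr)^{-1}\mathbf{s} = r\sum_{t=0}^{\infty}(1-r)^{t}\,W^{t}\mathbf{s}.
$$

      Walks of length t therefore contribute with weight r(1-r)^t. Assume each layer crossing costs at least one step. Contributions that reach another cell type (t ≥ 3) are then damped by at least a factor (1-r)^3 relative to the seed term. With an illustrative r = 0.7, which is not necessarily ReCoN's default, this factor is 0.3^3 = 0.027.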

      First, we propose clarifying this section of the manuscript to explain how deep the exploration of the indirect effect goes.

      Biologically, these iterations quite possibly play an important role in the complete response of the cells. However, we believe that modelling them hits a major limitation of our approach, and of RWR-based exploration in general, as it works against the restarts.

      We aim to represent pairwise measurements, that is, the impact of one node on another. Random walks without restart are not well suited to this problem, as they converge to a stationary distribution (László, Lov, and Erdos 1996). In the case of ReCoN, this means that if the exploration were pushed indefinitely, walks seeded from any gene or receptor would end up on each node of the system with the same probability, regardless of the seed.

      The restart mitigates this effect and preserves the influence of the seed by keeping the walkers close to it (Tong, Faloutsos, and Pan 2006). By iterating successively from the distribution obtained by the previous RWR, we would counteract the restart and progressively converge toward the stationary distribution of the classical random walk.
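      The convergence argument can be checked numerically on a toy graph. The sketch below is hypothetical in its graph, its restart probability r = 0.7 and its helper names. It uses the standard closed-form RWR rather than ReCoN's multilayer implementation. It shows three things:

      • a plain random walk forgets its seed;
      • RWR keeps probability mass near the seed;
      • re-seeding RWR with its own output drifts back to the stationary distribution, which for an undirected graph is proportional to node degree.

```python
import numpy as np

# Hypothetical toy undirected graph (not a ReCoN network): a triangle 0-1-2
# with a tail 2-3-4. The triangle makes the plain walk aperiodic.
EDGES = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
N = 5


def transition_matrix(edges, n):
    """Column-stochastic matrix: W[i, j] = P(step from j to i)."""
    adj = np.zeros((n, n))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj / adj.sum(axis=0)


def rwr(W, seed, restart):
    """Closed-form RWR: solve p = (1 - r) W p + r s."""
    n = W.shape[0]
    return restart * np.linalg.solve(np.eye(n) - (1 - restart) * W, seed)


def plain_walk(W, seed, steps=2000):
    """Random walk without restart: repeatedly apply W to the seed distribution."""
    p = seed.copy()
    for _ in range(steps):
        p = W @ p
    return p


def iterated_rwr(W, seed, restart, rounds=500):
    """Re-seed each RWR with the output of the previous RWR."""
    p = seed.copy()
    for _ in range(rounds):
        p = rwr(W, p, restart)
    return p


W = transition_matrix(EDGES, N)
seed_a = np.eye(N)[0]  # all walkers start on node 0
seed_b = np.eye(N)[4]  # all walkers start on node 4

walk_a, walk_b = plain_walk(W, seed_a), plain_walk(W, seed_b)
rwr_a, rwr_b = rwr(W, seed_a, 0.7), rwr(W, seed_b, 0.7)
iter_a = iterated_rwr(W, seed_a, 0.7)

print(np.round(walk_a, 3), np.round(walk_b, 3))  # same vector: the seed is forgotten
print(np.round(rwr_a, 3), np.round(rwr_b, 3))    # different vectors: mass stays near each seed
print(np.round(iter_a, 3))                        # back to the degree-proportional stationary vector
```

      Here the node degrees are 2, 2, 3, 2 and 1, so the stationary vector is (0.2, 0.2, 0.3, 0.2, 0.1). Both the plain walks and the iterated RWR end there.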

      We therefore fully share the reviewer's opinion that the iterative nature of the indirect effect should also be explored, but we do not believe that ReCoN can model these iterations accurately. We hope that new exploration methods will be able to decipher their importance, once further evidence has been gathered to justify the overall interest of considering the indirect effect.

      Bibliography:

      László L, Lov L, Erdos O. Random Walks on Graphs: A Survey. 1996:1-46.

      Tong H, Faloutsos C, Pan JY. Fast Random Walk with Restart and Its Applications. Sixth International Conference on Data Mining (ICDM'06). 2006:613-622. https://doi.org/10.1109/ICDM.2006.70.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The authors propose an approach to model complex regulatory processes in tissues or cell collections in specific environments. It takes into account intracellular regulatory processes at multiple levels as well as intercellular communication. Importantly, it offers a way to estimate the importance of indirect effects of perturbations on one cell type via processes in other cell types. Increasingly complete models allow testing the impact of each component, and of integrating data as context-specific information versus general prior knowledge. Three main use cases are provided, exploiting public datasets:

      • prediction of the effect of specific in vivo cytokine perturbations on mouse lymph node tissue;
      • healthy and diseased myocardium in a heart failure multiome dataset;
      • myocardial infarction spatial transcriptomics, to identify how different cellular neighbourhoods are related to fibroblast phenotype and fibrosis.

      The authors' previous HuMMuS framework investigates multilayer regulatory networks within a single cell type. The main framework extends it to also consider intercellular interactions, thus including i) a TF-target GRN, ii) a receptor layer based on PPI, and iii) cell-cell communication based on LR interactions. These complex networks are then explored within the framework of Random Walk with Restart (RWR). RWR establishes 'interaction weights' between different nodes in the network, based on repeated simulations of spreading on the network that produce proximity scores between network nodes across possible paths. In this study, RWRs restricted to intra-cell-type walks are first performed to calculate the direct effect of a perturbation on node states. RWRs across layers are then also enabled, to calculate the importance of inter-cell interactions (via the coefficient gamma). The importance of each cell type is given by another coefficient, B, which can correspond either to cell type proportions or to the spatial proximity of cell pairs. Finally, the scores of intra- and inter-cell interactions are weighted with a coefficient alpha.

      The central contribution that allows coupling intra- with intercellular interactions is the establishment of receptor-gene links. Instead of inferring them from data, they propose to express the receptor-gene matrix as R = L ⋅ G. They take the ligand-receptor (L) and ligand-gene (G) adjacency matrices from NicheNet and use NNLS to compute R.
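      The summary writes R = L ⋅ G but does not give matrix shapes. Because NNLS solves a least-squares problem, one plausible reading is that R is the non-negative solution of L·R ≈ G, with L of shape ligands × receptors and G of shape ligands × genes. This is an assumption, not necessarily ReCoN's exact formulation. The sketch below solves that problem one gene (column of G) at a time with `scipy.optimize.nnls`. The matrices are toy values and `receptor_gene_matrix` is a hypothetical helper. Under this reading, R depends only on L and G, which matches the reviewer's remark that R is data-independent.

```python
import numpy as np
from scipy.optimize import nnls


def receptor_gene_matrix(L, G):
    """Solve L @ R ~= G for non-negative R, one gene (column of G) at a time.

    Assumed shapes (hypothetical reading of the summary):
      L: ligands x receptors  (ligand-receptor adjacency)
      G: ligands x genes      (ligand-target adjacency)
      R: receptors x genes    (inferred receptor-gene weights)
    """
    n_receptors, n_genes = L.shape[1], G.shape[1]
    R = np.zeros((n_receptors, n_genes))
    for j in range(n_genes):
        R[:, j], _residual_norm = nnls(L, G[:, j])
    return R


# Toy example: G is built from a known non-negative R_true, so NNLS recovers it.
L = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])    # 3 ligands x 2 receptors
R_true = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]])  # 2 receptors x 3 genes
G = L @ R_true                                         # 3 ligands x 3 genes

R = receptor_gene_matrix(L, G)
print(np.round(R, 6))
```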

      For all these cases, performance in inferring the effect of perturbations, the upstream regulators or the downstream targets is compared using AUROC/AUPRC values.

      • Are the claims and the conclusions supported by the data or do they require additional experiments or analyses to support them?

      This is a very well-written paper, the methods used are adequate and the use cases are relevant and broad, exploiting state of the art datasets and tools.

      The authors' claims are mostly justified. The authors could make an effort to cite other efforts in similar directions more explicitly. The claim 'We envision ReCoN as a extension to prior multicellular modelling, offering an interesting compromise between prediction of cell type responses and understanding of their molecular coordination.' is very general and could be better substantiated. In fact, the authors do not really give examples of alternative approaches to study systems of interacting cells, other than mechanistic agent-based models, which clearly are very different. Moreover, exploring the multilayer networks with RWR is a very reasonable choice, but could there be other approaches? I think the authors could briefly discuss this issue to support their choice of this method.

      Generally the discussion should provide the reader the context in the existing literature in which the work can be set, detailing its impact. I think this could be improved.

      Regarding the choice of datasets, it is clear that the method is quite demanding. It requires single-cell data and different omics to build the model, in addition to the expression dataset used as a use case. This inevitably leads to using a mix of datasets. For example, in the mouse experiments the gene regulatory network was inferred from both a lymph node scRNA-seq dataset and a splenic scATAC-seq dataset, presumably due to the lack of multiome data in this setting. However, the cell-cell communication network was inferred from the control case of the Immune Dictionary. Why can't the authors also use the control data for inferring GRNs? Is ATAC-seq really necessary to infer the GRN? What is the impact of lymph node and spleen samples potentially being different?

      • Please request additional experiments only if they are essential for the conclusions. Alternatively, ask the authors to qualify their claims as preliminary or speculative, or to remove them altogether.

      • If you have constructive further reaching suggestions that could significantly improve the study but would open new lines of investigations, please label them as "OPTIONAL".

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated time investment for substantial experiments.

      • Are the data and the methods presented in such a way that they can be reproduced? The code is very clear; we were able to install and run it, and it is quite well documented. However, a few more details should be given in the text regarding how the performance evaluation is carried out. For example, if I understand correctly, when predicting the impact of cytokine perturbations, the genes ReCoN predicts to be impacted are compared to differentially expressed genes identified through a traditional DEG analysis. What is compared is the ranking of these genes by ReCoN with the ranking provided by DESeq2. There is no description of how this comparison of rankings gives rise to AUROC values. Also, is only the ranking predicted, or can they also estimate how well they predict the effect size?
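      The reviewer's question can be made concrete with one common construction. This is an assumption, not necessarily what the authors did. Genes called differentially expressed are the positives, for example by an adjusted p-value cut-off on the DESeq2 results, and all other genes are the negatives. The AUROC is then computed over ReCoN's scores. The sketch below uses hypothetical gene names and scores. `sklearn.metrics.roc_auc_score` on the same labels and scores should give the same value.

```python
def auroc_from_ranking(scores, positives):
    """AUROC of a gene ranking against a set of 'true' genes (e.g. DEGs).

    Equals the probability that a randomly chosen positive gene is scored
    higher than a randomly chosen negative gene (ties count as 1/2), i.e.
    the normalised Mann-Whitney U statistic.
    """
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical ReCoN-style scores and a hypothetical DEG set.
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.4, "g4": 0.3, "g5": 0.1}
degs = {"g1", "g3"}
print(auroc_from_ranking(scores, degs))  # 5 of 6 positive/negative pairs correctly ordered

# Any monotone rescaling of the scores leaves the AUROC unchanged.
squared = {g: s ** 2 for g, s in scores.items()}
print(auroc_from_ranking(squared, degs))
```

      Only the order of the scores enters this metric. It therefore says nothing about predicted effect sizes, which is the reviewer's second point.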

      When describing the use cases, I think a bit more detail would help. For example, take 'To identify the cell-type-specific genes associated with HF, we used the MOFAcell scores of the multicellular factor 1 (MCP1) reported in ReHeat236'. I suppose the explanation is in the dataset's publication, but for the sake of clarity it would be good to expand this sentence to give at least an idea of the approach.

      Regarding the calculation of the R matrix from the NicheNet matrices L and G, I gather that the R matrix is calculated once. It is thus fully data-independent and available just like the L and G matrices from NicheNet. This was not very clear in the tutorials.

      Also, this might just be a typo in the tutorial: "The default α = 0.8 gives more weight to direct effects, which has been empirically validated. You can adjust this based on your biological question." I believe the manuscript says alpha > 0.5 refers to indirect effects dominating.

      The same applies to the pre-processing of the spatial data for the third use case: a little more detail on how this was done would help users and readers.

      • Are the experiments adequately replicated and statistical analysis adequate? I don't see issues with the statistical power of the analysis. Rather, I think the authors should provide some examination of the parameter space of their model. While an analysis of the impact of the alpha parameter is provided, I believe several other parameters have a crucial impact, and the choice of their values should be discussed.

      For example: 'In the GRN reconstruction only the links with a score above 1.5e-7 were retained in ReCoN's gene regulatory layer.' How was this threshold chosen?

      We have identified the following parameters. They are somewhat justified but could be explored to get a better feel for how they impact the results:

      • Restart probability: how often the walker goes back to the starting seed/molecule.
      • Layer transition probability: how often the walker stays in the same layer (different cell? different layers? gamma).
      • Node transition within a layer: how often one jumps to a different layer.
      • Weighting parameters: how much weight is given to the direct or indirect effect in the combined effect (alpha); this is the only one that is explicitly explored.

      Finally, this might be considered OPTIONAL but would greatly improve the work in our opinion. The method crucially depends on the networks used in the different layers and to connect layers and cell types. As we know, biological data are noisy and incomplete (FP and FN) at each level and in each data type. It would be really useful to estimate the robustness of the results to this noise. In particular, from personal experience, we think GRNs reconstructed from data are often almost fully connected, and it is exceedingly difficult to validate them in specific contexts. This means that some 'errors' are likely to be present. Since several methods exist for inferring GRNs, one could simply compare the results obtained with different methods for this part of the network. A related point involves the characteristics of the RWR algorithm. It will be strongly affected by the presence of hubs in these networks (either in single layers or across several), which is likely to impact the exploration. If hub proteins are genuinely important, that is not a problem. In some layers, however, hubs might simply be better-studied proteins and might carry an 'unjustified' weight in the walks; an example is the receptor-receptor layer, which presumably contains PPIs. One potential way to assess the robustness of the method to these issues is empirical: randomly perturb the networks in ReCoN and see to what extent similar predictions are achieved.
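      The empirical robustness check suggested here can be sketched on a single toy layer. Everything in the sketch is a hypothetical choice standing in for ReCoN's multilayer network: a random graph, an edge-dropout probability `frac`, the closed-form RWR, and top-k overlap as the similarity measure. A Spearman correlation or rank-biased overlap could replace top-k overlap. Degree-preserving rewiring would be the natural variant for probing hub bias, but it is not implemented here.

```python
import numpy as np


def rwr_scores(adj, seed_idx, restart=0.7):
    """Closed-form RWR from a single seed node: solve p = (1 - r) W p + r s."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    # Column-normalise; columns of isolated nodes stay all-zero.
    W = np.divide(adj, col_sums, out=np.zeros_like(adj), where=col_sums > 0)
    seed = np.zeros(n)
    seed[seed_idx] = 1.0
    return restart * np.linalg.solve(np.eye(n) - (1 - restart) * W, seed)


def drop_edges(adj, frac, rng):
    """Remove each undirected edge independently with probability `frac`."""
    upper = np.triu(adj, k=1)
    keep = rng.random(upper.shape) >= frac
    upper = upper * keep
    return upper + upper.T


def topk_overlap(a, b, k, exclude):
    """Fraction of nodes shared by the top-k of two score vectors (seed excluded)."""
    top_a = [i for i in np.argsort(-a) if i != exclude][:k]
    top_b = [i for i in np.argsort(-b) if i != exclude][:k]
    return len(set(top_a) & set(top_b)) / k


def robustness(adj, seed_idx, frac, k=10, trials=20, rng_seed=0):
    """Mean top-k overlap between RWR on the original and on edge-dropped graphs."""
    rng = np.random.default_rng(rng_seed)
    reference = rwr_scores(adj, seed_idx)
    overlaps = [
        topk_overlap(reference, rwr_scores(drop_edges(adj, frac, rng), seed_idx), k, seed_idx)
        for _ in range(trials)
    ]
    return float(np.mean(overlaps))


# Hypothetical random graph standing in for a single network layer.
rng = np.random.default_rng(42)
upper = np.triu((rng.random((60, 60)) < 0.1).astype(float), k=1)
adj = upper + upper.T

for frac in (0.0, 0.1, 0.3):
    print(frac, robustness(adj, seed_idx=0, frac=frac))
```

      A score near 1 at a given dropout level means the top predictions barely depend on the removed edges. With `frac = 0.0` the overlap is exactly 1 by construction.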

      Minor comments:

      • Specific experimental issues that are easily addressable.
      • Are prior studies referenced appropriately?
      • Are the text and figures clear and accurate?
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Please add page numbers. Figures are nice and clear. Some specific minor points are listed here below.

      • Define hMLN at its first appearance (Fig. 1 caption; no page numbers..;). 2nd appearance: "heterogeneous multilayer structure (HMLN) ..."
      • Bi_j: not so clear what it refers to when first mentioned.
      • "personalized interaction specificity": maybe a better word than personalised (contextualised?).
      • "ReCoN-genetic and ReCoN": generic?
      • "responses. It is expected to observe common behaviors in-between cell-type, that the GRN and the generic CCC network already contribute captures.": not very clear.

      • Figure 2b: the icon of cells with double arrows might suggest a phenotype shift, when instead this is just communication.
      • eTACs: explain the acronym and what they are.
      • "Due to very few genes being differentially expressed, only cDC1 was conserved and evaluated for IL22,": not so clear.
      • "In this showcase": not very clear (use case?).
      • "different fibroblast specializations": maybe phenotypes?

      • Figure 4b, "b) Schematic view of the deconvolution process and cell type-specific count inference from the spatial niches.": not so clear what the heatmap shows (rows and columns).
      • Spots heatmap: label the niche on the rectangles in the columns; each column is a spot. Rows: cell types or cells? In the cell types x spot

      Cell2location. Add reference, maybe explain basic functionality?

      • "reconstructing different patients, tissues, and microenvironments to predict context-specific molecular treatments.": unclear.
      • "fibrosis in different": at molecular levels.

      • Figure 5d: myeloid and endothelial colour code inverted relative to 5b/5c.
      • 5d: indicating important pathways in orange should not change the colour of the nodes (purple = common, blue or green = specific). Use the border colour maybe?
      • 5e is not a Venn diagram: "e) Venn diagram showing the overlap between transcription factors (TFs) predicted by ReCoN (green) and those previously implicated in fibrosis (orange) or cardiac diseases (violet). Only the top 10 TFs were annotated from literature sources; full sizes of fibrosis- and cardiac disease-related receptor sets can therefore not be represented."
      • f) is also not a Venn diagram.
      • e/f, now in supp.: "the 'NABA ECM collagens' gene set. Nodes are grouped by molecular type (e.g., transcription factors, receptors, ligands), and links represent the weighted, direct regulatory interactions present in the ReCoN-constructed"

      Why a Sankey plot? Normally a Sankey plot represents flow (of regions changing from one state to another), but here this is just a weighted network? No communication from fibroblasts back to other cell types? No communication between ventricular/myeloid/lymphoid?

      • "as a extension to": an.
      • "underrepresented in the current.": current framework?
      • "However, it can't represent more": cannot.
      • "Borrowing representation from hypergraphs, which introduces"
      • "The network exploration implementation of ReCoN also present some limitations. limitations. While random walks with restarts offer a stable and fast exploration workflow for multilayer networks, it currently only considers positive weights to predict regulation strengths. It involves that the nature of the regulation, as activation or inhibition, has to be identified a posteriori."

      • check concordance/grammar

      • "Only the nodes that are included in one of the layers are present in the final results, ignoring the ones present only in bipartites.": unclear.
      • "a scATAC": an.
      • Barsi et al. is published: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1013188
      • "effects, allowing for modulating in a second time their contribution.": word order.

      others. However, it is possible to adjust the Beta coefficient to represent it based on the available information for each dataset. Represent- adjust?

      We use the latter to compare the different models. - what is the latter?

      • "It resulted in the scRNA-seq in 1,789 cells with 13,167 genes, and for the scATAC-seq in 3,759 cells with 254,545 regions.": check English.
      • GRETA pipeline: reference?

      • "We kept all the cells whose annotations through unsupervised clustering, followed by marker gene annotations, through scANVI were coherent.": word order.
      • "In parallel, pairs of ligands and receptors with both associated with scores above an absolute gene loading of 0.1 were considered potential driver interactions in HF.": unclear.
      • gseapy Python: reference?

      and to calculate average for each spatial context the average cell type expression. Unclear

      • "We only used the loadings of all cell types but the fibroblasts to consider the effect of the sole environment.": unclear.
      • "We realised a downstream": performed.

      The profiles inferred by ReCoN were first very correlated in all three contexts. - unclear

      Significance

      Provide contextual information to readers (editors and researchers) about the novelty of the study, its value for the field and the communities that might be interested.

      This is a very timely paper, dealing with an important gap in the literature. It is not an entirely new framework, but it integrates different existing approaches to solve a complex issue in a creative way. To my knowledge, it is the first attempt to consider and formalise regulation processes involving both intra- and inter-cellular interactions. The results support the importance of distinguishing the different paths that can relate the impact of a perturbation to specific genes/functions in different cells and their overall ecosystem.

      General assessment: provide a summary of the strengths and limitations of the study. What are the strongest and most important aspects? What aspects of the study should be improved or could be developed?

      The tool offers a combination of approaches, providing a coherent framework. The code is well documented and functional. The use cases are quite compelling. Sadly, the only type of validation possible involves confirming known facts from the literature, which makes it hard to evaluate the full impact of some of the predictions. I think the details of how the method works, and especially of how the performance was evaluated, could be expanded. An assessment of how different parameters and choices impact the results would also be very helpful. An effort to compare the presented variations of the method to some other approach would be very welcome, but I find it hard to identify a comparable alternative approach.

      Advance: compare the study to the closest related results in the literature or highlight results reported for the first time to your knowledge; does the study extend the knowledge in the field and in which way? Describe the nature of the advance and the resulting insights (for example: conceptual, technical, clinical, mechanistic, functional,...).

      Potentially the closest results are models that can predict the effect of perturbations on cell line cultures. Several approaches in the literature employ either transformers or optimal transport to predict the effect of perturbations in single cell datasets. One of the main issues is an underlying necessary assumption that the perturbation effect will be larger than the heterogeneity (in cell lines for example), which becomes increasingly difficult when considering in-vivo experiments. ReCoN obviously goes beyond this by considering explicitly the presence of different cell types but distinctions of cell types are sometimes quite arbitrary and potentially application of ReCoN to some of the in-vitro culture datasets, even on cell lines, could be a way to test its performance and benchmark it against other methods. The main bottleneck in the application of this framework to 'personalisation' of therapies, mentioned even in the abstract as a potential future goal for such an approach, will be the lack of data. This approach requires single cell level descriptions of the system at hand, plus additional datasets to build the model structure. To a certain extent, public data of related tissues/contexts can be used, but it will be necessary to test the dependence of performance on coherence of the input data to develop sufficient trust to use it for new predictions, especially in a medical field.

      The authors could comment on how their method compares to others that do not require single cell level information. Despite clear differences, it might be important to show the advantage of using this more complex approach that requires data that is less available. Given the ease with which bulk profiles can be constructed from single cell data, it might be possible to compare the approaches directly. For example, see K. Wang, S. Patkar, J.S. Lee, E.M. Gertz, W. Robinson, F. Schischlik, D.R. Crawford, A.A. Schäffer, E. Ruppin Deconvolving Clinically Relevant Cellular Immune Cross-talk from Bulk Gene Expression Using CODEFACS and LIRICS Stratifies Patients with Melanoma to Anti-PD-1 Therapy

      Mike van Santvoort, Óscar Lapuente-Santana, Maria Zopoglou, Constantin Zackl, Francesca Finotello, Pim van der Hoorn, Federica Eduati, Mathematically mapping the network of cells in the tumor microenvironment, Cell Reports Methods 2025

      Audience: describe the type of audience ("specialized", "broad", "basic research", "translational/clinical", etc...) that will be interested or influenced by this research; how will this research be used by others; will it be of interest beyond the specific field?

      Broad interest to biomedical researchers and also to biologists in other fields. While the method allows advances in basic research on the regulation of biological processes, a clear clinical application can be envisaged, for example in immuno-oncology, immunology and even general molecular medicine.

      Please define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      I am a computational biologist with expertise in network models, regulatory networks and agent-based models, and I am especially familiar with the tumour microenvironment and the processes therein. I can more or less appreciate the meaningfulness of the biological findings related to the mouse lymph node example. I am much less of an expert on heart tissue modelling, heart failure, fibrosis, etc., which are required to fully comprehend the impact of the second and third use cases.

    1. Briefing: Hearing of the Secretary of State for Equality between Women and Men

      Executive Summary

      This document summarises the key points of the hearing of Salima Saa, Secretary of State for Equality between Women and Men, before the Delegation for Women's Rights of the National Assembly.

      The hearing centred on the presentation of her roadmap and on discussions of the 2025 Finance Bill (Projet de Loi de Finances, PLF).

      The main priorities identified are:

      • Combating sexist and sexual violence (VSS): placed at the heart of government action, with particular attention to the notion of consent, chemical submission (drug-facilitated assault) and the strengthening of emergency schemes.

      • Economic autonomy: focus on equal pay, the representation of women in governing bodies and encouraging scientific careers (STEM).

      • Support for single-parent families: recognition of the specific precariousness of single mothers.

      • Feminist diplomacy: reaffirmation of France's leading role internationally in the face of the rise of anti-rights movements.

      On the budget side, the appropriations of Programme 137 increase by 10%, reaching €85.1 million. The rapporteurs and associations nonetheless point to growing needs, notably for funding the "Ségur bonus" and emergency accommodation.

      --------------------------------------------------------------------------------

      I. Lutte contre les Violences Sexistes et Sexuelles (VSS)

      La lutte contre les violences demeure la priorité absolue.

      La Secrétaire d'État reconnaît que, malgré les avancées depuis 2017, le compte n'y est pas, soulignant que 1,5 million de femmes ont déclaré avoir subi des violences sexuelles hors cadre familial en 2023, avec seulement 2 % de plaintes déposées.

      Legislative and Judicial Developments

      • Notion of consent: A consensus is emerging on the need to incorporate the notion of consent into the criminal-law definition of rape in order to reduce the rate of cases dismissed without further action (currently two-thirds of complaints).

      The government says it is ready to work on amending the Criminal Code, in cooperation with ongoing parliamentary work.

      • Chemical submission: A mission is under way (led by Deputy Sandrine Josso) to formulate concrete proposals against this scourge, brought to light by recent court cases.

      • Emergency unit: A body has been set up to identify quick, pragmatic measures, with announcements scheduled for 25 November 2024.

      Protection and Support Mechanisms

      • Universal Emergency Aid (AUU): Since December 2023, 26,525 payments have been made, averaging €870.

      The budget will increase from €13 million to €20 million in 2025.

      • "Pack Nouveau Départ" (Fresh Start Package): A pilot scheme coordinating stakeholders (CAF, the justice system, etc.) to help victims achieve autonomy.

      It is slated for nationwide rollout after evaluation in five pilot départements.

      • Maisons des Femmes (women's centres): 75 facilities are currently open, with the goal of one per département.

      --------------------------------------------------------------------------------

      II. Professional Equality and Economic Autonomy

      Financial autonomy is presented as the main safeguard against abusive control and precarity.

      Careers and Representation

      • Women in science: A persistent "glass ceiling" is observed.

      Only 24% of engineers are women.

      The Secretary of State points to parental stereotypes: 61% of boys are encouraged towards digital fields, compared with 33% of girls.

      • European directives: Transposition of the "Women on Boards" directive on gender balance in boards of directors, and of the pay transparency directive.

      • Gender Equality Index (Index Égalité): Intention to evaluate and strengthen this tool six years after its creation.

      Single-Parent Families

      • Findings: 82% of single-parent families are headed by women; 41% of children in these families live below the poverty line.

      • Measures under debate: Discussion of a "single-parent family card" to ease access to entitlements, and of making child-support payments tax-exempt (a point of budgetary tension between the government and some parliamentary groups).

      --------------------------------------------------------------------------------

      III. Budget Analysis (2025 Draft Finance Bill, PLF 2025)

      The budget of Programme 137 ("Equality between women and men") is at the centre of the discussions.

      | Indicator | Value / Change |
      | --- | --- |
      | Programme 137 budget | €85.1 million (vs. €77 million in 2024) |
      | Increase | +10% |
      | Universal Emergency Aid | €20 million allocated |
      | Support fund (feminist organisations) | €250 million (2023-2027) |

      Concerns Raised by the Rapporteurs

      • Ségur bonus: Associations (FNSF, CIDFF, etc.) warn that they cannot afford to pay this salary-upgrade bonus without direct state compensation, which threatens the survival of their organisations.

      • Emergency housing: Concern that the widespread use of calls for projects would dilute the specific nature of places reserved for women victims of violence.

      • Exiting prostitution: The targets (950 exit pathways in 2025) are considered insufficient relative to needs on the ground.

      Only 51 départements had actually set up prostitution-exit commissions in 2023.

      --------------------------------------------------------------------------------

      IV. Health, Education and Sport

      Women's Health

      • Menopause and mental health: Women's mental health will be part of the "Grande Cause Nationale 2025" (national priority cause for 2025).

      • Endometriosis: Parliamentarians call for faster care and for recognition as a long-term condition (Affection Longue Durée, ALD).

      Sexuality Education

      • Implementation of the 2001 law: Acknowledgment of 20 years of public failure.

      The Secretary of State commits to making the three mandatory annual sessions of emotional and sexual-life education a reality, as they are essential for building a culture of consent from an early age.

      Olympic Games Legacy

      • Parity: The Paris 2024 Olympic and Paralympic Games (JOP) are cited as a model of gender parity.

      The aim is to sustain this momentum in sports governance and to combat sexism and violence in sport.

      --------------------------------------------------------------------------------

      V. Feminist Diplomacy and International Issues

      France intends to maintain its international leadership in the face of the conservative "backlash".

      • International strategy: A new equality strategy will soon be unveiled by the Ministry for Europe and Foreign Affairs.

      • 2025 target: 65% of projects funded by official development assistance must promote gender equality.

      • French citizens abroad: Particular attention to "trailing spouses" (conjointes suiveuses) and to isolated victims of violence abroad, requiring closer coordination with the consular network.

      --------------------------------------------------------------------------------

      Key Quotes

      "Emotional, relational and sexual-life education must be strengthened and the law applied if we want to fight effectively against violence and rape culture." (Véronique Riotton, Chair of the Delegation)

      "Seven years after MeToo, we are not there yet. [...] Nearly 1.5 million women reported in 2023 having experienced sexual violence outside the family, and only 2% of them filed a complaint." (Salima Saa, Secretary of State)

      "The associations are crying out from the heart: they will not be able to cover the Ségur bonus without additional resources, even though they are single-handedly carrying the State's policy." (Céline Thébo-Martinez, Co-rapporteur)

    1. Briefing: Combating Gender-Based, Sexual (VSS) and Domestic Violence

      Executive Summary

      This document summarises recent exchanges between the National Assembly's Delegation for Women's Rights and the Minister of Justice (garde des Sceaux) on justice reforms concerning gender-based, sexual (VSS) and domestic violence. Against an urgent backdrop marked by persistent femicides and a massive increase in caseloads, the State is undertaking a structural transformation of the criminal-justice response. Highlights of this strategy include the creation of a Victims Directorate within the Ministry of Justice, pilot specialised courts inspired by the Spanish model, and mandatory training for judges. Another major focus is improving the complaint-filing process through the systematic presence of lawyers and closer coordination between civil and criminal proceedings to avoid secondary victimisation.

      --------------------------------------------------------------------------------

      I. Modernising the Reception and Support of Victims

      The French justice system is beginning a transition to place victims at the centre of the process, breaking with a culture sometimes seen as too focused on the offender or on procedure.

      • Creation of a Victims and Users Directorate: Scheduled for 1 June, this new body within the Ministry of Justice aims to humanise the judicial process and provide clear follow-up for complainants.

      • Lawyer present when a complaint is filed: To reduce the rate of cases dismissed without further action (currently 70% for sexual offences), the government proposes generalising a lawyer's assistance from the moment the complaint is filed, a measure eligible for legal aid (estimated at €30 to 70 million).

      • Digital transparency: Rollout of an online portal between September and December so that victims can track the progress of their complaint, much like parcel tracking, to break the isolation they face amid judicial delays.

      --------------------------------------------------------------------------------

      II. Specialisation and Training of the Judiciary

      The effectiveness of the judicial response depends on upskilling practitioners and specialising structures.

      Mandatory Training for Judges

      A major reform now makes the right to sit in courts hearing VSS cases conditional on a mandatory annual 5-day training course. Any judge who has not completed this training (covering consent, coercive control and trauma) will no longer be able to handle these cases.

      Pilot Specialised Courts

      Inspired by the success of the Spanish model, the ministry plans to create specialised courts within "group 1" tribunals.

      | Cities under consideration for specialised divisions | Objectives of specialisation |
      | --- | --- |
      | Lyon, Nancy, Lille, Créteil, Rennes, Bordeaux, Toulouse, Saint-Denis de la Réunion, Clermont-Ferrand, Orléans | 360° coordination between civil and criminal proceedings. |
      | | Consistent doctrine and faster processing times. |
      | | Better consideration of coercive control. |

      --------------------------------------------------------------------------------

      III. Legislative and Procedural Developments

      The legal framework is adapting to better capture the reality of violence and to harmonise how laws are applied nationwide.

      • VIF Code (Violences Intra-Familiales, domestic violence): A compendium bringing together all relevant provisions, case law and legal doctrine to guide judges, scheduled for publication at the end of June.

      • Civil-criminal coordination: Generalisation of "single hearings" at the court of appeal, allowing the civil and criminal aspects of the same family situation to be judged on the same day to avoid contradictory decisions.

      • Circular on consent: Given inconsistencies in how the new criminal definition of rape is applied, an interpretive circular is being submitted to parliamentarians to ensure that prosecutors act in line with the legislature's intent.

      • Lifting medical confidentiality: The Minister of Justice supports requiring health professionals to report cases of intimate-partner or sexual violence, modelled on the existing obligation for minors.

      --------------------------------------------------------------------------------

      IV. Current Situation and Statistics

      The figures reveal extreme strain on judicial resources despite an increase in convictions.

      • Prison population: In 2017, 8% of inmates were incarcerated for rape or sexual assault; that figure now stands at 22%.

      • Volume of prosecutions: The number of judicial investigations (informations judiciaires) for rape has tripled, from 10,000 in 2017 to 30,000 today.

      • Conviction rate: Although most complaints are dismissed (often because evidence was not preserved at the outset), 87% of judicial investigations for rape end in a conviction with a non-suspended prison sentence.

      • Time to judgment: Delays remain critical, averaging 6 years to try a rape case at first instance and up to 8 years for a homicide.

      --------------------------------------------------------------------------------

      V. Challenges and Points of Concern

      The discussions highlight several persistent obstacles to effective, protective justice.

      • Court overload: The Thonon-les-Bains court illustrates the distress on the ground (a 70% rise in reports over 6 years), facing poor attractiveness and absenteeism that the ministry is trying to offset by sending "magistrats placés" (floating judges).

      • Criminal plea bargaining (plaider-coupable criminel): Possibly extending this procedure to serious crimes raises parliamentary concerns about bypassing the popular jury and the risk of "summary justice". The ministry proposes narrowing its scope to exclude rape.

      • Cyber-harassment and masculinism: Greater vigilance is called for regarding online violence and misogynistic content, requiring judicial sections specialised in the digital dimension.

      • Technological obsolescence: The ministry acknowledges a major digital lag (a "digital prehistory"), affecting staff productivity and communication with victims.

    1. Briefing: Training and Care for Victims of Gender-Based and Sexual Violence (VSS)

      High-Level Summary

      This summary details the conclusions of the National Assembly round table on training internal security forces (Police and Gendarmerie) and judges in the care of victims of gender-based, sexual (VSS) and intra-family (VIF) violence.

      Faced with a massive reality (one woman falls victim to sexual violence every two minutes in France), institutions have undergone a profound transformation since 2019.

      The analysis highlights a significant increase in training volume, the deployment of innovative digital tools, and a determination to place the victim at the centre of the judicial process.

      However, structural challenges remain: uneven reception of victims across the country, the risk of secondary victimisation, precarious funding for social workers, and the need to break down the silos between civil and criminal courts.

      --------------------------------------------------------------------------------

      1. The National Gendarmerie: Doctrine and Intervention Mechanisms

      The National Gendarmerie bases its action on one absolute priority: protecting the victim.

      This doctrine is set out in a body of three texts covering intimate-partner violence, VSS and minors.

      Reception and Investigation Strategy

      • Obligation to take complaints: Every gendarme must take the complaint and conduct a thorough investigation.

      Recording a "main courante" (a log entry without a formal complaint) is strictly prohibited for these offences.

      • Diversified channels:

        • Network of more than 3,000 territorial brigades.
        • Digital brigade and support platform based in Rennes (24/7 chat response).
        • Mobile interviews (town halls, France Services centres) and interviews in hospitals, particularly for cases of rape or sexual assault.

      • Preventing secondary victimisation: Investigative steps (seizure of clothing, DNA sampling, gynaecological examinations) are explained clearly to minimise additional trauma.

      Training Mechanisms

      The Gendarmerie has structured its training around the National Centre for Intelligence and Investigation Training (CNFRI):

      • Initial training: Includes more than 40 hours on general interviewing techniques, supplemented by 8 hours specific to VSS/VIF.

      • Continuing training:

        • Levels 1 and 2: Decentralised awareness training at groupement level.
        • Level 3: Specialisation in child-interview techniques and VIF dynamics (200 gendarmes trained or retrained per year at the national centre).

      • Network of relay trainers: About 200 trainers across the country provide local expertise.

      • Innovation: Recent publication of a specific guide for interviewing minor victims with autism spectrum disorder.

      --------------------------------------------------------------------------------

      2. The National Police: Restructuring and Modernisation

      The National Police bases its policy on two pillars: comprehensive care (police, judicial and social) and a proactive digital strategy.

      Pillars of Victim Care

      • Specialised human resources: Delegates and referents deployed at every level (national to local), around a hundred psychologists, and nearly 500 social workers (shared with the Gendarmerie).

      • Digital tools: Use of the national victim-support platform and development of the "Parole" software for automatic transcription of interviews (notably in the "Mélanie" interview rooms).

      Training Volume and Methods

      | Role | Specific training duration/volume |
      | --- | --- |
      | Gardiens de la paix (police constables, initial training) | 64 hours dedicated to VSS and VIF (over 12 months) |
      | Judicial police officers (OPJ) | 46 hours of first-posting adaptation training |
      | Senior officers (officers/commissioners) | 25 to 30 hours (tripled in 8 years) |
      | Continuing training (mandatory) | 2 hours online for every officer in contact with the public |

      Teaching methods: Simulation pavilions for realistic role-play (home interventions) and cultural media such as the play "Je vais bien" to illustrate the mechanisms of psychological control.

      --------------------------------------------------------------------------------

      3. The Ministry of Justice: Reforms and Judicial Challenges

      The justice system is in transition from a model centred on the "State-offender" relationship to one that places the victim at the heart of the procedure.

      Changes in Judicial Training

      • Initial training (ENM): A 31-month programme including compulsory teaching on psychotrauma, forensic medicine and the dynamics of incest.

      • Towards mandatory continuing training: A reform is under way to make VSS and VIF training compulsory for judges handling these cases.

      • Advanced Cycle (CAVIF): A 15-day expert training spread over two years, including immersion placements (e.g. CIDFF), with a 100% satisfaction rate.

      Combating Secondary Victimisation at Trial

      Secondary victimisation is identified as a persistent "blind spot".

      Several levers are identified:

      • Courtroom control: The presiding judge must restrain "illegitimate" or degrading questions from the defence.

      • Proposed legislative changes: Introducing the notion of "dignity" into the Code of Criminal Procedure to frame proceedings, and the possibility of informing victims of their right not to answer certain questions.

      • Change in perspective: Including direct testimony from victims in training sessions to confront judges with the failings of the judicial process.

      --------------------------------------------------------------------------------

      4. Analysis of Failures and Levers for Improvement

      The discussions identified critical areas requiring immediate attention.

      Lessons Learned (RETEX) and Femicides

      Systematic analysis of intimate-partner homicides since 2019 shows that:

      • In 30% of cases, the victim was already known to the police or justice services.

      • The main failures lie in poor communication between services and the lack of dynamic reassessment of danger.

      • The Poitiers case: Illustrates the risk of providing a "tool" (the Serious Danger Telephone, TGD) without backing it with immediate coercive measures (police custody, registration in the wanted persons file).

      The "Silo" Problem

      The separation between civil matters (family court judge) and criminal matters (criminal court) is a source of trauma for families.

      A pilot of single hearings at the appeal stage is encouraged, as is the idea of a single "family judge" at first instance to handle the family situation and child protection as a whole.

      Coercive Control

      This concept, still rarely criminalised as a standalone offence, is often invisible.

      It includes control over finances, communications (trackers, reading text messages) and daily behaviour.

      Early detection is considered crucial to prevent escalation to physical violence, notably by including criteria such as strangulation in the danger assessment grid (GED).

      Precarious Status of Social Workers (ISCG)

      The LOPMI law's target (600 social workers by 2027) is threatened by unstable local funding.

      Yet these professionals are essential for dealing with the non-police aspects of the crisis (housing, finances) and for helping victims break free of the cycle of violence for good.

      --------------------------------------------------------------------------------

      Conclusion

      While professionals' upskilling is real and documented by increased training hours and budgets, the system's effectiveness now depends on:

      • Continuous reassessment of danger (moving beyond a "snapshot" view).

      • Ending procedural silos between civil and criminal proceedings.

      • Strict protection of dignity during hearings so that the trial does not become a new assault on the victim.

    1. The competitive landscape in AI infrastructure has made this gap impossible to ignore. Teams building custom CUDA, Triton, and Helion kernels are striving for every percentage point of throughput. Until now, there hasn't been a way to fine-tune code generation for a specific workload.

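      Most people assume GPU compilers already provide enough optimisation options, and that developers can reach peak performance by hand-tuning. The author argues that, in today's competitive AI infrastructure landscape, this view is outdated, implying that traditional approaches cannot meet the performance demands of modern AI workloads.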

    2. CompileIQ is not a magic tool that automatically turns poorly-written code into high-performing code. To get the best value from CompileIQ, you need to start with reasonably high-performing code, which then enables the final compiler-heuristics tweaks to take you to maximum performance.

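      Many people may assume that AI-driven autotuning tools can make up for poor code quality, but the author states plainly that even an advanced tool like CompileIQ needs already well-optimised code to be most effective. This challenges the common misconception that automated tools can solve every performance problem.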

    1. Regulation stops being a document that people interpret and becomes code that systems execute.

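      Most people see compliance mainly as a process in which human experts interpret and apply regulations. The author argues instead that regulations will shift from documents interpreted by people to code executed by systems. This challenges the basic understanding of compliance work and suggests that AI will fundamentally change how the field operates, moving it from human-led to system-led.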

    1. Besides that, hacks can lead to SSRF (server-side request forgery) exploits and, in some cases, remote code execution.

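      Most people assume a single vulnerability usually leads to only one type of security issue, but the author points out that this vulnerability can enable attacks ranging from authentication bypass to remote code execution. This challenges the common "one vulnerability, one impact" assumption and shows the cascading security risks that a flaw in a foundational framework can cause.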

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) All outcomes are attributed specifically to L6b neurons, but the genetic manipulation is not specific to L6b neurons. The authors acknowledge this as a limitation, but in my view, this global manipulation is more than a limitation - it affects the overall interpretations of the data. The Hoerder-Suabedissen et al., 2018 paper shows sparse, but also dense, expression of Drd1a+ neurons in brain regions outside of the L6b. Given this issue, the results are largely overstated throughout the paper.

      We appreciate the reviewer’s careful reading and concern that some of our statements may have overstated the implications of our data. The Drd1a Cre mouse model used (FK164) has a relatively selective expression of Drd1a Cre in cortex, but indeed some expression is seen subcortically. This is an acknowledged limitation which is now explicitly addressed in the revised manuscript.

      (2) It is not clear to me that the "silencing" of Drd1a+ neurons was verified.

      In our previous publications, we showed confirmation of the loss of regulated synaptic vesicle release from the Cre-positive neuronal population (Marques-Smith et al., 2016; Hoerder-Suabedissen et al., 2018; Messore et al., 2024). This has now been described in the revised manuscript.

      (3) There were various discrepancies (and potentially misattributions) between the stated significant differences in Supplementary Table T1 data and Figure 3a & S2 spectral plots. This issue makes it difficult to effectively evaluate the main text and stated outcomes.

      We thank the reviewer for their careful attention to the statistical analyses and for noting the inconsistencies in how the results of the spectral analysis were presented: in the text we described two-way ANOVAs with corresponding post hoc tests, but in the figures significance markers were positioned based on multiple t tests. We have now carefully revised the spectral results and implemented a consistent approach in statistical reporting and spectral plots. We have updated Supplementary Table T1 and Figures 3a and S2 to ensure that all statistics are presented consistently throughout the manuscript, i.e. with two-way ANOVAs and accompanying post hoc tests. Please note that we performed all spectral analyses in the range between 0.5 and 128 Hz (excluding the range between 49 and 51.5 Hz due to electrical noise from the power grid) but only plot the range between 0.5 and 30 Hz, as the spectral bands most relevant for sleep neurophysiology are contained in this range.
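      As a minimal sketch of the frequency ranges described above, assuming 0.25 Hz bins (the bin width given in the response on significance markers) and treating the 49-51.5 Hz exclusion as inclusive at both ends (an assumption), the analysis and plotting masks could be built as follows. The function names are hypothetical and are not taken from the authors' code.

```python
# Hypothetical sketch: frequency masks for a spectral analysis with
# 0.25 Hz bins, analysed over 0.5-128 Hz with a mains-noise exclusion
# and plotted over 0.5-30 Hz. Not the authors' implementation.

BIN_WIDTH_HZ = 0.25  # bin width assumed from the authors' description


def frequency_bins(start=0.5, stop=128.0, width=BIN_WIDTH_HZ):
    """Return bin frequencies from start to stop, both inclusive."""
    n_bins = int(round((stop - start) / width)) + 1
    return [start + k * width for k in range(n_bins)]


def analysis_mask(freqs, lo=0.5, hi=128.0, excluded=(49.0, 51.5)):
    """True for bins used in statistics: inside [lo, hi] and outside the
    excluded mains-noise band (assumed inclusive at both ends)."""
    ex_lo, ex_hi = excluded
    return [lo <= f <= hi and not (ex_lo <= f <= ex_hi) for f in freqs]


def plot_mask(freqs, lo=0.5, hi=30.0):
    """True for bins shown in the spectral plots."""
    return [lo <= f <= hi for f in freqs]


freqs = frequency_bins()
print(len(freqs), sum(analysis_mask(freqs)), sum(plot_mask(freqs)))
# prints: 511 500 119
```

      Keeping the analysis mask separate from the plot mask makes explicit that statistics can be computed over the full 0.5-128 Hz range while only the sleep-relevant 0.5-30 Hz range is displayed.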

      Related, the authors stated that post hoc comparisons of EEG spectral frequency bins were not corrected for multiple testing. Instead, significance was only denoted if changes in at least two consecutive frequency bins were significant. However, there are multiple plots in which a single significance marker is placed over an isolated bin (i.e., 4c, 6, S5, S6). Unless each marker is equivalent to 2 consecutive frequency bins, these markers should be removed from the plots. Otherwise, please define the frequency and size of these markers in the main text.

      In line with the previous comment, we have adjusted markers to reflect the results from post hoc tests after two-way ANOVAs. Please note that Figure 6 and the related supplementary figures S5 and S6 have now been removed from the manuscript, as careful re-analysis indicated that the sample size was too low to support a strong conclusion regarding the comparison of orexin effects between genotypes. We stated in the text that we would only include post hoc significance when at least two consecutive bins were significant, but this was indeed not reflected in our figures, where each marker reflects one 0.25 Hz bin. We have now adjusted our code so that markers are plotted only when at least two consecutive bins are significant in bin-wise post hoc comparisons.
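      The plotting rule described above (a marker only where at least two consecutive 0.25 Hz bins are significant in bin-wise post hoc comparisons) can be sketched as a run-length filter. This is a minimal illustration, not the authors' code; the function name, the example p-values and the 0.05 threshold are hypothetical.

```python
def keep_consecutive_runs(significant, min_run=2):
    """Return a copy of `significant` (one bool per frequency bin) in which
    a significant bin is kept only if it belongs to a run of at least
    `min_run` consecutive significant bins; isolated bins become False."""
    n = len(significant)
    result = [False] * n
    i = 0
    while i < n:
        if not significant[i]:
            i += 1
            continue
        # Find the end of the current run of significant bins: [i, j).
        j = i
        while j < n and significant[j]:
            j += 1
        if j - i >= min_run:
            for k in range(i, j):
                result[k] = True
        i = j
    return result


# Hypothetical bin-wise post hoc p-values (one per 0.25 Hz bin).
p_values = [0.03, 0.20, 0.01, 0.04, 0.50, 0.02, 0.01, 0.03]
alpha = 0.05
significant = [p < alpha for p in p_values]
markers = keep_consecutive_runs(significant)
print(markers)
# prints: [False, False, True, True, False, True, True, True]
```

      The first bin is significant on its own but is dropped because its neighbour is not, which is exactly the case that produced isolated single-bin markers in the original figures.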

      (4) A rainbow color scale, as in Figure 3, we've now learned, can be misleading and difficult to interpret. The viridis color scale or a different diverging color scale are good alternatives.

      Thank you for pointing this out, we have adjusted the colour scale.

      (5) How much time elapsed between vehicle/orexin A & B infusions?

      There were 2-4 non-infusion days between infusions. We have added this information to the methods.

      (6) For Figure 6, there are statistical discrepancies between the main text and the plots (pg. 10):

      (a) The text claims post hoc differences for relative ORXA frontal EEG, but there are no significance markers on the plot.

      (b) The text states that there were no post hoc differences for the relative ORXA occipital EEG, but significance markers are on the plot.

      (c) The main test for the relative ORXB frontal EEG was not significant, but there are post hoc significance markers on the plot.

      (d) For relative ORXB occipital EEG, there are significant markers on the plot outside of the stated range in the text.

      We agree with the reviewer, and we decided to exclude this figure from the manuscript, as the sample size for some key comparisons was too low to support any strong conclusions and presenting this analysis would therefore be potentially misleading. We explain the rationale for excluding this analysis in the revised manuscript.

      (7) Some important details are only available in figure captions, making it difficult to understand the main text. For example, when describing Figure 3c in the main text on page 7, it is not clear what type of transitions are being discussed without reading the figure caption. Likewise, a "decrease," "shift," and "change" are mentioned, but relative to what? Similar comment for the EEG theta activity description on pages 7 - 8. Please add relevant details to the main text.

      We have adjusted the wording in the main text to reflect more precisely which comparisons are shown in the figures.

      (8) Statistical comparisons for data in Figure 3e, post hoc analyses for data in Figure S7a-b REM data, and post hoc analyses for Figure S7c (not b) occipital EEG should be included to support differences claims. Please denote these differences on the respective plots.

      Please note that the previously named Supplementary Figures S5 and S6 have been removed from the manuscript, and that the Supplementary Figure S7 in this comment refers to the figure currently named Supplementary Figure S5.

      We have added the statistical comparisons for Figure 3e and Supplementary Figures S5a and S5b to the results section. In Figure S5c, there was an overall genotype difference but no significant time × genotype interaction, so we did not perform post hoc tests or plot post hoc significance markers for this figure. We have adjusted the wording in the results section to make this clearer. We have also corrected the reference to Figure S5c, which was incorrect; thank you for your careful attention.

      (9) In the subsection titled "Layer 6b mediates effects of orexin on vigilance states (pg. 8)," there does not seem to be any stated differences between control and L6b silenced mice. A more accurate subtitle is needed.

      We agree with the reviewer and the title of this sub-section has now been changed accordingly.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Although the authors used a highly selective approach to silence layer 6b neurons, the observed changes in EEG oscillations cannot be solely attributed to layer 6b neurons because of the ICV route for orexin administration.

      We thank the reviewer for this important comment. The ICV route of orexin administration cannot guarantee that only cortical Drd1a-Cre–expressing neurons are reached by orexin, and the Drd1a-Cre driver line is highly selective but not entirely specific for layer 6b neurons (see also response to reviewer #1, comment 1). We have therefore changed the wording of the stated effects and addressed this consideration in the Limitations section of the manuscript. Please note that, as mentioned above, Figure 6 has now been excluded from the manuscript.

      (2) The rationale for using only male rats is not provided.

      We thank the reviewer for highlighting this omission. We now provide the rationale for using only male mice in the methods section as follows: “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Better descriptions of L6b connectivity will improve clarity in the second paragraph of the Introduction (pg. 3). For example, it is not explicitly stated that L6b projects to L5 before the authors describe L5. Therefore, the L5 description seems irrelevant.

      We thank the reviewer for this request for clarification. We mention the connectivity between L6b and L5 because L5 pyramidal neurons have recently been found to play a key role in sleep-wake regulation (Krone et al., Nat. Neurosci. 2021; Honjo et al., 2025; Wasilczuk et al, 2025; Krone et al., 2025). We have now amended the corresponding section of the introduction to emphasise the potential functional relevance of this connection as follows:

      “L5, the major output layer of the cortex, is also bidirectionally communicative with higher order thalamic nuclei (Hoerder-Suabedissen et al., 2018) as well as layer 5 pyramidal neurons (Zolnik et al., 2024). Since several subtypes of L5 pyramidal neurons have recently been shown to play important roles in distinct aspects of sleep-wake regulation (Krone et al., 2021, 2025; Hong et al., 2023; Wasilczuk et al., 2025; Honjo et al., 2025; Chouafeev et al., 2025), depth of anaesthesia (Wasilczuk et al., 2025), and the influence of stress on sleep (Chouafeev et al., 2025), the projections of orexin-sensitive L6b to L5 pyramidal neurons may be a key circuit in the top-down regulation of brain states.”

      (2) There are plots where the y-axis tick label appears to be offset from the tick mark (4a, S5b, S6a).

      Thank you for spotting this graphical issue. We have removed the y-axis tick labels from Figure 4a to avoid confusion. Please note that we decided to remove Figure S5 and Figure S6, because after careful re-analysis we concluded that the group size was too small to draw conclusions on orexin spectra and that any results could be potentially misleading.

      (3) The 2-h time constant, I believe, is depicted in Figure 4H (not 4G).

      Thank you for spotting this. We have corrected the figure legends accordingly and double-checked that Figure 4G depicts the 2-h time constant and Figure 4H the 6-h time constant.

      (4) "...although there was an indication of a higher absolute theta-peak power in layer 6b silenced mice (Figure S6)," pg. 10. It is not clear to me how the data lead to this conclusion.

      Thank you for identifying this inconsistency, which resulted from a preliminary statistical analysis that was subsequently corrected. We have now improved the statistical analysis of spectral data (for more details, see our responses to both reviewers in the public response) and removed this statement, which is no longer supported by the data.

      (5) Exclusion of female mice is not listed as a limitation.

      We now discuss this limitation as follows:

      “In the current study, only male mice were used, because our experimental protocol precluded the possibility of accurately monitoring the oestrous cycle, which has marked effects on brain activity, arousal and vigilance states. We therefore decided to use male mice only for the current study but are planning to use both sexes in future work.”

      (6) A brief description of why Cplx3 and Tbr1 antibodies are being used will be helpful to include in the Methods (pg. 21) in addition to what is in the figure caption.

      We have added the following information to the methods section to clarify why we used these two antibodies: “rabbit α-Cplx3 to distinguish between L6a and L6b” and “mouse α-Tbr1 to identify the L5-6 boundary”.

      (7) Including a label/title for the Figure 2c spectral plots will be helpful. It is not immediately clear if these are light period & dark period data or frontal & occipital data.

      Thank you for pointing this out; we have updated the figure legend to clarify what is shown in this figure.

      Similar comments for S2 and S3a plots. Including a state label on the plots will be helpful in addition to the caption description.

      We have now added the state labels for Figure panels S2 and S3a for improved clarity.

      Reviewer #2 (Recommendations for the authors):

      This is a soundly conducted and well-written study that enhances our understanding of the cortical control of states of consciousness. I do not have any major concerns, but would like the authors to consider some alternate possibilities as suggested in my comments below:

      We thank the reviewer for this positive assessment of our manuscript and the helpful suggestions.

      (1) Given that the inactivation of layer 6b neurons did not affect the time spent in sleep-wake states, to me it appears that these neurons likely have a role in creating the background neural conditions/oscillations supportive of an activated state rather than a direct role in behavioral state control.

      We completely agree with the reviewer and have made the wording more consistent throughout the manuscript, now using “brain state control” rather than “behavioural state control” to clarify that the main effect observed in the L6b-silenced mouse model is a change in spectral characteristics reflecting brain oscillations, rather than effects on vigilance states, which were modest.

      (2) Does the observed shift in REM sleep-related theta-peak frequency in the occipital derivation suggest changes in local neural processes, or could it simply reflect better signal detection, because theta is most prominent at or around the hippocampal region, which is approximately the location of the occipital electrodes in this study?

      The source of the shift in REM sleep–related theta peak frequency in the occipital derivation cannot be established with EEG recordings alone. Additional intracortical or intrahippocampal recordings would be necessary to distinguish between the two possible explanations proposed by the reviewer. We have discussed this further in the revised manuscript.

      (3) The orexinergic system innervates multiple subcortical sites and also widely covers the cortex; because of this, the effect of ICV orexins cannot be attributed to layer 6b neurons alone, as described in the manuscript ("Layer 6b mediates effects of orexin on brain activity.").

      We agree with the reviewer that this is a limitation. We have now adjusted the subtitle of the paragraph describing the results from the ICV administration of orexin and further mention this important consideration in the ‘limitations’ section of the discussion.

      (4) While the current study is focused on sleep-wake mechanisms, the findings reported here have much broader implications for behavioral and/or brain state arousal and provide a mechanistic bridge between different states of consciousness, including general anesthesia. Therefore, the authors may consider tying these findings with the recent work on the role of the prefrontal cortex in arousal from general anesthesia and slow-wave sleep (PMID: 35436248, PMID: 29937348, PMID: 33328847).

      We thank the reviewer for this excellent recommendation. We are now citing these papers in the revised manuscript.

      (5) It's up to the authors, but I do not see the need for the section on Clinical Implications. It's very speculative, and it makes the entire discussion section heavy.

      We have considerably shortened the discussion of potential clinical implications to make the manuscript more concise.

      (6) Figure 1: It's difficult to compare the EEG power the way figures are set up right now. I think it would enhance clarity if the authors separate the plots based on state and show power from the control and silenced neuronal group in the same plot. Also, the colors are too similar (essentially a shade of green/blue) to provide effective visual resolution. This is especially true in panel d. Please consider changing the color scheme.

      This comment seems to refer to Figure 2 and subsequent figures with analysis of vigilance states and EEG spectra (Figure 1 contains histological images). We have selected the colour scheme for colour-blind individuals. Therefore, the main difference is in the saturation, not the colour of the plots. We have tested the visibility of the colour scheme on a high-resolution screen with the original image files and can reassure the reviewer that the genotype differences, which are slightly blurred in the reduced-resolution figures provided within the combined text file for the review process, are easily distinguishable in the final figure quality.

      (7) I don't understand the y-axis scale in Figure 1. How can this be 500% and if it is, then 500% of what?

      This comment also seems to refer to the analysis of slow wave activity (SWA) in Figure 2 rather than to Figure 1 (histology figure). The percentage of SWA is normalised to the average SWA across the recording. Since NREM sleep is characterised by considerably higher SWA than wakefulness and REM sleep, the level of SWA during NREM sleep is in the range of 200-300%, and can be even higher after long wake episodes which are followed by a rebound of NREM sleep SWA. Hence, the upper limit of the y-axis in these (and subsequent) plots of SWA is 500% (of the average SWA). We have amended the figure legend to clarify that SWA is presented here as percentage of average SWA across the recording.
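The normalisation described above can be written as SWA% = 100 × SWA(epoch) / mean SWA over the recording. A minimal Python sketch, using made-up epoch values rather than data from the study, shows how NREM epochs end up far above 100%:

```python
# Minimal sketch: express slow-wave activity (SWA) per epoch as a percentage of
# the mean SWA across the whole recording, as the response describes.
# Assumption: the epoch values are made-up illustrative numbers, not study data.
from statistics import fmean


def swa_percent_of_mean(swa_per_epoch: list[float]) -> list[float]:
    """Return each epoch's SWA as a percentage of the recording-wide mean SWA."""
    mean_swa = fmean(swa_per_epoch)
    if mean_swa == 0:
        raise ValueError("mean SWA is zero; cannot normalise")
    return [100.0 * value / mean_swa for value in swa_per_epoch]


if __name__ == "__main__":
    # Hypothetical recording: four low-SWA epochs (wake/REM) and one NREM epoch.
    epochs = [10.0, 10.0, 10.0, 10.0, 60.0]
    print(swa_percent_of_mean(epochs))  # [50.0, 50.0, 50.0, 50.0, 300.0]
```

By construction the normalised values average to 100%, so when much of the recording consists of low-SWA states, NREM epochs sit well above 100%, which is why an axis reaching 500% can be needed.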

    1. Your Obsidian Vault Is a Knowledge Graph. Here’s How to Make It Think (quickly)
      • Core Premise: An Obsidian vault maps perfectly onto a code repository structure. It functions as an implicit graph database where notes act as nodes, wikilinks serve as directed edges, tags categorize subgraphs, and YAML frontmatter defines attributes.
      • The Claude Code Solution: Instead of basic autocomplete plugins, users can navigate, search, and manage their knowledge vaults by connecting Anthropic's Claude Code via the terminal command line (cd ~/my-vault && claude).
      • The Power of CLAUDE.md: Placing a CLAUDE.md file in the root directory establishes clear instructions, vault context, active projects, formatting rules, and strict negative constraints (e.g., prohibiting modification of templates or automated deletion).
      • Integration Tooling Ecosystem:
        • Tier 1: Direct file system integration enhanced by obsidian-skills to natively understand format elements like wikilinks and callouts.
        • Tier 2: Model Context Protocol (MCP) servers like MCPVault or obsidian-mcp-tools for compressed token usage, structured search, and semantic discovery.
        • Tier 3: High-performance engines like TurboVault (Rust-based) for graph operations, multi-hop traversal, and SQL querying.
        • Tier 4: Embedded sidebar plugins (e.g., Claudian, Cortex) for users wanting a unified workspace layout.
      • High-ROI Workflows:
        • Automated Backlinking: Scraping daily journal notes to dynamically match and generate links to existing or new entity stubs.
        • Cross-Domain Synthesis: Instructing the AI to exclusively reference personal notes to map structural parallels across seemingly unrelated folders.
        • Vault Maintenance: Identifying disconnected "orphan" notes, repairing broken wikilinks, and generating gap analysis reports to guide future writing.
      • Safety Protocols: It is highly recommended to track the entire vault using Git to review changes via diffs, isolate all AI outputs inside a specialized draft directory (_ai-drafts/), and rigidly scope prompts to prevent hallucinated external data injection.
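The graph view in the core premise (notes as nodes, wikilinks as directed edges) is enough to drive the vault-maintenance workflow. The following is a minimal Python sketch, not how Claude Code, MCPVault or TurboVault work internally. It assumes notes arrive as a title-to-text mapping and that link targets match titles exactly; `SAMPLE_VAULT` and the helper names are hypothetical.

```python
# Minimal sketch: a vault as a directed graph (notes = nodes, [[wikilinks]] = edges),
# used to report disconnected "orphan" notes and broken wikilinks.
# Assumptions: notes are passed in as {title: markdown_text}; link targets are
# matched by exact title. Real Obsidian link resolution (folders, case,
# frontmatter aliases, embeds) is more involved than this.
import re

# Captures the target of [[Target]], [[Target#Heading]], [[Target|alias]] and
# [[Target#Heading|alias]]; the heading and alias parts are discarded.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

SAMPLE_VAULT = {  # hypothetical notes for illustration
    "Daily note": "Met [[Alice]] about [[Project X#Scope|scope]] and [[Bob]].",
    "Alice": "Works on [[Project X]].",
    "Project X": "Status: active",
    "Stray idea": "No links here.",
}


def build_graph(notes: dict[str, str]) -> dict[str, set[str]]:
    """Map each note title to the set of note titles it links to."""
    return {
        title: {target.strip() for target in WIKILINK.findall(text)}
        for title, text in notes.items()
    }


def orphans(graph: dict[str, set[str]]) -> set[str]:
    """Notes with no outgoing links and no incoming links."""
    linked_to = set().union(*graph.values())
    return {title for title, out in graph.items() if not out and title not in linked_to}


def broken_links(graph: dict[str, set[str]]) -> set[tuple[str, str]]:
    """(source, target) pairs whose target note does not exist."""
    return {(src, dst) for src, outs in graph.items() for dst in outs if dst not in graph}


if __name__ == "__main__":
    g = build_graph(SAMPLE_VAULT)
    print(sorted(orphans(g)))       # ['Stray idea']
    print(sorted(broken_links(g)))  # [('Daily note', 'Bob')]
```

A broken link such as `[[Bob]]` is also a candidate for the "new entity stub" case in the automated backlinking workflow.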
    1. agentic

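      Yes, "agentic" is indeed an adjective, and it derives from the noun agent (someone or something that acts, or acts on another's behalf).

      In linguistics and psychology, "agency" refers to the capacity to act on one's own initiative, or to act on behalf of another. "Agentic" therefore literally means "having agency" or "exhibiting the capacity to act." In today's AI field (as in the official Claude Code documentation you are browsing), it is often used to describe agent-based systems, or systems that can carry out tasks autonomously.

      💡 How the suffix -ic forms words: in English, noun + -ic is a classic word-formation pattern meaning "related to…", "having the characteristics of…", or "produced by…".

      Many common adjectives are formed from nouns by adding -ic in this way; they fall into a few groups:

      1. Person/role nouns ➔ adjectives. These follow the path closest to agent ➔ agentic, turning an identity or role directly into an adjective:

      Prophet (a seer) ➔ Prophetic (foretelling)

      Patriot (one who loves their country) ➔ Patriotic (devoted to one's country)

      Nomad (a wanderer) ➔ Nomadic (roaming, wandering)

      2. Abstract-concept or domain nouns ➔ adjectives:

      Chaos (disorder) ➔ Chaotic (disordered)

      Irony ➔ Ironic (marked by irony)

      Climax (the peak point) ➔ Climactic (forming a climax)

      System ➔ Systemic (relating to or affecting a whole system)

      3. Discipline, art, and science nouns ➔ adjectives:

      History ➔ Historic (of historical significance)

      Atom ➔ Atomic (relating to atoms)

      Artist ➔ Artistic (relating to art; artistically gifted)

      Therapy (treatment) ➔ Therapeutic (having healing effects; good for health)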

    1. I was laid off by Atlassian
      • Introduction and Context: The author reflects on his experiences after being affected by layoffs at Atlassian, where he worked for approximately 8 years. He shares details about the technical architecture he built, key achievements, and non-technical lessons learned to inspire others in similar situations.
      • The Interview Process (8 Years Ago):
        • Began with an online coding quiz on HackerRank, which he aced with full marks.
        • The first technical round involved reading a Cloudflare white paper on custom domains for 10 minutes and then answering architectural questions regarding microservices and containers.
        • The second technical round was a live troubleshooting simulation of a real Atlassian incident (an application issue causing a Denial of Service). He also faced questions about latency-based DNS routing.
        • During the values interview, when asked what success would look like in 12 months, the interviewers outlined the need for an internal platform application to provide self-service load balancing for Atlassian developers.
      • Building the Open Service Broker (OSB):
        • In his first few weeks, the author built an application adhering to the Open Service Broker API specification to automate infrastructure provisioning in a Kubernetes environment.
        • Internal developers declared their infrastructure requirements using configuration files in version control, which build servers then uploaded to the broker.
        • The system was originally built in Python using the Connexion library (routing based on OpenAPI documents), later migrated to pure Flask, and eventually transitioned to FastAPI.
        • The architecture utilized an asynchronous task queue model: FastAPI received requests, pushed task details to AWS SQS, and background workers handled tasks (like creating DNS records or CloudFront distributions) while writing status updates to DynamoDB.
      • Transitioning to Envoy Proxy and Sovereign (Control Plane):
        • Atlassian decided to replace expensive corporate enterprise load balancers with Envoy Proxy, an open-source, cloud-native proxy.
        • The author built a custom Envoy management server/control plane named Sovereign (which was open-sourced on Bitbucket).
        • Built with FastAPI, Sovereign pulled dynamic context data from the broker's database and AWS S3 buckets, injected it into templates for Envoy resources (clusters, routes, listeners), and dynamically pushed updated configurations to running proxies over the wire.
      • Infrastructure as Code and Image Automation (AMI):
        • The entire proxy infrastructure—comprising around 2,000 proxies across 13 AWS regions—was deployed using AWS CloudFormation templates defining VPCs, subnets, Network Load Balancers (NLBs), Security Groups, and Auto Scaling Groups.
        • To create standardized images, the team used HashiCorp Packer combined with SaltStack (a declarative configuration management tool similar to Ansible or Chef).
        • The resulting AMI had pre-installed and optimized components, including Envoy, network tuning configurations, security hardening layers, and observability agents for logging, tracing, and metrics.
      • Mass Migration and Edge Centralization:
        • Following the initial framework setup, the team spent roughly two years migrating major Atlassian core products (Jira, Confluence, Bitbucket, Statuspage) and thousands of microservices behind this centralized edge infrastructure.
        • The platform locked down public exposures; microservices could no longer be accessed publicly by accident. Developers had to explicitly signal intent through the proxy configuration.
        • Centralizing these features saved millions of dollars and massive development time, sparing thousands of developers from having to independently implement features like authentication, authorization, or rate limiting on their own backends.
      • Sidecar Architecture and Custom Rust Tools:
        • While DDoS protection was offloaded to AWS CloudFront and Access Logs were captured natively via Envoy's HTTP Connection Manager filters, more complex features required a sidecar container model running locally on the proxy EC2 instances.
        • The author personally designed and wrote a custom authentication sidecar container from scratch using Rust ("the Lord's language").
        • Other specialized internal teams contributed separate sidecar containers for authorization and rate-limiting.
      • Non-Technical Growth and Professional Lessons:
        • Diplomacy and Conflict Resolution: Working with various managers and diverse personalities for nearly a decade forced the author to dramatically mature his skills in persuasion, mentoring, and navigating interpersonal friction.
        • Code Churn and Long-Term Maintenance: The author notes that building software is easy, but maintaining its malleability over time is hard. Codebases develop highly predictable areas of continuous modification ("code churn"), which serve as code smells indicating growing complexity that must be actively refactored before coupling paralyzes development.
        • Mentoring vs. Training: The author successfully mentored an intern to achieve the highest possible performance rating and a return offer. However, he reflects on mentoring as a highly challenging balancing act—knowing how to guide someone without giving away answers or letting them get overly frustrated—distinguishing it from his everyday engineering strength of breaking down complex system architectures into easily digestible mental models for peers.
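The broker's request flow (the API accepts a request, pushes the task to a queue, a background worker does the work and writes status updates) follows a common asynchronous task-queue pattern. Below is a minimal Python sketch of that pattern only, not the author's code. It assumes in-memory stand-ins: `queue.Queue` plays SQS and a dict plays the DynamoDB status table. The task kinds and helper names are hypothetical, and no AWS APIs are called.

```python
# Minimal sketch of the "accept request, enqueue, process in background,
# record status" pattern described above. Assumptions: queue.Queue stands in
# for AWS SQS and a dict stands in for the DynamoDB status table; the task
# kinds are hypothetical. No AWS APIs are called.
import queue
import threading
import uuid

task_queue = queue.Queue()            # stand-in for SQS
status_table: dict[str, str] = {}     # stand-in for DynamoDB: task_id -> status
status_lock = threading.Lock()

KNOWN_KINDS = {"create_dns_record", "create_cdn_distribution"}


def set_status(task_id: str, status: str) -> None:
    with status_lock:
        status_table[task_id] = status


def submit(kind: str, params: dict) -> str:
    """API side: record the task as pending, enqueue it, and return immediately."""
    task_id = str(uuid.uuid4())
    set_status(task_id, "pending")
    task_queue.put({"id": task_id, "kind": kind, "params": params})
    return task_id


def handle(task: dict) -> None:
    """Placeholder for real provisioning work (e.g. creating a DNS record)."""
    if task["kind"] not in KNOWN_KINDS:
        raise ValueError(f"unknown task kind: {task['kind']}")


def worker() -> None:
    """Background worker: pull tasks, do the work, write status updates."""
    while True:
        task = task_queue.get()
        if task is None:              # sentinel value: stop the worker
            task_queue.task_done()
            return
        set_status(task["id"], "in_progress")
        try:
            handle(task)
            set_status(task["id"], "succeeded")
        except Exception:
            set_status(task["id"], "failed")
        finally:
            task_queue.task_done()


def run_demo() -> tuple[str, str]:
    """Submit one valid and one invalid task, then report both final statuses."""
    thread = threading.Thread(target=worker)
    thread.start()
    ok = submit("create_dns_record", {"name": "app.example.com"})
    bad = submit("delete_everything", {})
    task_queue.join()                 # wait until both tasks are processed
    task_queue.put(None)
    thread.join()
    return status_table[ok], status_table[bad]


if __name__ == "__main__":
    print(run_demo())  # ('succeeded', 'failed')
```

The design choice the pattern captures is that the API never blocks on slow provisioning: callers get a task ID at once and poll the status store, while failures are recorded rather than lost.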
    1. Gaining control of every projector and camera on campus
      • The Discovery: While attending the Colorado School of Mines, the author discovered that local DNS servers assign a unique subdomain to every device connecting to the campus Wi-Fi network.
      • Subdomain Enumeration:
        • Initial Attempts: The author first used Python and brute-force permutations to guess subdomains, but the asynchronous script was too slow.
        • Rust Optimization: Moving to Rust and optimizing the code (incrementing an integer and converting it to base 36) dramatically improved speed. They bypassed the standard library by interacting directly with the UDP port and used Bash scripting to distribute offsets across multiple processes.
        • The Crash: The optimized Rust script generated queries so quickly (hitting peak rates up to 4.04 Gbps) that it crashed the campus DNS server, causing a 15-minute network outage. School IT tracked them down because they had spent two weeks talking openly about the project.
      • PTR Records: Realizing brute forcing became unrealistic for longer subdomains, the author pivoted to utilizing DNS Reverse Lookup (PTR records), which allowed them to map known active IP addresses back to domain names.
      • Port Scanning and AF_XDP:
        • The author created a custom, lightweight network scanner called convoy utilizing Linux's AF_XDP to bypass the core network stack.
        • By horizontally scanning (one port across all machines before moving to the next), they safely achieved scan speeds of 300,000 ports per second on a single core.
      • Campus Exploitation:
        • Due to loose network restrictions surrounding wireless casting, certain subnets were accessible.
        • The scanner revealed 36 campus security cameras running on default passwords. Although deep packet inspection rules blocked live video streaming, the author reverse-engineered the web interface's API to synchronously manipulate camera positions.
        • They also found unprotected controls for almost every projector screen and input switch across the campus classrooms.
      • Reporting: The vulnerabilities were responsibly disclosed to campus IT, who stated the issues would be patched over the summer. The author received no financial compensation.
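The enumeration trick in the Rust bullet (increment an integer, render it in base 36, give each process its own offset) can be sketched in a few lines. This is a Python illustration of the idea, not the author's Rust tool. It only generates labels and sends no DNS queries; `to_base36` and `worker_ranges` are hypothetical helper names.

```python
# Minimal sketch of the enumeration scheme described above: an incrementing
# integer is converted to a base-36 label (digits 0-9 then a-z), and the
# integer range is split into contiguous chunks, one start offset per worker
# process. Illustration only: Python rather than the author's Rust, and it
# only generates labels; it sends no DNS queries.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Convert a non-negative integer to its base-36 representation."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def worker_ranges(total: int, workers: int) -> list[range]:
    """Split [0, total) into at most `workers` contiguous chunks."""
    if total <= 0 or workers <= 0:
        raise ValueError("total and workers must be positive")
    chunk = -(-total // workers)  # ceiling division
    return [range(start, min(start + chunk, total)) for start in range(0, total, chunk)]


if __name__ == "__main__":
    print(to_base36(35), to_base36(36), to_base36(36**2 - 1))  # z 10 zz
    for r in worker_ranges(36**2, 4):
        print(r.start, to_base36(r.start))  # each worker's starting offset and label
```

One consequence of a plain integer-to-base-36 mapping is that it never produces multi-character labels with a leading "0" (such as "0a"), so covering those names would need a per-length enumeration instead.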

      Hacker News Discussion

      • Network Segmentation Failures: Users expressed shock that a modern university in 2026 would still run a completely flat network architecture, allowing unvetted student devices onto the same subnets as critical infrastructure, surveillance cameras, and IoT equipment without basic VLAN segmentation.
      • Lenient Academic Consequences: Commenters heavily debated the IT department's mild reaction to a network crash. Some argued that causing campus-wide outages warrants severe disciplinary action to prevent dangerous professional habits, while others recalled their own college days—noting that universities traditionally serve as a safe environment to learn boundaries, and harsh punishments only incentivize hackers to hide their findings.
      • Alternative Enumeration Techniques: Network professionals chimed in with alternative scanning methods, noting that hotel and public networks often share a single central DNS server across guest and internal networks, allowing easy reverse PTR record profiling. Others recommended utilizing broadcast mDNS/Bonjour for local device footprinting.
      • Industry Perspectives: Former project managers for AV hardware companies noted that modern firmware explicitly mandates changing default passwords upon setup, placing the blame squarely on poor campus IT implementation.
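Both the author's pivot to PTR records and the commenters' "reverse PTR record profiling" start from addresses rather than guessed names. Each IPv4 address maps to a query name under in-addr.arpa. A minimal Python sketch using the standard `ipaddress` and `socket` modules follows. It assumes the RFC 5737 documentation range as the example subnet and that you are authorised to query the network in question.

```python
# Minimal sketch of reverse (PTR) lookups: for each host address in a subnet,
# build the in-addr.arpa query name and, optionally, ask the system resolver
# for the PTR record. Assumptions: the subnet below is the RFC 5737
# documentation range, and you are authorised to query the network you scan.
import ipaddress
import socket
from typing import Optional


def ptr_names(cidr: str) -> list[tuple[str, str]]:
    """Return (address, PTR query name) for every host address in the subnet."""
    network = ipaddress.ip_network(cidr, strict=True)
    return [(str(ip), ip.reverse_pointer) for ip in network.hosts()]


def reverse_lookup(address: str) -> Optional[str]:
    """Resolve one address to its PTR hostname; None if the lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
        return hostname
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return None


if __name__ == "__main__":
    for address, name in ptr_names("192.0.2.0/30"):
        print(address, name)
    # 192.0.2.1 1.2.0.192.in-addr.arpa
    # 192.0.2.2 2.2.0.192.in-addr.arpa
```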
    1. Everyone wants to read personal thoughts from real human beings, but no one writes them anymore. What we get instead is slop, and that’s hardly a good read. The moment I notice I’m reading autogenerated text, I care less. That’s why I keep writing. My personal blog is a weird mix of ramblings about reviewing code, the best programmers I know, and random thoughts. But you know what? People are reading it and from time to time I get an email from someone who found one of my articles helpful.

      Making the case that now is the time to start a blog because there are many people out there who are looking for "personal thoughts from real human beings..."

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this review paper, the authors describe the concept of neural correlates of consciousness (NCC) and explain how noninvasive neuroimaging methods fall short of being able to properly characterise an unconfounded NCC. They argue that intracranial research is a means to address this gap and provide a review of many intracranial neuroimaging studies that have sought to answer questions regarding the neural basis of perceptual consciousness.

      Strengths

      The authors have provided an in-depth, timely, and scholarly contribution to the study of NCCs. First and foremost, the review surveys a vast array of literature. The authors synthesise findings such that a coherent narrative of what invasive electrophysiology studies have revealed about the neural basis of consciousness can be easily grasped by the reader. The review is also, to the best of my knowledge, the first review to specifically target intracranial approaches to consciousness and to describe their results in a single article. This is a credit to the authors: as it becomes ever harder to apply strict tests to theories of consciousness using methods such as fMRI and M/EEG, it is important to have informative resources describing the results of human intracranial research, so that theorists will have to constrain their theories further in accordance with such data. As far as the authors were aiming to provide a complete and coherent overview of intracranial approaches to the study of NCCs, I believe they have achieved their aim.

      We appreciate the reviewer's positive feedback on our work.

      Weaknesses

      Overall, I feel positive about this paper. However, there are a couple of aspects to the manuscript that I think could be improved.

      (1) Distinguishing NCCs from their prerequisites or consequences

      This section in the introduction was particularly confusing to me. Namely, in this section, the authors' aim is to explain how intracranial recordings can help distinguish 'pure' NCCs from their antecedents and consequences. However, the authors almost exclusively describe different tasks (e.g., no-report tasks) that have been used to help solve this problem, rather than elaborating on how intracranial recordings may resolve this issue. The authors claim that no-report designs rely on null findings, and invasive recordings can be more sensitive to smaller effects, which can help in such cases. However, this motivation pertains to the previous sub-section (limits of noninvasive methods), since it is primarily concerned with the lack of temporal and spatial resolution of fMRI and M/EEG. It is not, in and of itself, a means to distinguish NCCs from their confounds.

      As such, in its current formulation, I do not find the argument that intracranial recordings are better suited to identifying pure NCCs (i.e. separating them from pre- or post-processing) convincing. To me, this is a problem solved through novel paradigms and better-developed theories. As it stands, the paper justifies my position by highlighting task developments that help to distinguish NCCs from prerequisites and consequences, rather than giving a novel argument as to why intracranial recordings outperform noninvasive methods beyond the reasons they explained in the previous section. Again, this position is justified when, from lines 505-506, the authors describe how none of the reported single-cell studies were able to dissociate NCCs from post-perceptual processing. As such, it seems as if, even with intracranial recording, NCCs and their confounds cannot be disentangled without appropriate tasks.

      The section 'Towards Better Behavioural Paradigms' is a clear attempt to address these issues and, as such, I am sure the authors share the same concerns as I am raising. Still, I remain unconvinced that the distinguishing of NCCs from pre-/post- processing is a fair motivation for using intracranial over noninvasive measures.

      We agree that distinguishing proper NCCs from their prerequisites or consequences is primarily a matter of experimental design and theoretical framework, not merely of recording modality. We did not mean to imply that intracranial recordings inherently solve this dissociation; rather, we argued that the high signal-to-noise ratio and spatiotemporal accuracy of sEEG offer a stronger "testing ground" for the null findings often relied on by no-report paradigms. We now state this explicitly at the beginning of this section and have further clarified it in the revised section “Limits of noninvasive measures”.

      We also explicitly acknowledge, as the reviewer noted, that even the most precise recordings require careful task dissociations to distinguish NCCs from their prerequisites and consequences.

      (2) Drawing misleading conclusions from certain studies

      There are passages of the manuscript where the authors draw conclusions from studies that are not necessarily warranted by the studies they cite. For instance:

      Lines 265 - 271: "The results of these two studies revealed a complex pattern: on the one hand, HGA in the lateral occipitotemporal cortex and the ventral visual cortex correlated with stimulus strength. On the other hand, it also correlated with another factor that does not appear to play a role in visibility (repetition suppression), and did not correlate with a non-sensory factor that affects visibility reports (prior exposure). These results suggest that activity in occipitotemporal cortex regions reflecting higher-order visual processing may be a precursor to the NCC but not an NCC proper."

      It's possible to imagine a theory that would predict HGA could correlate with stimulus strength and repetition suppression, or that it would not correlate with prior exposure (e.g. prior exposure could impact response bias without affecting subjective visibility itself). The authors describe this exact ambiguity in interpretation later in the article (line 664), but in its current form, at least in line 270 (when the study is most extensively discussed), the manuscript heavily implies that HGA is not an NCC proper. This generates a false impression that intracranial recordings have conclusively determined that occipitotemporal HGA is not a pure NCC, which is certainly a premature conclusion.

      We agree that our interpretation of these studies (lines 265–271 of the previous version of the manuscript) was presented too definitively. We have modified the text (now lines 314-317) to soften this conclusion and align it with the more nuanced discussion later in the manuscript. Specifically, we now frame this as a "suggested dissociation" rather than a conclusive finding (line 730), and we explicitly acknowledge that alternative interpretations remain viable.

      Line 243: "Altogether, these early human intracranial studies indicate that early-latency visual processing steps, reflected in broadband and low gamma activity, occur irrespective of whether a stimulus is consciously perceived or not. They also identified a candidate NCC: later (>200 ms) activity in the occipitotemporal region responsible for higher-order visual processing."

      The authors claim in this section that later (>200ms) activity in occipitotemporal regions may be a candidate for an NCC. However, the Fisch et al. (2009) study they describe in support of this conclusion found that early (~150ms) activity could dissociate conscious and unconscious processing. This would suggest that it is early processing that lays claim to perceptual consciousness. The authors explicitly describe the Fisch et al results as showing evidence for early markers of consciousness (line 240: '...exhibited an early...response following recognized vs unrecognised stimuli.'). Yet only a few lines later they use this to support the conclusion that a candidate NCC is 'later (>200ms) activity in the occipitotemporal region' (line 245). As such, I am not sure what conclusion the authors want me to make from these studies.

      This problem is repeated in lines 386-387: "Altogether, studies that investigated the cortical correlates of visual consciousness point to a role of neural responses starting ~250 ms after stimulus onset in the non-primary visual cortex and prefrontal cortex."

      This seems to be directly in conflict with the Fisch et al results, which show that correlates of consciousness can begin ~100ms earlier than the authors state in this passage.

      We thank the reviewer for pointing out this inconsistency. We agree that stating ">200 ms" conflicts with the findings of Fisch et al. (2009), who observed dissociations as early as ~150 ms. Our goal was to contrast the very early, stimulus-driven responses with the later responses that reflect consciousness. However, as the reviewer correctly notes, the exact "onset" of these signals varies across studies and paradigms. To address this, we have removed the specific ">200 ms" mentioned in line 245 of the previous version of the manuscript and updated the timing in line 284 to "starting 150 ms" to better reflect the results of Fisch et al. We also clarify that while the exact latency depends on the paradigm, a consistent finding is that activity representing conscious contents in higher-order visual cortex follows an initial wave of unconscious processes (lines 809-810).

      (3) Justifying single-neuron cortical correlates of consciousness

      The purpose of the present manuscript is to highlight why and how intracortical measures of neural activity can help reveal the neural correlates of perceptual consciousness. As such, in the section 'Single-neuron cortical correlates of perceptual consciousness', I think the paper is lacking an argument as to why single-neuron research is useful when searching for the NCC. Most theories of consciousness are based around circuit or system-level analyses (e.g., global ignition, recurrent feedback, prefrontal indexing, etc.) and usually do not make predictions about single cells. Without any elaboration or argument as to why single-cell research is necessary for a science of consciousness, the research described in this section, although excellent and valuable in its own right, seems out of place in the broader discussion of NCCs. A particularly strong interpretation here could be that intracranial recordings mislead researchers into studying single cells simply because it is the finest level of analysis, rather than because it offers helpful insight into the NCCs.

      It is true that many prominent theories of consciousness were developed based on macroscopic observations, largely due to the prevalence of non-invasive recordings in humans. However, we argue that recording single-unit activity (SUA) is important for several reasons, and we made this clearer in the revised version. First, signals such as fMRI, EEG, or even local field potentials (LFP) often conflate multiple distinct neural populations. SUA allows us to dissociate neurons representing the percept from neighboring neurons involved in task-related confounds (e.g., motor preparation or arousal) that would otherwise be blurred together. This matters because some percepts might be represented by sparse coding involving a small, specific population of "concept" or "percept" cells. Electrophysiological studies in animal models reveal that various cognitive processes are encoded within neuronal subspaces that only emerge when single-unit activity is analyzed as lower-dimensional projections of the broader neural activity manifold (Mante et al., 2013; Ebitz & Hayden, 2021; Jazayeri & Afraz, 2017). Importantly, many neural computations are only discernible through the lens of population dynamics, i.e., the coordinated activity of many individually recorded neurons (Vyas et al., 2020). We believe that providing high granularity through SUA recordings prevents over-aggregation of data, ensuring that even system-level theories can build on biologically accurate foundations.

      Moreover, some theories are defined at the cellular level. For instance, the Dendritic Integration Theory (Bachmann et al., 2020) posits that the integration of feedforward and feedback signals occurs at the level of individual pyramidal neurons. Without SUA, these cellular mechanisms remain untestable. Beyond spatial granularity, SUA also provides excellent temporal granularity, which is crucial for testing theories that rely on the precise timing of spikes (e.g., neural synchrony). As LFPs reflect average activity across populations, only SUA can confirm whether individual neurons lock their spikes to a specific phase, a mechanism hypothesized to bind features into a conscious whole.
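To make the phase-locking argument concrete, the sketch below computes a standard spike-phase locking measure: the length of the mean resultant vector of the LFP phases at which a single neuron fires. A value of 0 means no preferred phase, and 1 means every spike occurs at the same phase. This is an illustration under assumptions, not an analysis from the manuscript. The phases are assumed to have been extracted already (for example, from a band-passed, Hilbert-transformed LFP), and the function name `spike_phase_locking` and the example spike phases are hypothetical.

```python
import cmath
import math
from typing import Sequence, Tuple


def spike_phase_locking(phases: Sequence[float]) -> Tuple[float, float]:
    """Return (resultant length, preferred phase in radians) for spike phases.

    Each spike is treated as a unit vector at its LFP phase; the length of the
    mean vector measures how consistently the neuron fires at one phase.
    """
    if not phases:
        raise ValueError("need at least one spike phase")
    mean_vector = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(mean_vector), cmath.phase(mean_vector)


# Hypothetical neurons: one fires near phase 0, the other at evenly spread phases.
locked = [0.1, -0.1, 0.05, -0.05]
unlocked = [2 * math.pi * k / 8 for k in range(8)]

r_locked, phi_locked = spike_phase_locking(locked)
r_unlocked, _ = spike_phase_locking(unlocked)
print(f"locked: R={r_locked:.2f}, preferred phase={phi_locked:.2f} rad")
print(f"unlocked: R={r_unlocked:.2f}")
```

Computing this value for each neuron separately is what a population-averaged LFP cannot do. That is why the measure requires isolated single units.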

      We added these points to a new section in the revised manuscript.

      References:

      Bachmann, T., Suzuki, M., & Aru, J. (2020). Dendritic integration theory: A thalamo-cortical theory of state and content of consciousness. Philosophy and the Mind Sciences, 1(II).

      Ebitz, R. B., & Hayden, B. Y. (2021). The population doctrine in cognitive neuroscience. Neuron, 109(19), 3055-3068.

      Jazayeri, M., & Afraz, A. (2017). Navigating the neural space in search of the neural code. Neuron, 93(5), 1003-1014.

      Mante, V., Sussillo, D., Shenoy, K. V., & Newsome, W. T. (2013). Context-dependent computation by recurrent dynamics in prefrontal cortex. Nature, 503(7474), 78-84.

      Vyas, S., Golub, M. D., Sussillo, D., & Shenoy, K. V. (2020). Computation through neural population dynamics. Annual Review of Neuroscience, 43(1), 249-275.

      (4) No mention of combined fMRI-EEG research

      A minor point, but I was surprised that the authors did not mention any combined fMRI-EEG research when they were discussing the limits of noninvasive recordings. Intracortical recordings are one way to surpass the spatial and temporal resolution limits of M/EEG and fMRI respectively, but studies that combine fMRI and EEG are also an alternative means to solve this problem: by combining the spatial resolution of fMRI with the temporal resolution of EEG, researchers can - in theory - compare when and where certain activity patterns (be they univariate ERPs or multivariate patterns) arise. The authors do cite one paper (Dellert et al., 2021 JNeuro) that used this kind of setup, but they discuss it only with respect to the task and ignore the recording method. The argument for using intracranial recordings is weaker for not mentioning a viable, noninvasive alternative that resolves the same issues.

      We thank the reviewer for this point. We have added a discussion of fMRI-EEG to the "Limits of noninvasive measures" section (lines 167-171). While we acknowledge that fMRI-EEG is a powerful non-invasive tool for bridging spatial and temporal scales, we note that it relies on merging an indirect metabolic signal with a weak electrophysiological one filtered by the skull, which is computationally complex and often noisy. In contrast, intracranial recordings provide direct measures of both local field potentials and spiking activity within the same neural population, offering interpretability and signal-to-noise ratio that non-invasive combinations cannot match. In our view, this is not just an alternative to these methods, but a unique means of accessing the underlying neuronal ground truth.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors review the study of the neural correlates of consciousness (NCCs). They discuss several of the difficulties that researchers must face when studying NCCs, and argue that several of these difficulties can be alleviated by using intracranial recordings in humans.

      They describe what constitutes an NCC, and the difficulty of distinguishing an NCC proper from the prerequisites and consequences of conscious processing.

      They also describe the two main types of experimental designs used to study NCCs. These are the contrastive approach (with its report and non-report variants), and the supraliminal approach, each with its own merits and pitfalls.

      They discuss the limitations of non-invasive methods, such as fMRI, EEG and MEG, as well as the limitations of the use of invasive recordings in non-human animals.

      After setting the stage in this way, the authors provide an extensive review of the knowledge acquired by using invasive recordings in humans. This includes population-level measurements in vision and in other sensory modalities, as well as single-neuron studies. The authors also discuss studies of subcortical NCCs.

      The second half of this work discusses the theoretical insights gained through the use of intracranial recordings, as well as their limitations, and a perspective for future work.

      Strengths:

      This work offers an impressive review, which will serve as a useful reference document, both for newcomers to the study of NCC and for experienced researchers. The inclusion of non-visual and subcortical NCCs is of particular merit, as these have been understudied.

      Besides serving as a review, this work includes a perspective, exploring several directions to pursue for the progress of the field.

      We thank the reviewer for acknowledging the strength of our work.

      Weaknesses:

      The intention of the authors is to argue how some of the problems faced when studying NCCs are alleviated by the use of intracranial recordings in humans. But in some cases, the link between the problems related to the study of NCCs and the advantages of intracranial recordings over non-invasive methods is not clear.

      For example, the authors explain the difficulties in distinguishing true NCCs from their prerequisites and consequences. This constitutes a difficult conceptual problem that plagues all recording techniques. The authors don't provide a convincing explanation of how intracranial recordings offer advantages over EEG or MEG when dealing with this problem.

      We agree that the distinction between proper NCCs and their prerequisites or consequences is a fundamental challenge that affects all recording modalities. We did not intend to imply that intracranial recordings are a "silver bullet" for solving this conceptual problem in isolation, and we now explicitly state that at the beginning of this section (line 101).

      We have revised the section on "Distinguishing NCCs from their prerequisites or consequences" to clarify that intracranial recordings are a powerful tool when used in conjunction with appropriate experimental designs, rather than a standalone solution to these conceptual difficulties.

      For example, the authors explain how the use of non-report designs to rule out post-perceptual processing relies on null results, which, according to them, are harder to interpret given the low resolution of non-invasive methods. But the interpretation of null results is actually more complicated in the case of intracranial recordings. As the coverage achieved by the electrodes is sparse, if a null result is attested, it remains possible that a true effect was present in a nearby patch of cortex out of coverage.

      It is true that a null result in an intracranial study may simply reflect that the relevant neural population was not sampled by the specific electrode implantation scheme. However, we argue that interpreting null results is equally, if not more, complicated in non-invasive methods, albeit for different reasons. While M/EEG offers broader coverage, it is blind to many cortical sources because of their orientation (radial sources in MEG) or their location in deep sulci and subcortical structures. The signal-to-noise ratio of M/EEG is also much lower than that of intracranial EEG, making it more likely that null results obscure the existence of subtle effects (Parvizi & Kastner, 2018).

      To address this, we revised the manuscript to clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We now explicitly emphasize that drawing conclusions from null results based on intracranial recordings requires caution regarding electrode placement. We also point out that these approaches are complementary: M/EEG can identify large regions of interest, while sEEG can then provide high-resolution "ground truth" to confirm whether those regions are part of the NCC.

      Reference: Parvizi, J., & Kastner, S. (2018). Promises and limitations of human intracranial electroencephalography. Nature Neuroscience, 21(4), 474-483. https://doi.org/10.1038/s41593-018-0108-2

      The authors argue that the spatial resolution of intracranial recordings is better than that of EEG and MEG. While this is technically true (especially compared to EEG), the true spatial scale of the NCCs is unknown. If NCCs' span is in the mm range, then the additional spatial resolution of intracranial recordings might not be an advantage.

      We agree with the reviewer that the exact spatial scale of the NCC remains a topic of ongoing debate. However, we believe that the advantage of intracranial recordings holds true whether the NCC spans millimeters or centimeters. The main spatial limitation of non-invasive electrophysiology (M/EEG) is not just its spatial resolution but also the inverse problem. Since scalp sensors detect a mixture of signals from across the brain, different cortical configurations can produce identical scalp patterns. This makes it challenging to precisely locate the NCC or distinguish it from nearby activity (e.g., motor or attentional signals). When recording intracortically, a widespread NCC could be captured across multiple adjacent channels with high accuracy. Conversely, if the NCC is focal, it can be isolated with high spatial resolution. In either case, intracranial recordings eliminate the spatial ambiguity inherent in scalp recordings. We have revised the Introduction (lines 158-164) to clarify that the "spatial advantage" of intracranial recordings also pertains to the inverse problem, not merely to the ability to record from smaller cortical areas.
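The non-uniqueness behind the inverse problem can be shown with a toy linear forward model. Sensor readings are a lead-field matrix multiplied by source amplitudes. When there are more sources than sensors, two different source configurations can produce exactly the same sensor pattern. The 2-sensor, 3-source lead field below is invented for illustration and makes no claim about real head models. The names `forward`, `focal`, `distributed` and `silent` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def forward(lead_field: Matrix, sources: Vector) -> Vector:
    """Sensor signals as a weighted sum of source amplitudes (linear forward model)."""
    return [sum(w * s for w, s in zip(row, sources)) for row in lead_field]


# Hypothetical lead field: rows are scalp sensors, columns are cortical sources.
lead_field = [
    [1.0, 2.0, 1.0],
    [0.0, 1.0, 1.0],
]

# A null-space pattern: source activity whose contributions cancel at every sensor.
silent = [1.0, -1.0, 1.0]

focal = [0.0, 1.0, 0.0]                                # activity confined to source 2
distributed = [a + b for a, b in zip(focal, silent)]   # sources 1 and 3 active instead

print(forward(lead_field, focal))        # [2.0, 1.0]
print(forward(lead_field, distributed))  # [2.0, 1.0]
```

In this toy model, a contact placed directly on source 1 or source 3 would tell the two configurations apart. This is the sense in which direct recordings remove the ambiguity, though only within the tissue they cover.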

      Another factor that should be taken into consideration when assessing the spatial resolution of intracranial recordings is that while the listening zone of individual intracranial contacts is small, coverage is sparse and defined by clinical criteria (something that the authors discuss). In practice, the activity recorded by contacts is usually attributed to anatomically defined ROIs with a scale in the cm range. Given the sparse and uneven (across regions and patients) coverage afforded by intracranial recordings, the advantage of intracranial recordings in terms of spatial resolution is overstated.

      We thank the reviewer for raising this point regarding how intracranial data is often aggregated into regions of interest. We agree that if researchers generalize findings to large anatomical regions without accounting for single-channel recordings, some of the spatial benefits of intracranial recordings are indeed lost. We toned down some of the original claims accordingly, and acknowledged more clearly that clinical constraints of sEEG lead to sparse coverage (lines 245-249).

      However, we maintain that even when using an ROI-based approach, intracranial recordings offer a clear advantage over non-invasive methods, in that they represent a direct measure from a specific patch of tissue, rather than a statistical estimate that may be contaminated by "leakage" from distant sources. To address the reviewer’s concern, we have updated the manuscript (lines 244-245) to emphasize the importance of relying on MNI coordinates and individual anatomy rather than solely on broad ROI labels.

      Appraisal of whether the authors achieved their aims:

      In this work, the authors have gathered an impressive review and have discussed several important problems in the field of study of NCCs, as well as provided a perspective on how the field could move forward.

      What is less clear is how the use of intracranial recordings per se holds potential to overcome problems such as the distinction between true NCCs and the prerequisites and consequences of conscious processing.

      Discussion of the likely impact of the work on the field:

      This work has the potential to become a must-read for anyone working in the field of consciousness research.

      Reviewer #3 (Public review):

      Summary:

      This narrative review provides a clear, well-structured, and comprehensive synthesis of intracerebral recording work on the neural correlates of consciousness. It is written in an accessible manner that will be useful to a broad community of researchers, from those new to iEEG to specialists in the field.

      Strengths:

      The manuscript successfully integrates methodological and theoretical perspectives and offers a balanced overview of current, sometimes contradicting evidence. As such, the manuscript is important as it calls for a concerted and better exploration of NCCs using iEEG in the future.

      We thank the reviewer for stating the importance of our work and its potential contribution to the field.

      Weaknesses:

      The manuscript extensively discusses the use of "report" as a criterion for identifying conscious perception and its limitations for separating between correlates of consciousness and post-consciousness processes, yet the term is not defined at the outset. The authors should specify what they mean by "report" (e.g., verbal report, nonverbal self-report, or any meta-cognitive indication of experience). Importantly, this definition should be explicitly linked to the theoretical landscape: whether the authors adopt an access-consciousness perspective in which (self) reportability is central, or whether the review also aims to address phenomenal consciousness. Making this conceptual grounding explicit at the beginning will help readers interpret the empirical work surveyed throughout the review.

      We agree that a clear definition of report is essential for the reader to interpret the empirical findings presented. We have added a definition to the Introduction (lines 108-111), specifying that we use "report" to refer to any explicit behavioral response (whether verbal, manual, or otherwise) that communicates a subject’s subjective state.

      Regarding the conceptual distinction between Phenomenal and Access consciousness, we refer to recent work from some of the co-authors (Mudrik et al., 2025), which suggests that P and A should not be seen as two types of consciousness, but rather as two necessary conditions for conscious experience. While a full discussion of this distinction is beyond the scope of this review, we now clearly state that our focus is on identifying neural activity that reflects the subjective experience itself, regardless of the downstream requirements of report.

      Reference: Mudrik, L., Faivre, N., Pitts, M., & Schurger, A. (2025). On a confusion about there being two types of consciousness. Trends in Cognitive Sciences. https://doi.org/10.1016/j.tics.2025.11.012

      In addition, the review would benefit from an earlier introduction of the distinction between states and contents of consciousness. This distinction becomes important in the later section on anaesthesia, sleep, and epileptic seizures, where the focus shifts from content-specific NCCs to alterations in global states. Presenting these definitions upfront and briefly explaining how states and contents interact would strengthen the coherence of the manuscript.

      We agree that clarifying the distinction between contents and levels of consciousness early on provides a stronger framework for the paper.

      We have added a brief clarification in the Introduction (lines 63-76): "It is also helpful to distinguish between levels of consciousness, defined as a global level of arousal or wakefulness (e.g., being awake vs. under anesthesia), and the contents of consciousness, defined as the specific subjective experiences one has while conscious (e.g., perceiving a visual stimulus; Bayne et al., 2016; Laureys, 2005). While the majority of this review focuses on 'content-specific' NCCs, the two dimensions are intrinsically linked, as global states typically set the conditions for the occurrence of specific conscious contents."

      Overall, this is an excellent and timely review. With clearer initial theoretical definitions of consciousness, the manuscript will offer an even stronger conceptual framework for interpreting intracerebral studies of consciousness.

      We thank the reviewer again for this highly positive assessment of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I would like to reiterate that I believe this is a very scholarly piece of writing, and I congratulate the authors on producing such a useful and timely manuscript. Below, I suggest just a few ways the authors may resolve some of the issues I raised in the public review. However, I would like to emphasise that these are merely suggestions - the authors may think of different and better ways to address these comments that are more in line with either their thinking or writing style, and I would certainly encourage the authors to follow their own preferences if they feel they are at odds with my suggestions.

      For the longer comment questioning whether intracranial recordings are really a way to isolate NCCs from their pre- and post-processing, there are two ways the authors could resolve this. One is to collapse the section distinguishing NCCs from their prerequisites and consequences into the previous section regarding limits of noninvasive measures. For instance, they could make the point in this previous section that null results are easier to interpret with intracranial recordings. Then they could discuss how specific intracranial studies have been able to resolve questions of pre-/post-processing confounds when they introduce studies later in the manuscript. At the moment, the "Distinguishing NCCs from their prerequisites and consequences" section, at least to me, undermines the argument of why intracranial recordings are important because it spends too much time describing how tasks are the core component of isolating pure NCCs, and not the recording method.

      Alternatively, the authors could keep the structure as it is. In this case, I would urge the authors to emphasise the role of intracortical recordings here and to make the argument that this is a problem that intracortical recordings (rather than novel tasks) can solve more convincingly. Citing specific studies that combined intracortical recordings with no-report paradigms and emphasising how the invasive recording allowed the researchers to reach a conclusion that would not have been possible with noninvasive measures would also be helpful.

      We thank the reviewer for these useful suggestions and agree that we would not want readers to take from this paper that design issues can be fixed by using invasive recordings. Because confounding issues are crucial in research on the NCC, we believe it is important to include a section on this topic in the Introduction. However, as we explained in our response to the public review, we revised the section introducing Human intracranial electrophysiology to reflect that intracranial recordings are a complementary tool that improves the interpretability of no-report paradigms, rather than a “silver bullet” solution for confound issues. We also explicitly say now that this problem is relevant to all techniques in the study of consciousness, including intracranial recordings (line 101). Additionally, based on the reviewer’s suggestion, we have added a more detailed explanation of how studies that pair intracranial recordings with no-report paradigms provide a unique insight in the Temporal Insights section (lines 822-823).

      For my comment: Drawing misleading conclusions from certain studies, I think the public review speaks for itself. I would recommend that the authors make sure they are drawing correct conclusions from the studies they cite, and make clear from the outset where there is ambiguity in interpretation.

      We thank the reviewer for bringing these ambiguities to our attention. As explained in the response to the public review, we have modified the text accordingly.

      Finally, with regard to the single-cell analyses, I would imagine that most readers will share at least some scepticism around single neurons being the appropriate level of analysis for revealing the basis of perceptual experience. As such, I think it would strengthen the manuscript greatly if the authors could provide a brief argument as to how such work can either inform theories of consciousness or contribute more generally to the study of NCCs, given that the field and its theories are mostly biased towards studying system-level neural processes. I think single-cell analyses are extremely valuable to NCC research, and the authors have a good opportunity to frame these studies accordingly.

      We agree. As detailed in the response to the public review, we now specify (1) how a higher level of granularity in electrophysiological measurements can distinguish between awareness-related signals and confounds, (2) that these measurements provide an opportunity to study neuronal population dynamics where various cognitive processes have been shown to emerge in animals, and (3) that single-neuron measurements are necessary to test predictions of theories that are defined at the cellular level.

      Reviewer #2 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      My compliments for having written an impressive review. Overall, I think that this is a beautiful piece of work that will be of great use to the community. My only concern is that the advantages of intracranial recordings over non-invasive methods in solving the difficulties faced in the study of NCCs are overstated.

      Here I provide more precise comments for your consideration.

      (1) On page 5, lines 100 to 102, you argue that "Scalp EEG and MEG have limited anatomical resolution due to the overlap of deep and superficial brain signals at the scalp level and, in the case of EEG, the scattering of the adjacent electrical signals through the scalp". It would be good to provide precise estimates of the spatial resolutions of EEG, MEG and intracranial recordings, with accompanying references. Consider also that MEG is relatively insensitive to deep sources. I recommend this paper: Piastra et al. 2020 https://onlinelibrary.wiley.com/doi/10.1002/hbm.25272

      We thank the reviewer once again for their positive evaluation of our work. As detailed in the response to the public reviews, we now clarify that intracranial recordings provide high local certainty within the sampled regions (lines 224-227), whereas non-invasive methods provide broader coverage (lines 247-249). We thank the reviewer for their additional suggestions and have clarified our concern about the anatomical conclusions that can be drawn from scalp EEG and MEG data (lines 158-164).

      (2) On page 11, you describe work showing that activity in the occipitotemporal cortex might reflect a precursor to consciousness, but not an NCC proper, except for the case of faces, in which the fusiform seems to behave like a true NCC. Could you discuss how these seemingly contradictory results could be reconciled?

      One possibility is that activity in some parts of the occipitotemporal cortex instantiates content-specific NCCs, i.e., correlates that are only specific to certain stimulus types (in this case: faces), while activity in other parts instantiates precursors of the NCCs. Because faces have been extensively studied, we might have uncovered the content-specific NCCs for these stimuli but not for others. This is now discussed in the text on lines 342-344. Based on reviewer 1’s suggestion, we have also toned down our claim about occipitotemporal activity being a precursor to the NCC.

      (3) From line 322, you start to discuss connectivity analyses. Adding a subheading might improve readability.

      We appreciate the suggestion; however, adding a subheading to a single paragraph would require restructuring the entire section, which could disrupt the flow. We believe the current format maintains clarity and cohesion.

      (4) In line 329, you write "It remains unclear to what extent these connectivity patterns reflect post-perceptual processing and how the signals associated with perceptual consciousness in the occipitotemporal cortex interact with frontoparietal regions." But it's not clear why this is the case.

      We meant to make two separate points: (1) these studies did not control for report-related activity using no-report paradigms and (2) there has been no investigation so far of the interaction between occipitotemporal and frontoparietal signals associated with perceptual consciousness. These two points have been clarified in the text (lines 378-381).

      (5) In line 692, it would be good to clarify that Pereira 2021 is a single-neuron study.

      This has been clarified in the text.

      (6) The phrase "more research/work is needed" is repeated several times.

      Thank you for pointing this out. To avoid redundancy, we have deleted the second mention of this phrase.

    1. going full ai engineer, not touching code anymore
      • Shift in Role and Passion: The author has stopped writing manual code entirely after nearly two decades as a developer. They realized the actual enjoyment came from software design, architecture, and problem-solving, rather than the mechanical overhead of typing out code.
      • The "Toll" of Typing: Writing boilerplate code, null checks, imports, and repetitive logic is characterized as a "toll" paid to bring systemic ideas into reality. AI agents now handle this translation layer entirely.
      • New Core Responsibilities: The job has evolved into writing clear specifications, designing robust architectures, orchestrating multiple AI agents, and aggressively reviewing diffs to reject bad implementations.
      • The Importance of "Taste": Utilizing AI agents successfully requires profound technical taste. An engineer must understand what to insist on, detect fake test coverage, and identify load-bearing assumptions that are likely to fail.
      • Vibe-Coding Warning: Blindly relying on AI to write unread code into unverified systems results in fragile production software. Evaluating code is harder than producing it, meaning AI tools will make bad engineers worse and efficient engineers better.
      • Identity and Future Uncertainty: The author admits they would likely quit engineering altogether if forced to return to manual coding. However, they acknowledge unresolved questions regarding how this shift affects the training and hiring of junior engineers who won't build foundational muscle memory.

      Hacker News Discussion

      • The Skill Disconnect for Juniors: A dominant theme is how junior developers will gain the necessary "taste" and evaluation skills if they completely skip the grueling phase of writing and debugging code manually.
      • The Cognitive Load of Code Review: Many commenters argue that reading, auditing, and maintaining AI-generated code is mentally exhausting. They note that debugging subtle, hallucinated logic errors written by an agent is often more difficult than writing the logic from scratch.
      • Loss of Mastery and Dependency: Users express concern over the degradation of raw coding skills. Becoming entirely reliant on a fluctuating AI tool stack risks leaving engineers stranded if the quality of the models regresses or changes.
      • Analogy to Higher-Level Languages: Several participants view this evolution as a natural continuation of computer science history, comparing the shift to moving from Assembly to C, or from C to Python, where engineers routinely surrendered low-level control for higher abstraction.
    1. Sharing Skills and Agents between Codex and Claude Code
      • The Problem: Developers using multiple local AI terminal agents (such as Codex, Claude Code, or OpenCode) quickly face fragmentation when trying to manage custom skills, agent roles, and project-specific instructions. Files end up being scattered across varying default directories or duplicated manually across the user's home folders.
      • The Solution: A centralized directory architecture within the project repository that acts as a single source of truth (ai/), sharing identical configurations across different AI tools through local symbolic links (symlinks).
      • Directory Layout & "Source of Truth":
        • All active configuration files reside inside a single ai/ folder, split into ai/agents (who the model should be, e.g., Architect, Reviewer, Incident Commander) and ai/skills (how the model performs tasks, e.g., API Review, Security Check, Frontend QA).
      • The Symlink Mechanism:
        • Instead of configuring generic home directories (~/.claude or ~/.codex), local tool-specific directories are generated inside the project (.agents/ for Codex and .claude/ for Claude Code).
        • Using terminal commands (like ln -sfn on macOS/Linux or New-Item -ItemType SymbolicLink on Windows PowerShell), symlinks are established to point both the .agents/ and .claude/ folders to the exact same ai/ subdirectories.
      • Key Advantages:
        • Centralization: Establishes a single, distinct source of truth for all AI interactions within the workspace.
        • Tool Compatibility: Seamlessly supplies the exact same data to different AI agents without manual file copying.
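        • Team Portability & Version Control: Because Git natively tracks symbolic links, the entire team receives the exact same AI tooling, workflows, and prompts directly upon cloning the repository (on Windows, this requires Git's core.symlinks setting to be enabled; otherwise the links are checked out as plain text files).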
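The symlink layout described above can be sketched in Python, so the same script runs on macOS, Linux and Windows. The ai/agents and ai/skills folders and the .agents/ and .claude/ tool directories follow the summary. The assumption that each tool reads `<tool dir>/agents` and `<tool dir>/skills` is only illustrative, so check each tool's documentation for the paths it actually scans. The function name `link_shared_config` is hypothetical. On Windows, creating symlinks may also require Developer Mode or administrator rights.

```python
from pathlib import Path

# Shared source of truth inside the repository (layout taken from the summary above).
SHARED_SUBDIRS = ["agents", "skills"]
# Tool-specific directories; which tool reads which subfolder is an assumption.
TOOL_DIRS = [".agents", ".claude"]


def link_shared_config(repo: Path) -> None:
    """Point <repo>/<tool dir>/<subdir> at <repo>/ai/<subdir> via relative symlinks."""
    for sub in SHARED_SUBDIRS:
        (repo / "ai" / sub).mkdir(parents=True, exist_ok=True)
    for tool in TOOL_DIRS:
        tool_dir = repo / tool
        tool_dir.mkdir(exist_ok=True)
        for sub in SHARED_SUBDIRS:
            link = tool_dir / sub
            if link.is_symlink():
                link.unlink()  # replace an existing link, similar to `ln -sfn`
            elif link.exists():
                raise FileExistsError(f"{link} exists and is not a symlink")
            # Relative target (../ai/<sub>) so the link still resolves in any clone.
            link.symlink_to(Path("..") / "ai" / sub, target_is_directory=True)
```

Run `link_shared_config(Path("."))` from the repository root. Relative targets are used because an absolute path would break as soon as a teammate clones the repository into a different location.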
    1. eLife Assessment

      This study offers an important advance by extending an intuitive visualization tool that enables assessment of how dendritic and synaptic currents potentially shape neuronal output. The evidence supporting the tool's capabilities is convincing, with well-documented code, algorithmic innovation, and application to hippocampal pyramidal neurons. The work will be of interest to computational and systems neuroscientists seeking accessible methods to examine dendritic computations.

    2. Reviewer #1 (Public review):

      Summary

      Fogel & Ujfalussy report an extension of a visualization tool that was originally designed to enable an understanding of detailed biophysical neuron models. Named "extended currentscape", this new iteration enables visual assessment of individual currents across a neuron's spatially extended dendritic arbor with simultaneous readout of somatic currents and voltage. The overall aim was to permit a visually intuitive understanding of how a model neuron's inputs determine its output. This goal was worthwhile and the authors achieved it. Demonstrating the utility of extended currentscape, the authors leverage their models to generate interesting and detailed biophysical insights into widely studied neurophysiological phenomena with clear behavioral relevance. Overall, this study provides a valuable and well-characterized biophysical modeling resource to the neuroscience community.

      Strengths

      The authors significantly extended a previously published open-source biophysical modeling tool. Beyond providing important new capabilities, the potential impact of extended currentscape is boosted by its integration with preexisting resources in the field.

      In keeping with the authors' goal to provide an approachable platform with intuitive visualizations of how current flows through neurons, the manuscript is approachable to non-computationalists. In particular, a dedicated glossary and elegant illustrations in Figure 2 boost accessibility for biologists.

      Extended currentscape produces intriguing and detailed predictions spanning neurophysiological phenomena such as local dendritic spikes, complex spike generation, and feature selectivity (hippocampal place fields). By triggering analysis of modeled synaptic inputs on these events, the authors trace their origins from dendritic integration to synaptic input patterns.

      The authors cleverly apply a graph theoretical approach to efficiently model bidirectional current flow throughout a neuron's dendritic arbor. As a result, extended currentscape can run on a standard personal computer.

      The code is well-documented and freely available via GitHub.

      Weaknesses

      While extended currentscape meets its objective of modeling and illustrating the propagation of axial currents throughout a model neuron in great detail, it requires simulation and measurement of synaptic input currents. For this reason, there is currently a very high technical barrier to conclusively testing its intriguing predictions, because doing so would require simultaneous readout of synaptic inputs throughout a neuron's dendritic arbor. Mitigating this weakness, the authors propose a relatively more feasible alternative approach in the Discussion: simultaneous voltage imaging of dendrites and their soma while estimating synaptic inputs from the distributions of voltage dynamics along individual dendritic branches.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fogel & Ujfalussy report an extension of a visualization tool that was originally designed to enable an understanding of detailed biophysical neuron models. Named "extended currentscape", this new iteration enables visual assessment of individual currents across a neuron's spatially extended dendritic arbor with simultaneous readout of somatic currents and voltage. The overall aim was to permit a visually intuitive understanding of how a model neuron's inputs determine its output. This goal was worthwhile and the authors achieved it. Their manuscript makes two additional contributions of note: (1) a clever algorithmic approach to model the axial propagation of ionic currents (recursively traversing acyclic graph subsections) and (2) interesting, albeit not easily testable, insights into important neurophysiological phenomena such as complex spike generation and place field dynamics. Overall, this study provides a valuable and well-characterized biophysical modeling resource to the neuroscience community.

      Strengths:

      The authors significantly extended a previously published open-source biophysical modeling tool. Beyond providing important new capabilities, the potential impact of "extended currentscape" is boosted by its integration with preexisting resources in the field.

      The code is well-documented and freely available via GitHub.

      The authors' clever partitioning algorithm, which relates dendritic/synaptic currents to somatic currents, yielded multiple intriguing observations regarding when and why CA1 pyramidal neurons fire complex spikes versus single action potentials. This topic carries major implications for how the hippocampus represents and stores information about an animal's environment.

      Weaknesses:

      While extended currentscape is clearly a valuable contribution to the neuroscience community, this reviewer would argue that it is framed in a way that oversells its capabilities. The Abstract, Introduction, Results, and Methods all contain phrases implying that extended currentscape infers dendritic/synaptic currents contributing to somatic output, i.e. backwards inference of unknown inputs from a known output. This is not the case; inputs are simulated and then propagated through the model neuron using a clever partitioning algorithm that essentially traverses a biologically undirected graph structure by treating it like a time series of tiny directed graphs. This is an impressive solution, but it does not infer a neuron's input structure.

      We are sorry if our text could be read as implying that we infer unobserved inputs from the known outputs. This was not intentional, and we were unaware that such an interpretation was possible.

      In fact, at the beginning of the Results, we started the description of the extended currentscape method by explicitly stating that we need to measure the input currents: “Our method … requires measuring the membrane and axial currents throughout the dendritic tree of a neuron (in every node of the circuit)”.

      To further clarify that our method starts with measuring the input currents, we now state this explicitly in the abstract (“Our approach relies on the iterative decomposition of the axial current flowing between neighbouring compartments in proportion to the underlying membrane currents measured in the model.”) and in the Introduction (“Even if the membrane currents are known, studying the impact of particular ion channels on the neuronal response in such a dynamical system under in vivo conditions is hindered by two major obstacles”). We also rewrote several parts of the text to remove any phrases that could imply the inference of the inputs (line 568). We believe that, with this clarified at the beginning of the paper, readers will not misinterpret our descriptions later in the text.

      Because a directed acyclic graph architecture is shown in Figure 2, it is unintuitive that the authors can infer bidirectional current flow, e.g. Figure 3 showing current flowing from basal dendrites and axon to soma, and further towards the apical dendrites. This is explained in Methods, but difficult to parse from Results amidst lots of rather abstract jargon (target, reference, collision, compartment). Figure 2 would have presented an opportunity to clearly illustrate the authors' partitioning algorithm by (1) rooting it in the exact morphology of one of their multicompartmental model neurons and (2) illustrating that "target" and "reference" have arbitrary morphological meanings; they describe the direction of current flow, which is reevaluated at each time step.

      We thank the Reviewer for this comment. We agree that the concepts introduced here to explain our method are rather abstract and could be difficult to understand. To help the reader, we followed the Reviewer's suggestions and redesigned Fig. 2 to provide a step-by-step explanation of the extended currentscape method. In particular:

      We used a simpler model where the structure of the graph can be directly related to the morphology of the model.

      We show that the target node can connect multiple subtrees with axial currents flowing in different directions. We explain that in this case the inward and the outward subtrees are pruned and partitioned separately.

      We provide a glossary in Table 1 to ensure that the readers can follow our description and do not get lost amidst lots of rather abstract jargon.

      We also clarified that although the target compartment is chosen arbitrarily by the user, it remains the same for all time points throughout the analysis.
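The proportional partitioning idea can be made concrete with a deliberately simplified sketch for a single time step. This is not the authors' implementation. It assumes a hypothetical data layout: an undirected adjacency dict of compartments, plus per-compartment membrane currents keyed by current type, with positive meaning inward. It treats one subtree as "inward", meaning all of its axial current flows toward the fixed target. Each compartment's axial outflow is split across current types in proportion to the inward currents feeding it. Pruning and the separate handling of outward subtrees are omitted. The function names are illustrative.

```python
from collections import defaultdict


def orient_from_target(adjacency, target):
    """Orient an undirected compartment tree away from the user-chosen target."""
    children = defaultdict(list)
    stack, seen = [target], {target}
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                children[node].append(neighbor)
                stack.append(neighbor)
    return children


def axial_outflow(children, membrane, node):
    """Decompose the axial current leaving `node` toward the target by current type.

    Toy assumptions: every axial current in this subtree flows toward the target,
    inward (positive) membrane currents and decomposed inflows from child
    compartments are the sources, and outward (negative) membrane currents only
    shrink the outflow without changing its composition.
    """
    sources = defaultdict(float)
    sink = 0.0
    for kind, value in membrane.get(node, {}).items():
        if value > 0:
            sources[kind] += value
        else:
            sink -= value
    for child in children.get(node, []):
        for kind, value in axial_outflow(children, membrane, child).items():
            sources[kind] += value
    total = sum(sources.values())
    if total == 0.0:
        return {}
    scale = max(total - sink, 0.0) / total  # proportional split of the net outflow
    return {kind: value * scale for kind, value in sources.items()}


# Hypothetical soma plus a dendrite with two branches (arbitrary current units).
adjacency = {"soma": ["d1"], "d1": ["soma", "b1", "b2"], "b1": ["d1"], "b2": ["d1"]}
membrane = {
    "b1": {"AMPA": 2.0},
    "b2": {"NMDA": 1.0, "K": -0.5},
    "d1": {"Na": 1.0, "leak": -1.0},
}
children = orient_from_target(adjacency, "soma")
into_soma = axial_outflow(children, membrane, "d1")
print(into_soma)
```

In this toy example, 2.5 units reach the soma from d1. AMPA contributes four times as much as NMDA, which mirrors the 2.0 and 0.5 units delivered by the two branches.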

      Analyses in Figure 7, C and D, are insightfully devised and illuminating. However, they could use some reconciliation with Figure 5 regarding initiation of individual APs versus CSBs within place fields.

      We thank the reviewer for the positive comments and also for pointing out the potential source of misunderstanding. We slightly changed the text describing Fig. 5 to emphasize that this is a single example trial, and we added the following sentence to the paragraph describing Fig. 7C-D: “Consequently, the somatic current dynamics before the iAP and the CSB presented in Fig 5Cc-Dd can be regarded as illustrative samples from a broad distribution, but the differences observed between them are not representative.”

      The intriguing observations generated by extended currentscape also point to its main weakness, which the authors openly acknowledge: as of now, no experimental methods exist to conclusively test its predictions.

      We agree with the Reviewer that not being able to apply our extended currentscape method to reveal the current types driving real neurons recorded in vivo is currently a weakness of our approach. However, we would like to emphasize that it may be feasible to use it to estimate the spatial distribution of the membrane currents driving the cell based on in vivo voltage imaging data, as we briefly outline in the discussion.

      Reviewer #2 (Public review):

      Summary

      The electrical activity of neurons and neuronal circuits is dictated by the concerted activity of multiple ionic currents. Because directly investigating these currents experimentally isn't possible with current methods, researchers rely on biophysical models to develop hypotheses and intuitions about their dynamics. Models of neural activity produce large amounts of data that is hard to visualize and interpret. The currentscape technique helps visualize the contributions of currents to membrane potential activity, but it's limited to model neurons without spatial properties. The extended currentscape technique overcomes this limitation by tracking the contributions of the different currents from distant locations. This extension allows tracking not only the types of currents that contribute to the activity in a given location, but also visualizing the spatial region where the currents originate. The method is applied to study the initiation of complex spike bursts in a model hippocampal place cell.

      Strengths.

      The visualization method introduced in this work represents a significant improvement over the original currentscape technique. The extended currentscape method enables investigation of the contributions of currents in spatially extended models of neurons and circuits. 

      Weaknesses.

      The case study is interesting and highlights the usefulness of the visualization method. A simpler case study may have been sufficient to exemplify the method, while also allowing readers to compare the visualizations against their own intuitions of how currents should flow in a simpler setting. 

      We thank the reviewer for this comment. In fact, we had also considered including a simpler case study to illustrate the extended currentscape method in the original submission. In accordance with the comments from Reviewer 1, we now use a simple model to introduce the concepts in Figure 2 and provide a few examples where readers can compare the results with their own intuition in simpler cases.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Model complexity vs. intuition/validation. The case study relies on a very complex CA1 model, making it difficult to build intuition about current flow and to validate the visualization. Inclusion of a simpler benchmark (e.g., soma plus a dendrite with two branches, fewer compartments) is recommended to demonstrate how the extended currentscape behaves in a more tractable setting.

      Inspired by the suggestions of the Reviewers, we modified Figure 2 and now first use a simple model with a soma and a dendrite with two branches to introduce the concepts of our analysis. We start with a few examples where the reader can compare the results with their own intuition in simpler cases.

      (2) Rationale and citations for input structure. The in vivo-like input design (untuned inhibition; 12 co-tuned excitatory clusters with large conductances; the goal of generating place fields) would benefit from a more explicit rationale and substantially more literature support. Alternative plausible scenarios (e.g., distributed co-tuned inputs and homosynaptic plasticity) should be articulated, and choices situated within the experimental literature on CA1 excitation/inhibition, including tuning and anti-tuning results.

      We extended the paragraph in the Results describing the input structure and added the most important references there. We added further references to the Methods section where we argue that “Reliable place cell tuning can be achieved by functional synaptic clustering without increased excitatory drive in the place field (Ujfalussy and Makara, 2020) or via strong excitatory drive without input clustering (Grienberger et al., 2017, Ujfalussy and Makara, 2020). However, experimental data indicate that both of these mechanisms are present and contribute to the activity of place cells (Adoff et al., 2021, Tasciotti et al., 2025)” and “although interneurons can display spatial tuning, they typically have broad tuning with low selectivity (Ego-Stengel et al., 2007, Dupret et al., 2013, Geiller et al., 2020). A weak disinhibition within the place field can also contribute to the selective firing of place cells (Geiller et al., 2022, Valero et al., 2022), but this was not necessary for place cell activity in novel environments (Geiller et al., 2022), and the overall inhibitory input to place cells is largely untuned (Grienberger et al., 2017).”

      (3) Scope of PCA-based claims. The interpretations derived from the PCA analysis appear broader than warranted, given subcellular heterogeneity and the dominance of somatic action potential variance. These claims should be tempered with more explicit statements about what PCA can and cannot resolve in this context.

      We thank the Reviewer for the opportunity and encouragement to clarify this part of the text. We agree with the Editor and the Reviewers that the results of the PCA analysis cannot be used to support claims regarding the presence or the absence of independent dendritic events. In fact, we aimed to use it as an illustration that global activity tends to dominate PCA analysis even when the “neuron is mainly driven by strong, functionally clustered synaptic inputs to a few dendritic branches”. We acknowledge that we did not formulate this point clearly in the original submission. Therefore, we substantially rewrote this part of the Results and performed additional analyses to clarify that there is a substantial amount of soma-independent dendritic activity in our model that remains invisible to a PCA-based analysis.

      Reviewer #1 (Recommendations for the authors):

      Major concerns:

      (1) Depolarization-inactivated K+ may be an important consideration to model burst-firing.

      Our current model includes two kinds of transient K+ channels that inactivate after depolarization, a proximal and a distal type, as in the original model of Jarsky et al. (2005). We now make this explicit in the main text (line 178).

      (2) Description of the in vivo-like model's excitatory and inhibitory input structure needs many more citations of biological studies to communicate the rationale for the authors' decisions, e.g. untuned inhibitory neurons, organization of a subset of excitatory inputs into 12 functional synaptic clusters with co-tuned presynaptic neurons and outsized synaptic conductances. The goal is clearly to create CA1 pyramidal neurons with place fields, which would be helpful to state upfront. But additionally, (a) place fields could arise from homosynaptic potentiation of distributed co-tuned excitatory inputs (e.g., Bittner, et al. 2017 study describing BTSP made no assumptions) and (b) CA1 inhibitory interneurons can be spatially tuned (Ego-Stengel & Wilson, 2006; Wilent & Nitz, 2007; Geiller, et al. 2020) and even anti-tuned (Geiller, et al. 2021).

      We thank the Reviewer for pointing out the lack of appropriate references in this section. We made the following changes in the manuscript:

      (1) Stated explicitly that the goal was to create place cell activity.

      (2) Added references to the main text to justify our choices of the inputs (lines 234-241).

      (3) We included a longer rationale for the choice of synaptic clusters and the lack of inhibitory (anti-)tuning in the Methods section, describing the neuron model. In brief, Adoff et al., 2021 reported more clustering of excitatory inputs within the place field. In our model, the degree of clustering is somewhat larger than the clusters reported. Although inhibitory neurons can be tuned, their tuning is much weaker than that of place cells and seems to play only a minor role in the generation of place fields (Grienberger et al., 2017). The presence of inhibitory anti-tuning is controversial: although Geiller et al., 2021 reported weak (~10%) anti-tuning, they did not find it in novel environments, indicating that it is not needed for spatially selective activity (lines 628-646).

      (3) Interpretation of principal component-based analyses shown in Figure 4 could be toned down. As written in section "CSBs in the CA1 pyramidal neuron", it sounds like CA1 pyramidal neuron dendrites display minimal autonomous activity. However, PCA does not seem well-suited to address the heterogeneity of subcellular voltage dynamics over physiologically relevant timescales. Somatic action potentials, and their backpropagation/modulation of dendritic voltage, would of course explain a very large fraction of variance. However, if local dendritic events summate over fine timescales to initiate somatic firing, it is hard to imagine this important nuance being detected. On the other hand, it is hard to imagine single dendritic branches driving robust somatic firing except in the relatively extreme situation in which large numbers of synapses synchronously drive the same branch to initiate a local Ca2+ spike (Figure 3, A-C).

      We agree with the reviewer that PCA cannot reveal the potential dendritic origin of somatic APs, and thus is not suitable for assessing the role of local dendritic spikes in shaping the output of the cell. We wanted to highlight here that even in cells with excitable dendrites driven by strong, local input clusters, exhibiting frequent local dendritic spikes, the dendritic membrane potential dynamics will be dominated by global fluctuations with surprisingly little sign of local dynamics in the PCA components. As the reviewer also pointed out, this may not be surprising, as local events either remain spatially restricted and thus contribute little to the overall variability of the dendritic Vm, or they initiate somatic APs and will thus be counted as global events.

      To demonstrate the high propensity of local dendritic events, we analysed local Vm peaks in dendritic branches and found that ~7.6% of the peaks were not coupled to somatic APs.

      Although this number could seem low, we emphasize that most of the 92.4% of the dendritic peaks coupled to APs potentially reflect the backpropagation of the same somatic events to multiple dendritic sites. To confirm this, we performed an additional analysis measuring the spatial extent (number of branches involved) of the individual dendritic events. We found that 90% of the events remained local, restricted to a few dendritic branches, while 10% of the events were global, associated with BAPs and involving the majority of the dendritic tree. Interestingly, these global events dominate the PCA analysis and are responsible for >90% of the dendritic Vm peaks. These results are included in a new panel in Figure 4H.

      We conclude that, “this way, although only 10% of the dendritic Vm events were associated with bAPs, they were ~60-times larger than local events and they dominated the PCA analysis even in the presence of local regenerative dendritic events driven by strong, functionally clustered synaptic inputs.” We believe that this model and analysis could serve as an important benchmark for future experimental studies investigating the structure of membrane potential correlations in in vivo voltage imaging data (Lee et al., 2026).
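The argument that a few large, widespread events can dominate PCA even when most events are local can be checked on synthetic data. The sketch below is a toy and not the authors' CA1 model. It assumes 100 branches and 100 global (bAP-like) events that depolarize every branch. It adds 900 local events that each depolarize two random branches, gives all events the same amplitude, and adds Gaussian noise, all in arbitrary units. The parameter values and function names are illustrative.

```python
import numpy as np


def simulate_dendritic_vm(n_time=10_000, n_branches=100, n_global=100,
                          n_local=900, amplitude=10.0, noise_sd=0.5, seed=0):
    """Toy (time x branch) voltage matrix with global and local events."""
    rng = np.random.default_rng(seed)
    vm = rng.normal(0.0, noise_sd, size=(n_time, n_branches))
    # Draw distinct time bins so global and local events never overlap.
    event_times = rng.choice(n_time, size=n_global + n_local, replace=False)
    global_times, local_times = event_times[:n_global], event_times[n_global:]
    # Global (bAP-like) events depolarize every branch at once.
    vm[global_times, :] += amplitude
    # Local events depolarize two randomly chosen branches.
    for t in local_times:
        vm[t, rng.choice(n_branches, size=2, replace=False)] += amplitude
    return vm, n_local / (n_global + n_local)


def pc1_variance_fraction(vm):
    """Fraction of total variance captured by the first principal component."""
    eigvals = np.linalg.eigvalsh(np.cov(vm, rowvar=False))  # ascending order
    return eigvals[-1] / eigvals.sum()


vm, local_share = simulate_dendritic_vm()
print(f"local events: {local_share:.0%}, PC1 variance: {pc1_variance_fraction(vm):.0%}")
```

Under these assumptions, the first component loads roughly equally on all branches. A back-of-the-envelope estimate puts its share of the variance at roughly 70%, even though 90% of the events are local. The reason is that each global event adds variance to all 100 branches, while each local event touches only two.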

      (4) One suggestion would be to display more data as shown in Figure 4F, with a longer X axis to clarify the temporal relationship between local dendritic spikes and the first somatic action potential.

      We added a few more examples including the CSBs presented in Fig8G-I as a new supplementary Figure S4. We also slightly extended the x-axis on this supplementary figure as the reviewer requested.

      If the models indicate that passively filtered EPSPs drive most somatic action potentials, as seems to be the case in Figure 5, then this would also be helpful to show as in Figure 4F.

      In Fig 5 we showed two examples of isolated APs. The first AP was indeed driven by passively filtered EPSPs. The second one was preceded and possibly caused by a dendritic spike, as highlighted by the black arrowhead labelled c in Fig. 5Cc. We further analysed the currents driving iAPs in Fig 7B and C, and found that there is considerable heterogeneity in the magnitude of the dendritic Na currents driving the soma before action potentials. Figure 8 and Figure S3 (now Fig. S5) show further examples for iAPs driven either by passively filtered EPSPs or dendritic spikes. We also included these examples in the new supplementary Figure S4.

      (5) Another suggestion would be to use one-hot vectors containing onset times of different event types, since this would divorce the amplitude/duration of events from their influence over total variance.

      In this paper our goal was to illustrate the ability of the extended currentscape method to reveal the origin of the axial currents driving neuronal activity. In Fig. 4, our primary intention was to characterize the membrane potential response of the model in a way that is easily comparable with experimental data. To further quantify the frequency of local events, we added a new panel showing the spatial extent of dendritic events (Fig. 4H). To make our model more comparable with recent publications, we also calculated two additional metrics used to evaluate the relationship between somatic and dendritic activity (Fig 4I-J). We hope that these additional analyses help the reader to characterize the prevalence and impact of local dendritic events on somatic activity.

      (6) From section "Input conditions for complex spike burst generation", paragraph 2: "Note that synapse density, the ion channel mechanisms and the input statistics are identical for tuft and oblique branches,...". The authors should justify this parameterization given the numerous known differences between tuft and oblique branches in both of these regards and acknowledge accompanying interpretational caveats.

      We agree with the reviewer that experimental data demonstrated several significant differences between the tuft and oblique branches regarding both the inputs they receive and the way they process them. However, in the present paper we chose not to include these differences for several reasons:

      Here we aimed to focus on the abilities of the dendritic currentscape methods and use CSBs as a case study to illustrate how dendritic currentscape can reveal the membrane currents underlying complex neuronal responses.

      Currently there is no CA1PN model that would be able to reproduce all data regarding tuft and oblique integration and would be able to fire calcium spikes. We only wanted to make minimal modifications to the existing CA1PN model to make it capable of generating Ca-spikes and CSBs. We are currently working towards developing and extensively testing a new model, examining the role of these regional differences in CSB generation.

      Although there is information regarding input statistics and dendritic physiology in the literature, many of the relevant parameters are underconstrained. We wanted to avoid overfitting by keeping the model simple.

      By maintaining identical inputs and ion channel distribution we can distinctly highlight the special role of tuft morphology in CSB generation. Altering the inputs or the ion channel density for the tuft would make the interpretation more ambiguous, and elucidating the specific role of the different factors in CSB generation is the subject of future investigations.

      In sum, although we acknowledge that our model does not reflect the full complexity of CA1 PNs and their inputs, we regard this simplicity as a useful feature of the model. We added a section discussing potential future extensions of the model and highlighting interpretational caveats in the discussion (lines 482-490).

      (7) Given the debate in the field regarding the level of functional autonomy present in dendrites, the authors' finding that dendritic voltage largely tracks that of the soma (though see concern above re: PCA), and their access to specific currents, the authors have an important opportunity to investigate the divergence between Ca2+ and voltage sensors as reporters of dendritic activity.

      For instance, why have some studies reported relatively common isolated dendritic Ca2+ transients in CA1 pyramidal neurons while other studies, including voltage imaging studies, have reported the opposite?

      We thank the Reviewer for the opportunity to highlight a few important points regarding functional autonomy of dendrites based on the analysis of our model. We would like to first note that only parallel calcium and voltage imaging studies will be able to ultimately resolve this debate. Nevertheless, below we briefly summarize our take on this issue.

      (1) In general, most Ca2+ imaging studies found that soma-independent dendritic events are rare. "Isolated dendritic transients (no coincident somatic event; see fig. S6, C and D, for example) were overall rare. Isolated apical dendritic Ca2+ transients, which have not previously been reported in CA1PNs, were larger and more frequent than those observed in basal dendrites." (O’Hare et al., 2022). "Activity in the ... basal dendrites ... along the track but outside of the place field was rarely observed” (Sheffield and Dombeck, 2014) and “overall, isolated dendritic transients were similar in size but occurred far less frequently than coincident dendrite-soma transients”, or “data indicate that spatially reliable dendritic firing was almost exclusively yoked to somatic tuning, likely reflecting strong backpropagation of burst firing during traversals of the somatic PF” (Rolotti et al., 2022). Consistent with this observation, a dendritic Vm peak chosen randomly from any branch has a ~93% probability of being related to a bAP in our model. However, it is also true that ~90% of events in the model are local events, simply because isolated events involve ~60-times fewer branches (1.8 on average) than events associated with bAPs (114 branches) in the model. If typical local events in real neurons are as spatially restricted as in the model, then even rare occurrences of dendritic events may reveal substantial dendritic independence. We added a section quantifying the functional autonomy of dendrites in the model in the main text, around Fig 4H.

      (2) Ca2+ indicators are slower and nonlinear, and are thus somewhat unreliable reporters of dendritic voltage events, especially in distal dendrites (Wu et al., 2026; Gonzalez et al., 2026). To illustrate this, we calculated three metrics in our model that were also reported in recent dendritic Ca2+ imaging studies (Rolotti et al., 2022, Sheffield et al., 2014, 2017). First, we calculated the fraction of bAPs detected in a branch (called dendrite-soma coupling in Rolotti et al., 2022, see their Fig. 2C) as a function of the distance of the branch from the soma (our new Fig. 4I). In the Ca2+ imaging data, this was essentially constant at ~30% at distances of 5-100 µm from the soma. In contrast, the fraction of bAPs detected in the model was 100% in this range, because bAP propagation failures did not occur within 100 µm of the soma. This is also consistent with a recent voltage imaging study showing that even low-transmission bAPs reliably propagate to the proximal dendrites (Lee et al., 2026, Fig 3G). The low and distance-independent dendrite-soma coupling reported by Rolotti et al. can only be reconciled with the known biophysics of neurons if the recorded calcium signal is an unreliable reporter of the underlying voltage. Indeed, it has been reported that Ca signals associated with bAPs can be absent in some dendritic branches (Landau et al., 2022), that local, nonlinear Ca signals can appear in the absence of a local regenerative voltage response (Weber et al., 2016, Tran-Van-Minh et al., 2016), and that the Ca signals are highly variable across cells (Eltes et al., 2019).

      Second, we calculated the fraction of local events as a function of the distance from the soma (our Fig 4J; see also Fig. 2F in Rolotti et al.). When averaged across all branches, this was somewhat lower in the model (18%) than in the data (38%) which, again, could be explained by the low reliability of detecting global voltage events in all compartments based on the calcium signal.

      Third, the range of branch-spike-prevalence (BSP) values in our model (0.5-0.9; Fig. 4H) at first seems consistent with the reported range (0.4-0.8; Fig 4C of Sheffield et al., 2014; Fig 2 of Sheffield et al., 2017). However, we note that there are several important differences: for technical reasons, Sheffield et al. reported BSP for place field traversals and not for individual events, and they measured Ca2+ dynamics in the basal dendrites. Since bAPs are almost always present in all basal dendrites in the model (basal BSP > 0.9 for all events with somatic spikes) and place field traversals were always accompanied by somatic APs, BSP for basal dendrites would be nearly 1 in the model. Thus, the lower BSP values reported by Sheffield et al. could be explained by the limited reliability of the Ca2+ indicators in reporting regenerative voltage events in neuronal processes.

      We briefly discussed these differences in the Discussion (lines 474-478).
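Once dendritic and somatic events have been detected, the three comparison metrics discussed above reduce to simple ratios. The sketch assumes a hypothetical boolean matrix in which rows are events (or place field traversals) and columns are branches, plus a per-row flag marking a coincident somatic spike. It does not reproduce event detection, distance binning, or the exact coincidence windows used in the cited studies. The function names are illustrative.

```python
import numpy as np


def dendrite_soma_coupling(detected, somatic):
    """Per branch: fraction of somatic-spike events also detected in that branch."""
    return detected[somatic].mean(axis=0)


def local_event_fraction(detected, somatic):
    """Per branch: fraction of that branch's events with no coincident somatic spike."""
    n_events = detected.sum(axis=0)
    n_local = (detected & ~somatic[:, None]).sum(axis=0)
    # Branches with no detected events get NaN instead of a division by zero.
    return np.divide(n_local, n_events, out=np.full(n_events.shape, np.nan),
                     where=n_events > 0)


def branch_spike_prevalence(detected):
    """Per event: fraction of branches in which the event was detected (BSP)."""
    return detected.mean(axis=1)


# Toy data: 4 events x 3 branches; the first two events include a somatic spike.
detected = np.array([
    [True, True, True],
    [True, False, True],
    [False, True, False],
    [True, False, False],
])
somatic = np.array([True, True, False, False])

print(dendrite_soma_coupling(detected, somatic))
print(local_event_fraction(detected, somatic))
print(branch_spike_prevalence(detected))
```

Computing BSP per row leaves the unit of analysis open. Rows can be individual events, as in the model, or place field traversals, as in Sheffield et al.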

      (3) Finally, to our knowledge, there are 3 relevant in vivo voltage imaging studies in CA1 PNs. Liao et al., 2024 found that in induced place cells the tuning of dendritic events (presumably local or back-propagating Na-spike) was similar to the somatic tuning, which is consistent with our model where dendritic activity and tuning is dominated by bAPs. However, they did not acquire simultaneous signals from the dendrites and the soma so they could not study the independence of the dendritic events. Lee et al. (2026) found that only 10% of the dendritic events are not associated with a somatic spike, which is lower than the number of independent events in the model. However, the events they found were generated in the distal apical trunk (their Fig 3D) and they could not record from the most distal branches where most of the isolated events were generated in our model. Gonzalez et al., 2026 measured voltage and calcium in selected locations within the dendritic tree, and could not reliably estimate the fraction of isolated events throughout the cell. (Gonzalez et al, 2024 measured voltage only in single spines and soma, but did not quantify independent dendritic events; Wong-Campos et al., 2023 measured dendritic integration and bAPs in L23 branches; Wu et al. 2026 recorded in CA2 neurons.)

      We added a paragraph in the discussion comparing the level of functional autonomy present in the model dendrites to recent Ca- and voltage-imaging studies (lines 467-474).

      Minor concerns:

      (1) Abstract:

      There is a need to explain what currentscape is - even at the cost of not invoking its name. To a reader not familiar with currentscape, the abstract is extremely difficult to understand.

      We reworded the title and the abstract to make them more accessible to readers not familiar with the term currentscape.

      (2) "Currentscape analysis of place field dynamics" section:

      It would be helpful to emphasize upfront that dendritic determinants of individual somatic APs versus CSBs will be discussed separately. Since somatic action potentials are discussed before CSBs, I found this section initially confusing as I attributed those findings to CSBs until reading the next paragraph.

      We added a sentence to clarify that we analysed subthreshold responses, APs and CSBs separately.

      (3) Bottom of p2 discussing mixed literature on what drives CSBs in CA1 PCs:

      Overall this is an accurate and useful point, but an important nuance is glossed over, which misportrays the state of the field. The text references ex vivo studies that fail to drive CSBs with somatic current injection and an in vivo study that succeeds in doing so. These aren't really conflicting results. In vivo current injection co-occurs with spontaneous synaptic input, which is high in CA1 and results in PCs that are significantly depolarized at rest relative to those in acute slices. The ex vivo results of Bittner 2017 are consistent with this: CSBs were driven using a Cs+-based internal solution to block K+ channels (partially, using a strategy of purposefully high series resistance). The situation in vivo is similar, given that A-type K+ channels are inactivated by depolarization. The resulting increase in input resistance lowers the input threshold for CSBs. This is clarified in Results, p.5: "Under in vivo-like synaptic input conditions (see below and Methods), dendritic Ca2+-spikes could also be evoked by somatic current injection (Fig. S1E), as in Bittner et al. (2015).", which makes p. 2 feel especially awkward.

      We agree with the Reviewer that these are not necessarily conflicting results. We rephrased this section, emphasizing that the role of the different input pathways in the initiation of CSBs is not clear.

      (4) Abbreviating "pyramidal neuron" with PC is confusing:

      PC often means place cell. The authors could change this, such that PC refers to "pyramidal cell", or else use PN as an abbreviation. It is important to avoid confusion, especially because place cell dynamics feature prominently in the manuscript.

      Thanks for the suggestion. We replaced PC with PN throughout the manuscript.

      (5) Only apical dendritic parameters are described in section 2 of Results, but the full morphology is shown in Figure 3B with basal currents shown in panels C and F. Some clarification is needed - either what currents were considered for basal dendrites and why, or else why basal dendritic current parameters were not considered for this simulation using apical dendritic current injection but nonetheless examining basal dendritic currents.

      We clarified in the text that the original model contained a standard set of Na and K channels (line 178).

      (6) Clarify "i" and "s" in the Figure 3C legend - "intrinsic" and "synaptic" white letterings are small/hard to see in the bottom subpanels.

      We now spell out intrinsic and synaptic in the Figure and increased the contrast of the letterings.

      (7) Regarding the computational benefit of recursively decomposing axial currents along an adaptively truncated acyclic graph, it would be useful to (a) include a supplemental figure benchmarking this approach to standard approaches to quantify the described gain in computational efficiency and (b) describe computing hardware in the Methods.

      We included an estimated benefit of the pruning process (line 758) as well as the utilised computing hardware and the simulation times in the Methods (line 776).

      Reviewer #2 (Recommendations for the authors):

      The manuscript is in great shape, it is well organized, and the figures are gorgeous. I believe that the extended currentscape is a great extension of the original currentscape method. In particular, the possibility of partitioning currents by the spatial location of their sources is a great addition. 

      Recommendations:

      (1) The method is applied in the context of an interesting case study that highlights its usefulness. However, the model in the study is so complex that it is difficult to develop an intuition of how currents should be flowing, and this makes it hard to intuitively validate the visualization method. I think that applying the extended currentscape in a simpler model - maybe a soma with a dendrite with two branches, fewer compartments - would be instrumental in developing this intuition. 

      We now first use a simple model with a soma and a dendrite with two branches to introduce the concepts in Figure 2 and provide a few examples where the reader can compare the results with their own intuition in simpler cases. We also added the currentscape analysis of a standard, two-compartmental model from Pinsky and Rinzel, 1994 as Supplementary Figure 1.

      (2) I found a number of typos and minor stylistic details you may want to fix in a revised version of the manuscript.

      (a) Abstract, line 12. I believe the word "recursive" is a bit technical at this point. Its meaning in this context becomes clear after one goes through the details of the algorithm (Figure 2).

      We replaced the word “recursive” with “iterative”. We hope that this will make the abstract clearer for readers. In fact, we realized that the word iterative is a better description of the algorithm, so we replaced “recursive” with “iterative” consistently throughout the manuscript.

      (b) Figure 1, caption. "Since we included the capacitive current, the magnitude of the inward and the outward currents is identical (Kirchhoff's law)." This sentence can be confusing. If the inward and outward currents are the same, the membrane potential doesn't change. I believe that you are including the capacitive current in the inward (or outward) currents.

      Indeed, we included the capacitive current in the inward or outward currents. We changed the text to clarify this.
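      To illustrate the point about Kirchhoff's law, here is a minimal Python sketch, assuming a single isopotential compartment with no axial or injected current and the outward-positive sign convention for membrane currents; the current values are hypothetical. Once the capacitive current (C*dV/dt, which here equals minus the sum of the ionic currents) is counted among the inward or outward currents, the total inward and total outward magnitudes coincide even though the membrane potential is changing.

```python
# Outward-positive convention: positive = outward, negative = inward (nA).
# Assumption: one isolated, isopotential compartment (no axial or injected current).

def capacitive_current(ionic_currents):
    """I_cap = C*dV/dt, fixed by C*dV/dt + sum(I_ion) = 0 for an isolated compartment."""
    return -sum(ionic_currents.values())

def inward_outward_totals(currents):
    """Total magnitude of inward (negative) and outward (positive) currents."""
    inward = sum(-i for i in currents.values() if i < 0)
    outward = sum(i for i in currents.values() if i > 0)
    return inward, outward

ionic = {"Na": -1.2, "K": 0.5, "leak": 0.1}          # hypothetical ionic currents
with_cap = dict(ionic, cap=capacitive_current(ionic))  # capacitive current is +0.6 here

inward, outward = inward_outward_totals(with_cap)
print(f"inward {inward:.3f} nA, outward {outward:.3f} nA")
```

      Dropping the "cap" entry leaves 1.2 nA inward against 0.6 nA outward; that imbalance is exactly what the capacitive current absorbs as a change in membrane potential.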

      (c) Lines 92-93. I do not fully understand this sentence. Are you making an assumption? What does 'continuous flow of axial current' mean?

      By ‘continuous flow of axial current’ we meant a spatially continuous stream of axial currents flowing from the reference to the target. To clarify this, we added the explanatory sentence: “i.e., if the axial current is not blocked or reversed between the reference and the target.”
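      The criterion can be read as a check along the path from reference to target. The sketch below is a hypothetical helper (not the authors' code), assuming a signed axial current is available for each pair of neighboring compartments: flow counts as continuous only if the current on every edge of the path points from the reference toward the target, so a zero (blocked) or negative (reversed) current anywhere breaks it.

```python
def flows_continuously(path, axial):
    """Return True if axial current runs from path[0] (reference) toward
    path[-1] (target) on every edge of the path, i.e. it is neither blocked
    (zero) nor reversed anywhere in between.

    axial maps an ordered pair (a, b) of neighboring compartments to the
    axial current from a to b; (b, a) is looked up with the sign flipped.
    This data layout is hypothetical.
    """
    for a, b in zip(path, path[1:]):
        if (a, b) in axial:
            current = axial[(a, b)]
        else:
            current = -axial.get((b, a), 0.0)  # missing edge counts as blocked
        if current <= 0.0:
            return False
    return True

axial = {(0, 1): 0.4, (1, 2): 0.3, (3, 2): -0.1}   # nA, hypothetical
print(flows_continuously([0, 1, 2, 3], axial))      # current 2 -> 3 is +0.1
print(flows_continuously([3, 2, 1, 0], axial))      # opposite direction
```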

      (d) Equation (1.) Why summing axial currents over j? Is this for the case of a branching point?

      A compartment can be (1) part of a continuous dendritic segment, where axial currents can flow from the distal and the proximal directions (a sum over two terms); (2) a branch point with three axial currents; or (3) a leaf compartment with only one axial current, in which case the summation is not relevant. We clarified this in the text.
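      The three cases can be made concrete with a toy tree. This Python sketch assumes ohmic coupling, so the axial current from neighbor j into compartment i is (V_j - V_i) / R_ij (mV/MOhm = nA); the compartment indices, voltages and resistances are hypothetical. The number of terms in the sum over j is simply the number of neighbors: 1 for a leaf, 2 for a compartment within a continuous segment, and 3 for a branch point.

```python
from collections import defaultdict

def build_neighbors(edges):
    """Undirected adjacency list: compartment -> [(neighbor, R_ij in MOhm)]."""
    nbrs = defaultdict(list)
    for a, b, r in edges:
        nbrs[a].append((b, r))
        nbrs[b].append((a, r))
    return nbrs

def net_axial_current(i, voltages, nbrs):
    """Sum over neighbors j of the axial current flowing into compartment i (nA)."""
    return sum((voltages[j] - voltages[i]) / r for j, r in nbrs[i])

# Hypothetical tree: 0 - 1 - 2, with 2 branching into leaves 3 and 4.
edges = [(0, 1, 10.0), (1, 2, 10.0), (2, 3, 20.0), (2, 4, 20.0)]
V = {0: -65.0, 1: -60.0, 2: -55.0, 3: -40.0, 4: -50.0}  # mV
nbrs = build_neighbors(edges)

for i in sorted(V):
    print(i, "neighbors:", len(nbrs[i]), "net axial current:", net_axial_current(i, V, nbrs))
```

      Because every axial current leaves one compartment and enters its neighbor, the net axial currents summed over the whole tree cancel, which is a quick sanity check on any implementation.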

      (e) Figure 2, caption. Typo. "When the axial currents flows…" Should it be 'current'? - Figure 3, caption. Typo in (C) "Extended currentscape" 

      Corrected.

      (f) Figure 4. I cannot see the grey lines or the dotted lines mentioned in the caption. 

      We added an arrow highlighting the gray and the dotted lines in the figure.

      (g) Figure 5, caption. "Red boxes highlight regions analyzed in panels B-D." Because this is a spatially extended model, region may be confused with spatial location, but you are highlighting a temporal interval.

      We rephrased the caption so that it now refers to temporal intervals.

      (h) Line 341. This is a numerical experiment, correct? 

      We clarified in the text that this was indeed a simulation experiment.

      (i) Line 349. Should it be 'distributions'? 
      Corrected.

      (j) Line 422. Typo. Missing space 'in vivousing'
      Corrected.

      (k) Line 537. "Preprocessing membrane…" I found this entire subsection a bit confusing and hard to read.

      We rephrased this subsection to clarify it and facilitate reading.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the manuscript "Conformational Variability of HIV-1 Env Trimer and Viral Vulnerability", the authors study the fully glycosylated HIV-1 Env protein using an all-atom forcefield. It combines long all-atom simulations of Env in a realistic asymmetric bilayer with careful data analysis. This work clarifies how the CT domain modulates the overall conformation of the Env ectodomain and characterizes different MPER-TMD conformations. The authors also carefully analyze the accessibility of different antibodies to the Env protein.

      Strengths:

      This paper is state-of-the-art, given the scale of the system and the sophistication of the methods. The biological question is important, the methodology is rigorous, and the results will interest a broad audience.

      Weaknesses:

      The manuscript lacks a discussion of previous studies. The authors should consider addressing or comparing their work with the following points:

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work: https://doi.org/10.1101/2025.03.26.645577

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain: https://doi.org/10.1021/jacs.5c15421

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state: https://doi.org/10.1073/pnas.1807259115

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study: https://doi.org/10.1126/science.aaa9804

      Based on the Reviewer’s comments and suggestions, we have added a discussion related to each previous study mentioned above.

      (1) Tilting of the Env ectodomain has also been reported in previous experimental and theoretical work: https://doi.org/10.1101/2025.03.26.645577

      At the end of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “Shehata et al. also built a model of full-length gp120–gp41 trimer embedded in a lipid bilayer and performed all-atom simulations, in which a tilting motion of the ectodomain was observed. Based on the analysis of accessible surface area using different probe radii, they reported that antibody epitopes on the ectodomain are largely shielded by glycans, while the MPER epitope is mainly occluded by the membrane with tilt angles above 30° required to achieve greater MPER exposure (Shehata et al., 2025).”

      (2) A previous all-atom simulation study has characterized the conformational heterogeneity of the MPER-TMD domain: https://doi.org/10.1021/jacs.5c15421

      In the middle of the first paragraph in the Discussion section we added:

      “This is consistent with the all-atom simulations of MPER–TMD–CT and MPER–TMD in an asymmetric membrane conducted by Majumder et al., which likewise show multiple different conformational states of MPER and TMD (Majumder et al., 2025).”

      (3) Experimental studies have shown that MPER-directed antibodies recognize the prehairpin intermediate rather than the prefusion state: https://doi.org/10.1073/pnas.1807259115

      The paper mentioned by the Reviewer mainly reports the NMR structure of the MPER and TMD. In this study, the authors experimentally examined a series of MPER mutations to assess whether alterations in the MPER affect epitope accessibility in other regions of the Env ectodomain. This study did not investigate whether MPER-directed antibodies recognize the prehairpin intermediate. Instead, it cited prior studies (Frey et al., 2008; Alam et al., 2009; and Chen et al., 2014) reporting that MPER-directed antibodies target the prehairpin intermediate conformation. We had already cited two of them (Alam et al., 2009 and Chen et al., 2014) in the original preprint, and we have now added the third one (Frey et al., 2008) in the revised manuscript.

      In the middle of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “This is consistent with experimental studies indicating that MPER-targeting antibodies bind effectively only after the gp120–gp41 trimer undergoes major conformational rearrangements toward a fusion-intermediate or post-fusion state (Frey et al., 2008; Alam et al., 2009; Chen et al., 2014; Lee et al., 2016).”

      (4) How does the CT domain modulate the accessibility of these antibodies studied? The authors are in a strong position to compare their results with the following experimental study: https://doi.org/10.1126/science.aaa9804

      At the beginning of the second paragraph in the Discussion section we added:

      “Comparison of the full-length and CT-truncated systems shows that the primary difference arises from changes in the lipid bilayer, particularly in the exoplasmic leaflet, whereas differences in protein conformation and dynamics are less evident. Previous experimental studies have reported that mutations of the TMD residue and CT truncation can substantially affect the antigenicity of the ectodomain (Edwards et al., 2002; Chen et al., 2015; Dev et al., 2016). However, the ectodomain remains relatively rigid in our simulations for both full-length and CT-truncated systems. It is unclear whether this behavior reflects insufficient conformational sampling or artifacts associated with the model structures. Structural information for the CT is very limited, and the NMR structure (PDB ID: 7LOI) was the only available CT structure at the time the simulation systems were constructed. As a result, the extent to which this structure represents the native CT conformation remains uncertain. Additional experimental structural characterization of the CT will be important for achieving a more complete understanding of its functional role.”

      Reviewer #1 (Recommendations for the authors):

      A minor point: The RMSD values in Figure 3-figure supplement 1 seem a little too small. Please check the units.

      Figure 3-figure supplement 1 shows the RMSD of the ectodomain. Prior to RMSD calculation, the snapshots extracted from each trajectory were aligned to the initial structure using the ectodomain as the reference to avoid falsely high RMSD values arising from different orientations of the ectodomain. The relatively small RMSD values therefore reflect the intrinsic structural stability of the ectodomain, indicating that its internal conformation remains stable even though it undergoes substantial tilting motions.
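      The effect of aligning before computing RMSD can be reproduced with a minimal NumPy sketch using the standard Kabsch superposition, assuming (the response does not state this) that the alignment was an optimal least-squares superposition on the ectodomain atoms. The coordinates are random and hypothetical: a rigid copy that is tilted by 30° and shifted has a large RMSD without alignment but an RMSD of essentially zero after superposition, mirroring a rigid ectodomain that tilts as a whole.

```python
import numpy as np

def plain_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays without any fitting."""
    return float(np.sqrt(((P - Q) ** 2).sum(axis=1).mean()))

def kabsch_rmsd(P, Q):
    """RMSD after optimal rigid-body superposition of P onto Q (Kabsch)."""
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                 # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T     # optimal rotation
    return plain_rmsd(P @ R.T, Q)

rng = np.random.default_rng(0)
ref = rng.normal(scale=10.0, size=(100, 3))     # hypothetical coordinates (Å)
theta = np.deg2rad(30.0)                        # rigid-body tilt of 30 degrees
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
tilted = ref @ Rz.T + np.array([5.0, 0.0, 0.0])

print("without alignment:", plain_rmsd(tilted, ref))
print("after alignment:  ", kabsch_rmsd(tilted, ref))
```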

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors aim to elucidate how a viral surface protein behaves in a membrane environment and how its large-scale motions influence the exposure of antibody-binding sites. Using long-timescale, all-atom molecular dynamics simulations of a fully glycosylated, full-length protein embedded in a virus-like membrane, the study systematically examines the coupling between ectodomain motion, transmembrane orientation, membrane interactions, and epitope accessibility. By comparing multiple model variants that differ in cleavage state, initial transmembrane configuration, and presence of the cytoplasmic tail, the authors aim to identify general features of protein-membrane dynamics relevant to antibody recognition.

      Strengths:

      A major strength of this study is the scope and ambition of the simulations. The authors perform multiple microsecond-scale simulations of a highly complex, biologically realistic system that includes the full ectodomain, transmembrane region, cytoplasmic tail, glycans, and a heterogeneous membrane. Such simulations remain technically challenging, and the work represents a substantial computational and methodological effort.

      The analysis provides a clear and intuitive description of large-scale protein motions relative to the membrane, including ectodomain tilting and transmembrane orientation. The finding that the ectodomain explores a wide range of tilt angles while the transmembrane region remains more constrained, with limited correlation between the two, offers useful conceptual insight into how global motions may be accommodated without large rearrangements at the membrane anchor.

      Another strength is the explicit consideration of membrane and glycan steric effects on antibody accessibility. By evaluating multiple classes of antibodies targeting distinct regions of the protein, the study highlights how membrane proximity and glycan dynamics can differentially influence access to different epitopes. This comparative approach helps place the results in a broader immunological context and may be useful for readers interested in antibody recognition or vaccine design.

      Overall, the results are internally consistent across multiple simulations and model variants, and the conclusions are generally well aligned with the data presented.

      Weaknesses:

      The main limitations of the study relate to sampling and model dependence, which are inherent challenges for simulations of this size and complexity. Although the simulations are long by current standards, individual trajectories explore only portions of the available conformational space, and several conclusions rely on pooling data across a limited number of replicas. This makes it difficult to fully assess the robustness of some quantitative trends, particularly for rare events such as specific epitope accessibility states.

      In addition, several aspects of the model construction, including the treatment of missing regions, loop rebuilding, and initial configuration choices, are necessarily approximate. While these approaches are reasonable and well motivated, the extent to which some conclusions depend on these modeling choices is not always fully clear from the current presentation.

      Finally, the analysis of antibody accessibility is based on geometric and steric criteria, which provide a useful first-order approximation but do not capture potential conformational adaptations of antibodies or membrane remodeling during binding. As a result, the accessibility results should be interpreted primarily as model-based predictions rather than definitive statements about binding competence.

      Despite these limitations, the study provides a valuable and carefully executed contribution, and its datasets and analytical framework are likely to be useful to others interested in protein-membrane interactions and antibody recognition.

      Based on the Reviewer’s comments, we have revised the Discussion section to emphasize the limitations related to model construction and the analysis of antibody accessibility.

      In the middle of the second paragraph in the Discussion section we added:

      “Similar limitations apply to other modeled regions where structural information is incomplete, including missing loops in the ectodomain, the cleavage site and heptad repeat 2 where two PDB structures (IDs: 6B0N and 7LOI) were merged. These regions introduce additional uncertainty, and the extent to which they influence the interpretation of our results remains an open question.”

      In the middle of the third paragraph (originally the second paragraph) in the Discussion section we added:

      “In addition, this analysis is based on geometric and steric criteria without accounting for potential conformational adaptations of gp120–gp41, antibodies, or the membrane; therefore, the calculated frequency of antibody accessibility should be interpreted as an approximation rather than a definitive indicator of binding competence.”

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 45-47: The phrase "A major breakthrough was the design of ..." may be confusing. The gp140 trimer refers to a naturally occurring form of the HIV envelope protein rather than a structure designed de novo. If this statement refers to the development of a specific experimental construct or model system, this should be clarified to avoid misunderstanding.

      We have revised the sentence to clarify that the statement refers to soluble gp140 trimer constructs developed to stabilize the prefusion Env ectodomain for structural and immunological studies.

      At the beginning of the second paragraph in the Introduction section, we have modified the following:

      “A major advance was the development of soluble gp140 trimers, composed of gp120 and the ectodomain portion of gp41, designed to stabilize the prefusion Env trimer for structural and immunological characterization.”

      (2) Figure 1A: The figure displays a model structure lacking the cytoplasmic tail. Given that the full-length model is central to the study, the authors may wish to explain why the truncated structure is shown here or consider displaying the full-length model to better reflect the complete system analyzed.

      We have combined Figure 1 and Figure 1—figure supplement 1 to show both full-length and CT-truncated models in one figure. We have also added an explanation of why the CT-truncated model was used as the primary system for analysis.

      In the middle of the third paragraph in the Introduction section we added:

      “However, structural information for the CT remains limited, leading to uncertainty in its conformational organization. To reduce potential bias arising from this uncertainty, we also generated a CT-truncated model and used it as the primary system for analysis (Figure 1, Figure 1—figure supplement 1).”

      We have modified Figure 1.

      We have removed Figure 1—figure supplement 1.

      (3) Line 106: The probability distributions of θEC and θTM are cited in support of the statement that the angles "typically range from ... with occasional tilting." Providing explicit quantitative measures (for example, means, percentiles, or fractions of time spent in different angular regimes) would strengthen this claim.

      We have revised the text to explicitly indicate that only 0.7‰ of the sampled θ<sub>EC</sub> values are greater than 40°.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Across trajectories, θ<sub>EC</sub> typically ranges from 0° to 40°, with only 0.7‰ exceeding 40°”.
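      For scale, 0.7‰ is 0.07% of the sampled values. The computation is a simple count; the Python sketch below uses hypothetical samples (7 of 10,000 above 40°) and counts values strictly greater than the threshold, which is an assumption about how ties at exactly 40° are treated.

```python
def fraction_per_mille(angles_deg, threshold_deg=40.0):
    """Fraction of sampled angles strictly above threshold_deg, in per mille (‰)."""
    above = sum(1 for a in angles_deg if a > threshold_deg)
    return 1000.0 * above / len(angles_deg)

# Hypothetical samples: 7 of 10,000 tilt angles exceed 40 degrees.
samples = [20.0] * 9993 + [45.0] * 7
print(fraction_per_mille(samples))
```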

      (4) Figure 2: The meaning of the contour lines is not clearly explained. If these represent probability density estimates of angular values over the trajectory, this should be stated explicitly. In addition, because the angles may evolve over time, it would be helpful to clarify how temporal drift is accounted for in the contour representation.

      We have clarified in both the main text and the figure caption that the contour lines in Figure 2B represent the joint probability density of the ectodomain and TMD tilt angles. We have also added Figure 2—figure supplements 5–8 showing the temporal evolution of the ectodomain and TMD tilt angles.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “The temporal evolution of θ<sub>EC</sub> and θ<sub>TM</sub> is additionally shown in Figure 2—figure supplements 5–8. For the CT-truncated systems, the joint probability densities of θ<sub>EC</sub> and θ<sub>TM</sub> calculated from the final 0.5 µs of each trajectory are shown in Figure 2B, while those for the full-length systems are shown in Figure 2—figure supplement 9.”

      In the caption of Figure 2 we have modified the following:

      “(B) Probability densities of ectodomain and TMD tilt angles, calculated from CT-truncated systems with various initial configurations.”
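      A joint probability density of this kind can be estimated with a normalized 2D histogram. The NumPy sketch below is an illustration under stated assumptions, not the authors' pipeline: the angles and the 1-µs time axis are synthetic, only frames from the final 0.5 µs are kept, and the 2.5° bin width and 0-90° range are arbitrary choices. With `density=True`, `np.histogram2d` returns values whose integral over the binned area is 1, so contour levels drawn from it are probability densities (per square degree).

```python
import numpy as np

def joint_density(theta_ec, theta_tm, times_us, t_start_us=0.5,
                  bin_width_deg=2.5, angle_range=(0.0, 90.0)):
    """Normalized 2D histogram of (theta_EC, theta_TM) over frames with t >= t_start_us."""
    keep = times_us >= t_start_us
    edges = np.arange(angle_range[0], angle_range[1] + bin_width_deg, bin_width_deg)
    H, x_edges, y_edges = np.histogram2d(theta_ec[keep], theta_tm[keep],
                                         bins=[edges, edges], density=True)
    return H, x_edges, y_edges

# Synthetic 1-µs trajectory with hypothetical tilt-angle statistics.
rng = np.random.default_rng(1)
n_frames = 10_000
times_us = np.linspace(0.0, 1.0, n_frames, endpoint=False)
theta_ec = np.clip(rng.normal(20.0, 8.0, n_frames), 0.0, 90.0)
theta_tm = np.clip(rng.normal(15.0, 5.0, n_frames), 0.0, 90.0)

H, xe, ye = joint_density(theta_ec, theta_tm, times_us)
area = np.diff(xe)[:, None] * np.diff(ye)[None, :]
print("integral of density:", (H * area).sum())
```

      The resulting array can be passed, transposed (`H.T`), to matplotlib's `contour` together with the bin centers to draw contour lines of the kind shown in Figure 2B.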

      We have added Figure 2—figure supplements 5–8.

      We have modified the following:

      “The original Figure 2—figure supplement 5 has been renumbered as Figure 2—figure supplement 9.”

      (5) Figure 2 (supplements): Some datasets are shown using scatter plots, while others are presented as contour plots. Using a consistent visualization style across panels or clearly explaining the rationale for the different representations would improve clarity.

      The contour plots in Figure 2B and Figure 2—figure supplement 9 show the joint distribution of the ectodomain and TMD tilt angles during the final 0.5 µs of each trajectory, whereas the scatter plots in Figure 2—figure supplements 1–4 illustrate the variations of the tilt angles across different time intervals. Each 1-µs trajectory was divided into four 0.25-µs intervals, indicated by light gray, dark gray, black, and red, respectively, as shown in the legends of Figure 2—figure supplements 1–4. We have clarified in the main text that the multi-colored scatter plots are intended to demonstrate that large conformational changes predominantly occurred during the first 0.5 µs of each trajectory.

      In the middle of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Each 1-µs trajectory is divided into four consecutive 0.25-µs intervals, and data points from each interval are distinguished by four different colors (Figure 2—figure supplements 1–4). The variations of θ<sub>EC</sub> and θ<sub>TM</sub> over time show that large conformational changes predominantly occurred during the first 0.5 µs, followed by convergence of the θ<sub>EC</sub> and θ<sub>TM</sub> distributions during the second 0.5 µs in most trajectories.”
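      The interval coloring amounts to binning frame times. A minimal Python sketch, assuming time is measured in µs from the start of each 1-µs trajectory; the color names follow the order given in the figure legends, and the helper names are hypothetical. A frame at exactly 1 µs is placed in the last interval.

```python
INTERVAL_US = 0.25
COLORS = ["lightgray", "darkgray", "black", "red"]  # first to last 0.25-µs interval

def interval_index(t_us, interval_us=INTERVAL_US, n_intervals=len(COLORS)):
    """0-based index of the consecutive interval containing time t_us."""
    return min(int(t_us // interval_us), n_intervals - 1)

def interval_color(t_us):
    """Color used for a data point sampled at time t_us."""
    return COLORS[interval_index(t_us)]

for t in (0.10, 0.30, 0.60, 0.90, 1.00):
    print(t, interval_color(t))
```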

      (6) As noted in Line 97, θEC and θTM tilt independently. In this context, presenting time series plots of θEC and θTM separately would be highly informative. Such plots would help readers distinguish between equilibration behavior, drift from initial conditions, and equilibrium fluctuations.

      We have added Figure 2—figure supplements 5–8 showing the temporal evolution of the ectodomain and TMD tilt angles, as noted in our response to comment (4).

      (7) Figure 3A: It is not immediately clear which panels correspond to top views and which correspond to side views. Explicitly labeling these views in the figure or caption would reduce ambiguity.

      We have added labels in Figure 3A to clearly denote the top-view and side-view panels.

      (8) Figure 3B: The description "...by solid and transparent colors..." is ambiguous, as it is unclear whether this refers to color intensity or transparency. The caption would benefit from explicitly stating the visual encoding used (for example, darker/lighter colors or left/right bars).

      We have revised the figure caption to clarify which boxes correspond to cleaved systems and which correspond to uncleaved systems.

      In the caption of Figure 3 we have modified the following:

      “For each residue, the distribution from cleaved systems is shown in dark color (left), and that from uncleaved systems is shown in light color (right).”

      (9) Figure 4H: The definition of "frequency" expressed as a percentage is unclear. If this represents the fraction of snapshots in which two atoms fall within a specified distance range, this should be stated explicitly. The authors should also clarify whether the reported quantity is a probability or a rate, and ensure that the units and terminology are consistent.

      We have revised the figure caption to clarify that the frequency represents the fraction of snapshots in which the heavy atoms of a TMD residue and the interacting component are within 5 Å.

      In the caption of Figure 4 we have modified the following:

      “For each TMD residue–interacting component pair, the frequency represents the fraction of snapshots in which the heavy atoms of the TMD residue and the corresponding component are within 5 Å. Bar shading reflects this fraction, with fully filled bars indicating 100% and empty bars indicating 0%.”
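      The definition in the revised caption corresponds to a per-frame contact test followed by averaging over frames. The pure-Python sketch below is hypothetical: each frame supplies heavy-atom coordinates (Å) for the TMD residue and for the interacting component, a contact is counted when any atom pair is within the cutoff (treating "within 5 Å" as a distance of at most 5 Å is an assumption), and the frequency is the fraction of frames with a contact. `math.dist` requires Python 3.8 or later.

```python
import math

def contact_frequency(frames, cutoff=5.0):
    """Fraction of snapshots in which any heavy atom of the residue lies within
    `cutoff` (Å) of any heavy atom of the interacting component.

    frames: list of (residue_atoms, component_atoms), each a list of (x, y, z)."""
    hits = 0
    for residue_atoms, component_atoms in frames:
        if any(math.dist(a, b) <= cutoff
               for a in residue_atoms for b in component_atoms):
            hits += 1
    return hits / len(frames)

# Four hypothetical snapshots; three contain a contact.
frames = [
    ([(0, 0, 0)], [(3, 0, 0)]),              # 3.0 Å: contact
    ([(0, 0, 0)], [(6, 0, 0)]),              # 6.0 Å: no contact
    ([(0, 0, 0), (10, 0, 0)], [(14, 0, 0)]), # 4.0 Å via second atom: contact
    ([(0, 0, 0)], [(0, 4.9, 0)]),            # 4.9 Å: contact
]
print(contact_frequency(frames))
```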

      (10) Line 170: The manuscript describes a "rapid rearrangement" of the transmembrane domain at early simulation times. It would be helpful to clarify whether this regime is considered equilibration and whether it is excluded from subsequent analyses. Plotting time series of the relevant tilting angles and transmembrane rearrangement metrics could help address this point.

      We have clarified that the TMD underwent conformational changes early in the equilibration stage to enable R696 to interact with lipid headgroups, ions, or CT residues, and these interactions were largely maintained throughout the production stage. The time series of TMD tilting angles are now shown in Figure 2—figure supplements 5–8. Notably, the TMD exhibits heterogeneous conformational changes, including tilting, bending, and partial loss of helical structure. Therefore, no single metric or limited set of metrics can comprehensively capture the full extent of TMD conformational variability.

      In the middle of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “Early in the equilibration stage, the TMD rapidly rearranged to allow R696 residues to interact with more favorable partners, including negatively charged lipid headgroups from either leaflet, ions and water molecules diffusing into the bilayer center, as well as polar and positively charged groups in the CT when present. Once the interactions between R696 residues and their binding partners (lipid headgroup, ions or CT residues) were established, they remained stable with minimal changes throughout the production stage.”

      (11) Line 213: As with earlier sections, time series plots of θEC and θTM, similar to those shown in Figure 3-figure supplement 1, would greatly aid interpretation by showing whether these angles drift or fluctuate around stable values.

      The time series of θ<sub>EC</sub> and θ<sub>TM</sub> are now shown in Figure 2—figure supplements 5–8. Line 213 refers to the conformational variability of the MPER. For the same reason discussed in our response to comment (10), the MPER exhibits even greater conformational heterogeneity than the TMD, and therefore cannot be adequately described by a single or small set of geometric metrics such as tilt or bending angles.

      (12) Lines 216-222: The term "trajectories" may be misleading in this context. It is unclear whether the differences discussed arise from different trajectories of the same system or from different systems altogether. Clarifying this distinction would improve interpretability.

      In this paragraph, we describe MPER conformational variations observed across all trajectories from all systems. A preceding sentence has been modified to emphasize that all trajectories from all systems are included. In addition, we have clarified which specific trajectory is referred to when discussing each example.

      At the beginning of the first paragraph in the subsection “MPER adopts diverse conformations, and its exposure depends on both MPER and TMD conformations” we have modified the following:

      “…, and a wide variety of conformations were sampled across all trajectories from all systems.”

      “Such conformation and orientation were maintained in some trajectories such as CL<sup>ΔCT</sup>3 (the third trajectory of the cleaved, CT-truncated system with the low TMD position, Figure 4—figure supplement 2C). In other trajectories, such as CL<sup>CT</sup>1, the helix-turn-helix MPER in one protomer shifted into a horizontal orientation parallel to the membrane surface (Figure 4—figure supplement 6A). In UL<sup>ΔCT</sup>1, the entire MPER adopted a more vertical arrangement, with both MPER-N and MPER-C tilted outward (Figure 4E, Figure 4—figure supplement 4A). We also observed in UH<sup>ΔCT</sup>3 and UL<sup>ΔCT</sup>3 that the HR2 helix in the ectodomain, MPER, and TMD merged into a continuous long helix (Figure 4C, F, Figure 4—figure supplement 3C, 4C). In addition, loss of helical structure within the MPER was common, particularly in the MPER-C region, which often transitioned to a random coil.”

      (13) Lines 280 and 287: Similar concerns apply to the use of the term "trajectories." If observations differ primarily between systems rather than between trajectories within a system, revising the wording accordingly would avoid confusion.

      We have revised the text to clarify that all trajectories from all systems are considered collectively.

      In the middle of the second paragraph in the subsection “Ectodomain epitopes are conditionally accessible, whereas MPER epitopes are virtually inaccessible in the closed prefusion state” we have modified the following:

      “When considering all trajectories from all systems collectively, approximately half of them exhibited at least one protomer with >35% accessibility (Supplementary file 1–Supplementary Table 2).”

      (14) Figure 5B: Providing a time series of the distance d<sub>F673</sub>, at least in the Supporting Information, would help assess sampling and equilibration. Such plots would complement the probability distributions and increase confidence in the reported trends.

      We have added Figure 5—figure supplement 1 showing the time series of the distance d<sub>F673</sub> to complement the probability distribution in Figure 5B.

      In the middle of the second paragraph in the subsection “MPER adopts diverse conformations, and its exposure depends on both MPER and TMD conformations”, we have modified the following:

      “In the initial ‘low’ and ‘high’ TMD configurations, d<sub>F673</sub> was 6.1 Å and 9.1 Å, respectively, but across simulations it spanned a wide range from -15 Å to 20 Å (Figure 5A, B, Figure 5—figure supplement 1).”

      We have added Figure 5—figure supplement 1.

      Reviewer #3 (Public review):

      Summary:

      This study uses large-scale all-atom molecular dynamics simulations to examine the conformational plasticity of the HIV-1 envelope glycoprotein (Env) in a membrane context, with particular emphasis on how the transmembrane domain (TMD), cytoplasmic tail (CT), and membrane environment influence ectodomain orientation and antibody epitope exposure. By comparing Env constructs with and without the CT, explicitly modeling glycosylation, and embedding Env in an asymmetric lipid bilayer, the authors aim to provide an integrated view of how membrane-proximal regions and lipid interactions shape Env antigenicity, including epitopes targeted by MPER-directed antibodies.

      Strengths:

      A key strength of this work is the scope and realism of the simulation systems. The authors construct a very large, nearly complete Env-scale model that includes a glycosylated Env trimer embedded in an asymmetric bilayer, enabling analysis of membrane-protein interactions that are difficult to capture experimentally. The inclusion of specific glycans at reported sites, and the focus on constructs with and without the CT, are well motivated by existing biological and structural data.

      The simulations reveal substantial tilting motions of the ectodomain relative to the membrane, with angles spanning roughly 0-30° (and up to ~50° in some analyses), while the ectodomain itself remains relatively rigid. This framing, that much of Env's conformational variability arises from rigid-body tilting rather than large internal rearrangements, is an important conceptual contribution. The authors also provide interesting observations regarding asymmetric bilayer deformations, including localized thinning and altered lipid headgroup interactions near the TMD and CT, which suggest a reciprocal coupling between Env and the surrounding membrane.

      The analysis of antibody-relevant epitopes across the prefusion state, including the V1/V2 and V3 loops, the CD4 binding site, and the MPER, is another strength. The study makes effective use of existing experimental knowledge in this context, for example, by focusing on specific glycans known to occlude antibody binding, to motivate and interpret the simulations.

      Weaknesses:

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      With stronger integration of prior experimental and computational literature, this work has the potential to serve as a valuable reference for how Env behaves in a realistic, glycosylated, membrane-embedded context. The simulation framework itself is well-suited for future studies incorporating mutations, strain variation, antibodies, inhibitors, or receptor and co-receptor engagement. In its current form, the primary contribution of the study is to consolidate and extend existing observations within a single, large-scale model, providing a useful platform for future mechanistic investigations.

      Following the Reviewer’s comments and suggestions, we have revised the manuscript accordingly.

      While the simulations are technically impressive, the manuscript would benefit from more explicit cross-validation against prior experimental and computational work throughout the Results and Discussion, and better framing in the introduction. Many of the reported behaviors, such as ectodomain tilting, TMD kinking, lipid interactions at helix boundaries, and aspects of membrane deformation, have been described previously in a range of MD studies of HIV Env and related constructs (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC12665260, PMID: 33882664, PMC11975376). Clearly situating the present results relative to these studies would strengthen the paper by clarifying where the simulations reproduce established behavior and where they extend it to more complete or realistic systems.

      We have added a summary of the prior computational studies in the Introduction section.

      At the beginning of the third paragraph in the Introduction section we added:

      “Molecular dynamics (MD) simulations have been employed to investigate the stability and conformational properties of monomeric and trimeric helical TMD in both aqueous and lipid bilayer environments since the late 2000s (Kim et al., 2009; Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014; Hollingsworth et al., 2018). Early studies were constrained by limited computational resources, and their simulation times were therefore relatively short. Subsequent work employed metadynamics to probe rare events (Gangupomu et al., 2010; Baker et al., 2014), and simulations performed on Anton supercomputers extended sampling to multi-microsecond timescales (Baker et al., 2014). Piai and coworkers determined the NMR structure of a construct comprising the MPER, TMD, and CT, and carried out MD simulations to assess the structural stability of the trimeric MPER–TMD–CT complex (Piai et al., 2021). Majumder et al. subsequently simulated the same MPER–TMD–CT complex and applied a machine learning-based approach to classify its conformational ensemble (Majumder et al., 2025). Maillie et al. combined conventional MD, steered MD, and coarse-grained simulations to examine interactions between MPER-targeting antibodies and membrane lipids (Maillie et al., 2025). In addition, MD simulations have been extensively applied to the well-studied ectodomain. Despite these advances, it remains challenging to investigate the gp120–gp41 trimer as an intact entity, given its structural complexity.”

      We have also added a discussion of previous MD simulation studies to the Results section regarding interactions of the TMD residue R696 with ions and lipid headgroups.

      At the end of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we added:

      “Previously, Kim et al. reported that the inter-chain interactions between protonated R696 residues gradually diminished over a short simulation time (23 ns), leading to increased crossing angles and reduced bundle length (Kim et al., 2009). Gangupomu et al. and Baker et al. observed that R696 snorkeled toward either exoplasmic or endoplasmic headgroups in simulations of the TMD monomer, resulting in TMD tilting and membrane thinning due to water penetration and lipid headgroups interacting with R696 (Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014). These observations are consistent with our findings. Hollingsworth et al. also reported membrane thinning; however, they attributed this effect to interfacial interactions of R683 and R707 with both leaflets and proposed that R696 interacted only with water and ions permeating into the center of the TMD trimer (Hollingsworth et al., 2018).”

      A related limitation is that the work remains largely descriptive with respect to conformational coupling. Numerous experimental studies have demonstrated functional and conformational coupling between the TMD, CT, and the antigenic surface, with effects on Env stability, infectivity, and antibody binding (e.g., PMC4701381, PMC4304640, PMC5085267). In this context, the statement that ectodomain and TMD tilting motions are independent is a strong conclusion that is not fully supported by the analyses presented, particularly given the authors' acknowledgment that multiple independent simulations are required to adequately sample conformational space. More direct analyses of coupling, rather than correlations inferred from individual trajectories, would help align the simulations with the existing experimental literature. Given the scale of these simulations, a more thorough analysis of coupling could be this paper's most seminal contribution to the field.

      We have added a discussion of the coupling between the TMD, CT, and Env antigenicity, and of the independent motion of the ectodomain and TMD in our simulations.

      In the middle of the second paragraph in the Discussion section we added:

      “Our analysis of the ectodomain and TMD coupling indicates that the motions of these two domains are largely independent. This observation does not contradict experimental studies demonstrating functional coupling between the TMD, CT, and the antigenic profiles of Env (Chen et al., 2015; Dev et al., 2016). Munro et al. proposed that unliganded Env is intrinsically dynamic, transitioning among three distinct prefusion conformations: a closed ground state (predominant), a transient state, and a CD4-/co-receptor-stabilized state. Both laboratory-adapted and clinically isolated strains can spontaneously transition among these three states, although their relative occupancies differ (Munro et al., 2014). It is therefore possible that TMD mutations or CT truncation also alter the equilibrium distribution among the three states, thereby affecting epitope exposure, particularly for epitopes that are occluded in the closed ground state but exposed in the CD4-/co-receptor-stabilized state. However, transitions among the three states occur on millisecond-to-second timescales. Our simulations on microsecond timescales primarily capture conformational variations within the closed ground state and suggest that the MPER acts as a hinge, providing substantial flexibility that enables the ectodomain and TMD to move independently while Env remains in the closed ground state.”

      We have also calculated the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD.

      At the end of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we added:

      “We also calculated the dynamical cross-correlation maps (Ichiye et al., 1991) of Cα atoms for all systems using CPPTRAJ (Roe et al., 2013). The results indicate only very weak correlations between the ectodomain and the TMD (Figure 2—figure supplements 10–13).”

      We have added Figure 2—figure supplements 10–13.
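      For readers less familiar with this coupling analysis: the dynamical cross-correlation of Ichiye and Karplus normalizes the time-averaged dot product of two atoms' displacements from their mean positions, C_ij = ⟨Δr_i·Δr_j⟩ / sqrt(⟨Δr_i²⟩⟨Δr_j²⟩), so values near +1 indicate correlated motion, values near −1 anticorrelated motion, and values near 0 little linear coupling. The following is a minimal pure-Python sketch of that definition, not the CPPTRAJ implementation used in the study; it assumes the frames have already been superposed onto a common reference, and the function name `dccm` and the toy trajectory are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dccm(frames: Sequence[Sequence[Vec3]]) -> List[List[float]]:
    """Dynamical cross-correlation matrix (hypothetical helper).

    frames[t][i] is the (x, y, z) position of atom i (e.g. a C-alpha) in frame t.
    Assumes all frames are already superposed onto a common reference;
    no alignment is performed here.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])

    # Mean position of each atom over the trajectory.
    mean = [
        [sum(frames[t][i][k] for t in range(n_frames)) / n_frames for k in range(3)]
        for i in range(n_atoms)
    ]
    # Displacement of each atom from its mean position, per frame.
    disp = [
        [[frames[t][i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
        for t in range(n_frames)
    ]

    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    # Time-averaged covariance <dr_i . dr_j>.
    cov = [
        [sum(dot(disp[t][i], disp[t][j]) for t in range(n_frames)) / n_frames
         for j in range(n_atoms)]
        for i in range(n_atoms)
    ]
    # Normalize by the fluctuation magnitudes; atoms that never move get 0.
    corr = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(n_atoms):
            denom = math.sqrt(cov[i][i] * cov[j][j])
            corr[i][j] = cov[i][j] / denom if denom > 0 else 0.0
    return corr


# Toy trajectory: atom 1 moves with atom 0, atom 2 moves opposite to it,
# and atom 3 moves perpendicular to it.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
]
c = dccm(frames)
print([round(x, 3) for x in c[0]])  # prints [1.0, 1.0, -1.0, 0.0]
```

      Because C_ij only captures linear, in-phase correlation of positional fluctuations (atom 3 above moves but scores 0), near-zero values between two domains are consistent with, but do not on their own establish, independent motion.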

      The choice of membrane composition also warrants deeper discussion. The manuscript states that it relies on a plasma membrane model derived from a prior simulation-based study, which itself is based on host plasma membrane (PMID: 35167752), but experimental analyses have shown that HIV virions differ substantially from host plasma membranes (e.g., PMC46679, PMC1413831, PMC10663554, PMC5039752, PMC6881329). In particular, virions are depleted in PC, PE, and PI, and enriched in phosphatidylserine, sphingomyelins, and cholesterol. These differences are likely to influence bilayer thickness, rigidity, and lipid-protein interactions and, therefore, may affect the generality of the conclusions regarding Env dynamics and antigenicity. Notably, the citation provided for membrane composition is a laboratory self-citation, a secondary source, rather than a primary experimental study on plasma membrane composition.

      We have added references to primary experimental studies on plasma membrane composition (van Meer et al., 2008; Sampaio et al., 2011), as well as the prior simulation study proposing the lipid and cholesterol distributions (Ingolfsson et al., 2014).

      At the beginning of the Membrane subsection in the Materials and methods section, we have modified the following:

      “The full-length and CT-truncated gp120–gp41 models were embedded into an asymmetric lipid bilayer with the lipid composition corresponding to a mammalian plasma membrane (van Meer et al., 2008; Sampaio et al., 2011; Ingolfsson et al., 2014; Pogozheva et al., 2022),”

      We have also clarified the limitations associated with the choice of lipid composition and emphasized the need to investigate its influence in future studies.

      At the end of the second paragraph in the Discussion section we added:

      “In addition to the limitations inherent to protein structure modeling, the choice of lipid composition remains an open question. In this work, we selected an asymmetric mammalian plasma membrane because it is one of the 18 complex biomembrane systems we previously studied (Pogozheva et al., 2022), and among them, it provides the closest available approximation to the HIV membrane. Nevertheless, experimental studies have reported differences in lipid composition between HIV virions and the host plasma membrane (Aloia et al., 1993; Brugger et al., 2006; Huarte et al., 2016; Mucksch et al., 2019; Tomishige et al., 2023). Although we do not anticipate that our main conclusions regarding Env domain motions and MPER flexibility would change substantially, evaluating the influence of lipid composition represents an important direction for future work.”

      Finally, there are pervasive issues with citation and methodological clarity. Several structural models are referred to only by PDB ID without citation, and in at least one case, a structure described as cryo-EM is in fact an NMR-derived model. Statements regarding residue flexibility, missing regions in structures, and comparisons to prior dynamics studies are often presented without appropriate references. The Methods section also lacks sufficient detail for a system of this size and complexity, limiting readers' ability to assess robustness or reproducibility.

      We have corrected the error in which PDB structure 7LOI was described as a cryo-EM structure; it is in fact an NMR structure. We have also verified that all PDB structures are properly cited at their first occurrence in the manuscript.

      We have clarified that the palmitoylation sites, glycans, and lipid bilayers were modeled in an automated fashion by different modules in CHARMM-GUI, and have added Supplementary file 1–Supplementary Table 8 showing the simulation settings for the equilibration and production stages.

      At the end of the subsection “Modeling of full-length gp120–gp41 trimer” we have modified the following:

      “Two mutations (S764C and S837C) were introduced in the CT to restore the palmitoylation sites, and lipid tails oriented towards the hydrophobic core of the bilayer were then attached to the palmitoylation sites using the PDB Manipulation module in CHARMM-GUI (Jo et al., 2008; Jo et al., 2014; Park et al., 2023) (Figure 1D).”

      At the end of the subsection “Glycosylation” we added:

      “The selected glycan sequences were represented in the Glycan Reader Sequence format (Jo et al., 2011; Park et al., 2017) and added to the corresponding glycosylation sites using the Glycan Reader & Modeler graphical interface.”

      In the middle of the subsection “Membrane” we added:

      “Membrane systems were constructed using CHARMM-GUI Membrane Builder, which provides a user-friendly graphical interface for selecting lipid types and defining their numbers in each leaflet (Jo et al., 2007; Jo et al., 2009; Wu et al., 2014; Lee et al., 2016; Lee et al., 2019).”

      In the middle of the subsection “Simulation details” we have modified the following:

      “Positional and dihedral restraints were applied to proteins, glycans, and lipids, with force constants progressively reduced over successive intervals (Supplementary file 1–Supplementary Table 8).”

      We added Supplementary file 1–Supplementary Table 8.

      Reviewer #3 (Recommendations for the authors):

      Major concerns:

      (1) Strengthen analysis of conformational coupling: Consider analyses that more directly assess coupling between the TMD/CT and ectodomain, such as residue-residue correlation networks, comparisons to smFRET-defined conformational states, or data-driven (e.g., machine learning-based) trajectory analyses. Machine-learning analysis would be particularly helpful in understanding otherwise elusive allosteric networks that could govern large-scale behavior. Discuss how, due to the apparent local minima that occur after ~0.5 μs, enhanced sampling methods might be employed to better cover the Env conformational landscape.

      We have calculated the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD.

      At the end of the first paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we added:

      “We also calculated the dynamical cross-correlation maps (Ichiye et al., 1991) of Cα atoms for all systems using CPPTRAJ (Roe et al., 2013). The results indicate only very weak correlations between the ectodomain and the TMD (Figure 2—figure supplements 10–13).”

      We added Figure 2—figure supplements 10–13.

      We have also noted in the Discussion section that enhanced sampling methods could be employed to better explore the conformational landscape of Env trimer, including fluctuations within the closed state as well as transitions among the closed ground, transient and CD4/co-receptor-stabilized states proposed in the previous experimental study (Munro et al., 2014).

      In the middle of the second paragraph in the Discussion section we added:

      “Enhanced sampling methods could be applied to more thoroughly explore the conformational landscape, including not only variations within the closed ground state but also transitions among the closed ground, transient and CD4-/co-receptor-stabilized states.”

      (2) Qualify strong independence claims: Rephrase or further support statements asserting independence of ectodomain and TMD motions, particularly in light of known experimental evidence for coupling (PMC4701381, PMC4304640, PMC5085267).

      In addition to adding the dynamical cross-correlation maps showing very weak correlations between the ectodomain and the TMD, we have added a discussion of the coupling between the TMD, CT, and Env antigenicity, and of the independent motion of the ectodomain and TMD in our simulations.

      In the middle of the second paragraph in the Discussion section we added:

      “Our analysis of the ectodomain and TMD coupling indicates that the motions of these two domains are largely independent. This observation does not contradict experimental studies demonstrating functional coupling between the TMD, CT, and the antigenic profiles of Env (Chen et al., 2015; Dev et al., 2016). Munro et al. proposed that unliganded Env is intrinsically dynamic, transitioning among three distinct prefusion conformations: a closed ground state (predominant), a transient state, and a CD4-/co-receptor-stabilized state. Both laboratory-adapted and clinically isolated strains can spontaneously transition among these three states, although their relative occupancies differ (Munro et al., 2014). It is therefore possible that TMD mutations or CT truncation also alter the equilibrium distribution among the three states, thereby affecting epitope exposure, particularly for epitopes that are occluded in the closed ground state but exposed in the CD4-/co-receptor-stabilized state. However, transitions among the three states occur on millisecond-to-second timescales. Our simulations on microsecond timescales primarily capture conformational variations within the closed ground state and suggest that the MPER acts as a hinge, providing substantial flexibility that enables the ectodomain and TMD to move independently while Env remains in the closed ground state.”

      (3) Clarify membrane composition assumptions: Provide a clearer rationale for the chosen lipid composition, and explicitly discuss how differences between host plasma membranes and HIV virions (e.g., PS, sphingomyelin, and cholesterol enrichment) may affect the conclusions.

      We have clarified the limitations associated with the choice of lipid composition and emphasized the need to investigate its influence in future studies.

      At the end of the second paragraph in the Discussion section we added:

      “In addition to the limitations inherent to protein structure modeling, the choice of lipid composition remains an open question. In this work, we selected an asymmetric mammalian plasma membrane because it is one of the 18 complex biomembrane systems we previously studied (Pogozheva et al., 2022), and among them, it provides the closest available approximation to the HIV membrane. Nevertheless, experimental studies have reported differences in lipid composition between HIV virions and the host plasma membrane (Aloia et al., 1993; Brugger et al., 2006; Huarte et al., 2016; Mucksch et al., 2019; Tomishige et al., 2023). Although we do not anticipate that our main conclusions regarding Env domain motions and MPER flexibility would change substantially, evaluating the influence of lipid composition represents an important direction for future work.”

      (4) Address citation and reference issues: Replace PDB-only references with proper citations, correct mischaracterizations of structure determination methods, and ensure all supplementary citations are fully referenced.

      We have corrected the error in which PDB structure 7LOI was described as a cryo-EM structure; it is in fact an NMR structure. We have also verified that all PDB structures are properly cited at their first occurrence in the manuscript.

      (5) Expand the Methods section: Provide additional detail on system construction, glycan modeling, lipid asymmetry, equilibration, sampling, and limitations, including a discussion of potential benefits of enhanced-sampling approaches.

      We have clarified that the palmitoylation sites, glycans, and lipid bilayers were modeled in an automated fashion by different modules in CHARMM-GUI, and have added Supplementary file 1–Supplementary Table 8 showing the simulation settings for the equilibration and production stages.

      At the end of the subsection “Modeling of full-length gp120–gp41 trimer” we have modified the following:

      “Two mutations (S764C and S837C) were introduced in the CT to restore the palmitoylation sites, and lipid tails oriented towards the hydrophobic core of the bilayer were then attached to the palmitoylation sites using the PDB Manipulation module in CHARMM-GUI (Jo et al., 2008; Jo et al., 2014; Park et al., 2023) (Figure 1D).”

      At the end of the subsection “Glycosylation” we added:

      “The selected glycan sequences were represented in the Glycan Reader Sequence format (Jo et al., 2011; Park et al., 2017) and added to the corresponding glycosylation sites using the Glycan Reader & Modeler graphical interface.”

      In the middle of the subsection “Membrane” we added:

      “Membrane systems were constructed using CHARMM-GUI Membrane Builder, which provides a user-friendly graphical interface for selecting lipid types and defining their numbers in each leaflet (Jo et al., 2007; Jo et al., 2009; Wu et al., 2014; Lee et al., 2016; Lee et al., 2019).”

      In the middle of the subsection “Simulation details” we have modified the following:

      “Positional and dihedral restraints were applied to proteins, glycans, and lipids, with force constants progressively reduced over successive intervals (Supplementary file 1–Supplementary Table 8).”

      We added Supplementary file 1–Supplementary Table 8.

      The discussion of potential benefits of enhanced-sampling approaches is included in our response to major concern (1).

      (6) Data availability: In addition to code, deposit all MD trajectories for re-analysis. The scale of this simulation was likely costly (GPU time), and so data availability is imperative.

      We have deposited the MD simulation trajectories in Zenodo.

      At the end of the section “Data availability” we added:

      “The simulation trajectories can be found at https://doi.org/10.5281/zenodo.18853902, https://doi.org/10.5281/zenodo.18854615, and https://doi.org/10.5281/zenodo.18854639.”

      Minor:

      (1) Stylistic: Suggested to revise Figure 1 to provide a clearer overview of all constructs with consistent nomenclature (e.g., "full-length" versus "ΔCT") and explicit domain boundaries. With a better overview figure, the current figures could comprise the Figure 1 associated with Figures 1 and 2.

      We have combined Figure 1 and Figure 1—figure supplement 1 to show both full-length and CT-truncated models in one figure.

      We have modified Figure 1.

      We have removed Figure 1—figure supplement 1.

      (2) Explicitly cross-validate against prior studies: Integrate comparisons to existing MD simulations and experimental studies (e.g., PMC2730987, PMC2980712, PMC4254001, PMC4040535, PMC6035291, PMC4701381, PMC5085267) directly into the Results and Discussion.

      We have added a discussion of previous MD simulation studies to the Results section regarding interactions of the TMD residue R696 with ions and lipid headgroups.

      At the end of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “Previously, Kim et al. reported that the inter-chain interactions between protonated R696 residues gradually diminished over a short simulation time (23 ns), leading to increased crossing angles and reduced bundle length (Kim et al., 2009). Gangupomu et al. and Baker et al. observed that R696 snorkeled toward either exoplasmic or endoplasmic headgroups in simulations of the TMD monomer, resulting in TMD tilting and membrane thinning due to water penetration and lipid headgroups interacting with R696 (Gangupomu et al., 2010; Baker et al., 2014; Baker et al., 2014). These observations are consistent with our findings. Hollingsworth et al. also reported membrane thinning; however, they attributed this effect to interfacial interactions of R683 and R707 with both leaflets and proposed that R696 interacted only with water and ions permeating into the center of the TMD trimer (Hollingsworth et al., 2018).”

      The discussion of PMC4701381 and PMC5085267 is included in our response to major concern (2).

      (3) "In the cryo-EM structure (PDB ID: 7LOI)": This is an NMR model and lacks citation.

      We have corrected this error and added the citation at the first occurrence of PDB ID: 7LOI in the Results section.

      In the middle of the first paragraph in the subsection “The energetically unfavorable R696 in the hydrophobic core results in asymmetric, kinked TMD conformations and disrupts membrane integrity” we have modified the following:

      “In the NMR structure (PDB ID: 7LOI) (Piai et al., 2021),”

      (4) "Higher RMSF values were observed in the residues missing from the cryo-EM structure": This is lacking citation, as there are multiple cryo-EM structures and several dynamics studies using NMR.

      The missing residues here specifically refer to those absent in the cryo-EM structure (PDB ID: 6B0N) used for model building, rather than all cryo-EM structures in the PDB. We have revised the text to clarify this distinction.

      In the middle of the second paragraph in the subsection “The ectodomain maintains a rigid internal structure and tilts independently of the TMD” we have modified the following:

      “Higher RMSF values were observed in the residues missing from the cryo-EM structure (PDB ID: 6B0N) (Sarkar et al., 2018), which was used for the ectodomain in model building (these missing residues are highlighted in red in Figure 1A, B),”

  5. social-media-ethics-automation.github.io
    1. Index on Censorship. Interview with a troll. Index on Censorship, September 2011. URL: https://www.indexoncensorship.org/2011/09/interview-with-a-troll/ (visited on 2023-12-10).

      This article about trolling, and how it has been viewed from angles such as hate crime, is interesting because you never really know the true intentions behind trolling. Because there is often an algorithm or a line of code behind it, it's hard to pinpoint a situation in which it would be taken very seriously. However, when it's being done on suicide victims' pages, that definitely feels more targeted and like a real, serious issue.

    1. 9.2 Downsampling Strategies

      I wonder about writing down in pseudocode what a reader has to do here, e.g., coming from an object query, how it finds which spatial chunks/fragments to read, the order of the different range requests, etc.