  1. Sep 2025
    1. Reviewer #1 (Public review):

      Summary:

      The study by Raiola et al. conducted a quantitative analysis of tissue deformation during the formation of the primitive heart tube from the cardiac crescent in mouse embryos. Using the tools developed to analyze growth, anisotropy, strain, and cell fate from time-lapse imaging data of mouse embryos, the authors elucidated the compartmentalization of tissue deformation during heart tube formation and ventricular expansion. This paper describes how each region of the cardiac tissue changes to form the heart tube and ventricular chamber, contributing to our understanding of the earliest stages of cardiac development.

      Strengths:

      In order to understand tissue deformation in cardiac formation, it is commendable that the authors effectively utilized time-lapse imaging data, a data pipeline, and in silico fate mapping.

      The study clarifies the compartmentalization of tissue deformation by integrating growth, anisotropy, and strain patterns in each region of the heart.

      Weaknesses:

      The significance of the compartmentalization of tissue deformation for the heart tube formation remains unclear.

    2. Reviewer #2 (Public review):

      The authors address an important challenge in developmental biology: the quantitative description of tissue deformation during organogenesis. They have developed a new pipeline to quantify early heart tube morphogenesis in the mouse, with cellular resolution. They adopt an elegant approach by integrating multiple 3D time-lapse datasets into a dynamic atlas of cardiac morphogenesis in order to compute spatio-temporal deformation patterns. The main findings highlight a strong compartmentalization of cell behaviors, with tissue growth and anisotropy exhibiting complementary and spatially segregated patterns. Using these data, the authors developed an in-silico fate mapping tool to interrogate cell displacement within the myocardium. This virtual model provides new mechanistic insights into how the bilateral cardiac primordia converge and transform into a three-dimensional heart tube. The authors identify "belt-like" constraints at the arterial and venous poles that prevent tissue expansion and thus shape the ventricular barrel morphology.

      The computational framework is highly innovative and impressive, providing an unprecedented 3D model of tissue deformation during heart morphogenesis. It also opens avenues for testing hypotheses regarding tissue growth and the forces that cause cell motion. However, the proposed model of ventricular chamber formation with the two constraining belts remains hypothetical, lacking biological validation and requiring strengthening or modulation.

      Overall, this carefully performed study provides a new model for exploring tissue deformation during organogenesis and will be of broad interest to computational and developmental biologists.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript by Raiola and colleagues entitled "Quantitative computerized analysis demonstrates strongly compartmentalized tissue deformation patterns underlying mammalian heart tube formation" takes a highly quantitative approach to interrogating the earliest stages of cardiogenesis (12 hours, from early cardiac crescent to early heart tube) in a new and innovative way. The paper presents a new computational framework to help identify both regional and temporal patterns of tissue deformation at cellular resolution. The method is applied to live embryo imaging data (newly generated and from the group's previous pioneering work). In the initial setup, the new model was applied directly to raw time-lapse data, and the results were compared to actual cell tracks identified manually, showing close correlations of the model with the manual tracking. Next, they integrated spatial and temporal information from different embryos to generate a new model for tissue movement, driven by parameters such as tissue growth and anisotropy. Key findings from their model suggest that there are distinct compartments of tissue deformation patterns as the bilateral cardiac crescent develops into the linear heart tube, and that the ventricular chamber forms by a defined expansion pattern, as a 'hemi-barrel shape', with the arterial and venous poles (IFT and OFT) acting as the harnessing belts constraining the expansion of the chamber further. Lastly, the model is tested for its ability to predict future residence of cardiac crescent cells in the heart tube, which it seems to be able to do successfully based on fate tracking validation experiments.

      Strengths:

      The manuscript provides an exceptionally careful analysis of a critical stage during heart development - that of the earliest stages of morphogenesis, when the heart forms its first tube and chamber structures. While numerous studies have interrogated this stage of heart development, few studies have performed time-lapse imaging, and, to my knowledge, no other report has performed such an in-depth quantitative analysis and modeling of this complex process. The computational model applied to normal heart development of the myocardium (labelled by Nkx2-5) has revealed multiple new and interesting concepts, such as the distinct compartments of tissue deformation patterns and the growth trajectories of the emerging ventricle. The fact that the model operates at cellular resolution and over a nearly continuous time period of approximately 12 hours allows for unprecedented depth of the analysis in a largely unbiased manner. Going forward, one can imagine such models revealing additional information on these processes, performing analyses of subpopulations that form the heart, and maybe most importantly, applying the model to various perturbation models (genetic or otherwise). The manuscript is very well written, and the data display is accessible and transparent.

      Weaknesses:

      No major weaknesses are noted with the study. It would have been very exciting to see the model applied to any kind of perturbation, for example, a left-right defect model, or a model with compromised cardiac progenitor populations. However, the amount of live imaging required for such analyses renders this out of scope for the current study.

    4. Author response:

      We are going to modify the text following the reviewers’ comments and perform embryo direct labelling experiments to experimentally address the contraction of the two “belts” proposed in our model. We feel that this aspect is feasible in a reasonable time and important for the model proposed. We appreciate the relevance of using this framework to identify molecular drivers of the regionalized tissue behaviours uncovered and how these might be altered in mutant models, but feel that these aspects demand efforts beyond a reasonable revision period.

    1. eLife Assessment

      This work presents valuable new data on the role of D-Serine and how it competes with its stereoisomer L-Serine to influence metabolism. The work presents a variety of solid experimental data combined with simulated results to investigate the mechanisms focused on one-carbon metabolism, which is relevant for several research fields. However, some claims are only partially supported by data, and critical areas comparing L- vs D-Serine and further mechanistic studies are incomplete. Furthermore, while the work has potential for various fields, the work has only been studied in a limited cell type and context.

    2. Reviewer #1 (Public review):

      Summary:

      The authors demonstrate the stereoselective role of D-serine in 1C metabolism, showing that D-serine competes with L-serine and inhibits mitochondrial L-serine transport. They observe expression of 1C metabolites in their metabolomics approach in primary cortical neurons treated with L-serine, D-serine, and a mixture of both. Their conclusions are based on the reduction in levels of glycine, polyamines and their intermediates, and formate. Single-cell RNA sequencing of N2a cells showed that cells treated with D-serine had enhanced expression of genes associated with mitochondrial functions, such as respiratory chain complex assembly, and downregulation of genes related to amino acid transport, cellular growth, and neuron projection extension. Their work demonstrates that D-serine inhibits tumor cell proliferation and induces apoptosis in neural progenitor cells, highlighting the importance of D-serine in neurodevelopment.

      Strengths:

      D-amino acids are a marvel of nature. It is fascinating that nature decided to make two versions of the same molecule, in this case, an amino acid. While the L-stereoisomer plays well-known roles in biology, the D-stereoisomer seems to function in obscurity. Research into these novel signaling molecules is gathering momentum, with newer stereoisomers being discovered. D-serine has been the most well-studied among the different stereoisomers, and we still continue to learn about this novel neurotransmitter. The roles of these molecules in the context of metabolism are not well studied. The authors aim to elucidate the metabolic role of D-serine in the context of neuronal maturation with implications for 1C metabolism and in cell proliferation. The metabolic role of these molecules is just beginning to be uncovered, especially in the context of mammalian biology. This is the strength of the manuscript. The authors have done important work in prior publications elucidating the role of D-amino acids. The advancement of the field of D-amino acids in mammalian biology is significant, as not much is known. The presentation of RNA seq data is a valuable resource to the community, albeit with the caveats mentioned below.

      Weaknesses:

      The following are some of the issues that come out in a critical reading of the manuscript. Addressing these would only strengthen and clarify the work.

      (1) Kinetic assessment of D-serine versus L-serine: While the authors mention that D-serine is not a good substrate for SHMT2 compared to L-serine, the kinetic data are presented for only D-serine. In a substrate comparison with an enzyme, data must be presented for L-serine as well to make the conclusion about substrate specificity and affinity. Since the authors talk about one versus another substrate, there needs to be a kinetic comparison of both with Km (affinity). (Ref Figure 2 panel).

      (2) Molecular Dynamics simulations, while a good first step in modeling interactions at the active site, rely on force fields. These force fields are approximations and do not represent all interactions occurring in the natural world. Setting up the initial conditions in the simulations can impact the final results in non-equilibrium scenarios. The basic question here is this: Is the simulated trajectory long enough so that the system reaches thermodynamic equilibrium and the measured properties converge? Prior studies have shown mixed results with the conclusion that properties of biological systems tend to converge in multi-second trajectories (not nanosecond scales as reported by the authors) and transition rates to low probability conformations require more time. (Ref Figure 2C).

      (3) The authors use N2a cell line to demonstrate D-serine burden on primary cortical neurons. N2a is an immortalized cell line, and its properties are very different from primary neurons. The authors need to mention a rationale for the use of an immortalized cell line versus primary neurons. The transcriptomic profile of an immortalized cell line is different compared to a primary cell. Hence, the response to D-serine may vary between the two different cell types.

      (4) In Figure 4D, the authors mention that D-serine activates the cleavage of caspase 3. Figure 4D shows only cleaved caspase 3 as a single band. They need to show the full blot that contains the cleaved fragments along with the major caspase 3 band.

      (5) In Figure panel 4, the authors use neural progenitor cells (NPCs). They need to demonstrate that the population they are working with is NPCs and not primary neurons. There must be a figure panel staining for NPC markers like SOX2 and PAX6. Also, Figure S5 needs to be properly labeled. It is unclear from the legend what panels B-E refer to. Also, scale bars are not indicated.

      (6) In Supplementary Figure panel 7F, the authors mention phosphatidyl L-serine and phosphatidyl D-serine. A chromatogram of the two species would clarify their presence as they used 2D-HPLC. On an MS platform, these 2 species are not distinguishable. Including a chromatogram of the 2 species would be helpful to the readers.

      (7) The authors mention an enantiomeric shift of serine metabolism during neural development, which appears to be a discussion of previously published data from Hubbard et al., 2013, Burk et al., 2020, and Bella et al., 2021 in Supplementary Figure panels 8 A-E. This should not be presented as a figure panel, as it gives the false impression that the authors have performed the experiment, which is clearly not the case. However, its discussion can well serve as part of the manuscript in the discussion section.

      (8) The entire presentation of the section on enantiomeric shift of serine metabolism during neural development (lines 274-312) is a discussion and should be part of the discussion section and not in the results section. This is misleading.

      (9) The discussion section is not well written. There is no mention of recent work related to D-serine that has a direct bearing on its metabolic properties. In the discussion section, paragraph 1, the authors mention that their work demonstrates the selective synthesis of D-serine in mature neurons as opposed to neural progenitor cells. This concept has been referred to in prior publications:

      (a) Spatiotemporal relationships among D-serine, serine racemase, and D-amino acid oxidase during mouse postnatal development. PMID:14531937.

      (b) D-cysteine is an endogenous regulator of neural progenitor cell dynamics in the mammalian brain. PMID:34556581.

      (10) In the abstract, in lines 101 and 102, the authors mention "how D-serine contributes to cellular metabolism beyond neurotransmission remains largely unknown". In 2023, a paper in Stem Cell Reports by Roychaudhuri et al (PMID:37352848) showed that D and L-serine availability impacts lipid metabolism in the subventricular zone in mice, affecting proliferative properties of stem-cell derived neurons using a comprehensive lipidomics approach. There is no mention of this work even in the discussion section, as it bears directly on L and D-serine availability in neurons, which the authors are investigating. In the discussion section in lines 410-411, the authors mention the role of D-serine in neurogenesis, but surprisingly don't refer to the above reference. The role of D-serine in neurogenesis has been demonstrated in the Sultan et al (lines 855-857) and Roychaudhuri et al references.

      (11) Both D-serine and the structurally similar stereoisomer D-cysteine (sulfur versus oxygen atom) have a bearing on 1C metabolism and the folate cycle. With reference to the folate cycle, Roychaudhuri et al in 2024 (PMID:39368613) have shown in rescue experiments in mice that supplementing a higher methionine diet provides folate cycle precursors to rescue the high insulin phenotype in SR-deficient mice. Since 1C metabolism is being discussed in this manuscript, the authors seem to overlook prior work in the field and not include it in their discussion, even when it is the same enzyme (SR) that synthesizes both serine and cysteine. Since the field of D-amino acid research is in its infancy, the authors must make it a point to include prior work related to D-serine at least, and not claim that it is not known. The known D-stereoisomers are not many, hence any progress in the area must include at least a discussion of the other structurally related stereoisomers.

      (12) Racemases (serine and aspartate) in general are promiscuous enzymes and known to synthesize other stereoisomers in addition to D-serine, D-cysteine, and D-aspartate. A few controls, like D-aspartate, D-cysteine, or even D-alanine must be included in their study to demonstrate the specific actions of D-serine, especially in the N2a cell treatment experiments. Cysteine and Serine are almost identical in structure (sulfur versus oxygen atom), and both are synthesized by serine racemase (published). Cysteine has also been very recently shown to inhibit tumor growth and neural progenitor cell proliferation. (PMIDs: 40797101 and 34556581). How the authors' work relates to the existing findings must be discussed, and this would put things in perspective for the reader.

    3. Reviewer #2 (Public review):

      Summary:

      This study by Suzuki et al. reports an interesting stereo-selective role of D-serine in regulating one-carbon metabolism during neurodevelopment to adapt the functional transition, probably through the competition with mitochondrial transport of L-serine. The authors provide a multi-layered set of evidence, including metabolomics, enzyme assays, mitochondrial transport competition, and functional assays in immature/neural progenitor cells, to build up a conceptual integration of D-serine as both a neurotransmitter and a metabolic regulator in the central nervous system, which raises a broad potential interest to the neuroscience and metabolism communities.

      Strengths:

      This work provides a conceptual advance that D-serine not only serves as a traditional neurotransmitter in the central nervous system but also critically contributes to metabolic regulation of neural cells. The authors performed solid metabolomic assays to validate the suppressive effect of D-serine on the one-carbon metabolic pathway, providing some evidence that D-serine competitively inhibits mitochondrial serine transport, but does not directly impair SHMT2 enzymatic activity. All these data indicate a critical role of D-serine synthesis during neural maturation and suggest a potential translational strategy for targeting serine metabolism in neural tumors.

      Weaknesses:

      (1) The detailed mechanism by which D-serine competes with L-serine for its mitochondrial transport is not investigated. For example, although the authors made some discussion, they did not provide direct genetic or biochemical evidence linking these effects to the specific transporters, such as SFXN1.

      (2) Unlike tumor cells, where SHMT2 usually plays a predominant role in catalyzing serine/THF-derived one-carbon metabolism, normal cells may employ both SHMT1 and SHMT2 to do the work. Even under certain conditions in which SHMT2-mediated one-carbon metabolism is suppressed, the activity of SHMT1 could be elevated for compensation. Thus, it is important to investigate whether D-serine affects SHMT1 activity or changes the balance between SHMT1- and SHMT2-mediated one-carbon metabolism. To this aim, the authors are strongly encouraged to perform a metabolic flux assay (MFA) by using 13C-labeled L-serine in the model cells in the presence and absence of D-serine.

      (3) A defect in serine-derived one-carbon metabolism may cause multiple cellular stress responses. It is valuable to detect whether cellular NADPH/NADH, GSH, or ROS is altered before and after D-serine treatment.

      (4) The physiological relevance between D-serine and neural cell maturation/death should be further tested and discussed, since the dosage of D-serine used in the in vitro assay is much higher than that in physiological conditions.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript presents a comprehensive and well-executed investigation into the metabolic role of D-serine in the central nervous system. The authors provide solid evidence that D-serine competitively inhibits mitochondrial L-serine transport, thereby impairing one-carbon metabolism. This stereoselective mechanism reduces glycine and formate production, suppresses cellular proliferation, and induces apoptosis in immature neural cells and glioblastoma stem cells. Developmental analyses further reveal a physiological enantiomeric shift in serine metabolism during neurogenesis, aligning with the transition from proliferation to maturation. Overall, the study bridges developmental neurobiology, cancer metabolism, and amino acid transport, uncovering a previously unrecognized metabolic function of D-serine beyond its role in neurotransmission.

      Strengths:

      (1) The discovery that D-serine inhibits one-carbon metabolism by competing for mitochondrial L-serine transport, rather than through enzymatic inhibition or receptor-mediated signaling, represents a significant and previously underappreciated mechanism. This finding has broad implications for understanding metabolic regulation during neurodevelopment and offers potential relevance for targeting metabolic vulnerabilities in cancer.

      (2) The authors integrate metabolomics, mitochondrial transport assays, molecular dynamics simulations, genetic and pharmacologic perturbations, transcriptomics, and both in vitro and ex vivo models. The breadth of experimental approaches, combined with the coherence of the findings across systems, provides strong support for the central conclusions and enhances the overall impact of the study.

      (3) The temporal shift in D-/L-serine levels during neurodevelopment is elegantly linked to the transition from proliferative to mature neuronal states. The selective vulnerability of neural progenitors and tumor cells, contrasted with the resistance of mature neurons, highlights a biologically meaningful and potentially targetable metabolic distinction.

      Weaknesses:

      (1) While the authors attribute D-serine's metabolic effects to competition with mitochondrial L-serine transport, the specific identity of the transporter(s) mediating this process remains undefined. This represents a meaningful mechanistic gap, as the central conclusion depends on D-serine limiting mitochondrial L-serine availability to inhibit one-carbon metabolism.

      (2) The effective concentrations of D-serine used in vitro (IC₅₀ ≈ 1-2 mM) exceed typical brain levels (~0.3 mM). While the authors acknowledge this, a more focused discussion on whether higher local D-serine concentrations could arise in specific microenvironments (such as synaptic compartments, tumor niches, or pathological states) would help contextualize the in vitro findings and strengthen their physiological relevance. For example, disruptions in D-serine clearance or altered expression of serine racemase and transporters in disease contexts could lead to localized accumulation. Moreover, differences between extracellular and intracellular D-serine pools, and the mechanisms governing their regulation, may further influence its metabolic impact in vivo.

      (3) While the manuscript focuses on neural stem/progenitor cells and neural tumors, it remains unclear whether the anti-proliferative effects of D-serine are specific to neural lineages or extend to other highly proliferative non-neural cell types. A brief discussion addressing this point would help clarify the scope of D-serine's metabolic impact and whether its mechanism of action reflects a unique vulnerability in neural cells or a more general feature of proliferative metabolism. This distinction is particularly relevant for assessing the broader therapeutic potential of targeting mitochondrial L-serine transport.

    1. eLife Assessment

      Plasmodesmata are channels that allow cell-cell communication in plants; based on the functional similarities between facilitated transport within plasmodesmata and into the nucleus, the authors speculate that nuclear pore complex proteins might be involved in plasmodesmata function. If supported, this would transform our understanding of cell-to-cell communication in plants. The authors localize nuclear pore complex proteins to plasmodesmata using proteomics and heterologous overexpression; however, the data are incomplete since key controls for localization, functionality, and expression level of fluorescent protein fusions are absent.

    2. Reviewer #1 (Public review):

      Summary:

      Plasmodesmata are channels that allow cell-cell communication in plants; based on the functional similarities between facilitated transport within plasmodesmata and into the nucleus, the authors speculate that nuclear pore complex proteins might be involved in plasmodesmata function. In this manuscript, they localize nuclear pore complex proteins to plasmodesmata using proteomics and heterologous overexpression. They also document a possible plasmodesmata transport defect in a mutant affecting one nuclear pore complex protein.

      Strengths:

      The main strength of this manuscript is the interesting and novel hypothesis. This work could open exciting new directions in our understanding of plasmodesmata function and cell-cell communication in plants. They also localized many NUPs (12/35 Arabidopsis NUPs).

      Weaknesses:

      The main weakness of this manuscript is that the data are incomplete. While the authors appropriately and frequently acknowledge caveats to their data, two controls are essential to interpret the results that fluorescently-tagged NUPs localize to the plasmodesmata: (1) assessment of the expression level of these fluorescently-tagged NUPs to determine whether the plasmodesmata localization might be an overexpression artefact; (2) assessment of the function of the fluorescently-tagged NUPs, either by molecular complementation of a knockout mutant phenotype or by biochemical methods to test whether the fluorescently-tagged NUP incorporates into nuclear pore complexes. Conducting these experiments for even one fluorescently-tagged NUP would substantially strengthen this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to address whether nuclear pore complex components localize and function at PD in plant cells to mediate cell-to-cell communication.

      Strengths:

      (1) Novelty and Significance: The core hypothesis, drawing parallels between PD and NPC transport, is highly original and addresses a critical gap in understanding plant intercellular communication. The idea that phase-separated domains formed by FG-NUPs could act as diffusion barriers at PD offers a plausible and sophisticated explanation for their complex transport properties, including size exclusion and facilitated translocation. This could fundamentally change how we view PD function.

      (2) Comprehensive Evidence: The study employs a rigorous and diverse set of experimental approaches, including a comprehensive bioinformatic analysis of both moss and Arabidopsis NUPs in available PD proteomic datasets, extensive imaging analysis of Nup localization in vivo, and functional transport assays using a loss-of-function nup mutant (cpr5). The transport assay is particularly important to provide functional evidence linking CPR5 to PD-mediated transport. The finding that callose levels were not significantly different in cpr5 mutants under these conditions is helpful and supports a distinct, callose-independent mechanism of transport regulation.

      (3) Objectivity: The authors are forthright in discussing the limitations and potential artifacts of their own data, clearly distinguishing between observations and definitive conclusions.

      Weaknesses:

      While the claims are generally justified as hypotheses or consistent observations, the authors themselves extensively detail the caveats, which are worth reiterating for clarity:

      (1) Potential Overexpression Artifacts in Localization: Although efforts were made to control expression levels, the authors acknowledge that transient overexpression could still lead to NUP accumulation at PD, either as a physiologically relevant accumulation under excess conditions or due to mis-targeting, or even as storage depots. The resolution of confocal microscopy also does not allow for a definitive conclusion on the nature of the location.

      (2) Proteomics Purity: The authors note that the presence of NUPs in PD fractions/proteomics cannot definitively rule out contamination, as PD cannot currently be purified to absolute homogeneity and is often contaminated with other organelles, including the nucleus.

      (3) CPR5 Mutant Interpretation: While cpr5 mutants exhibited reduced macromolecular transport, the authors state that they cannot exclude that the reduced transport is due to secondary effects in the cpr5 mutants, which show rather severe phenotypic defects. This is an important distinction, as CPR5 has known roles in defense responses and hormone signaling that could indirectly influence PD integrity, independent of callose deposition. The lack of effect on small molecule transport is a good control, but the broader pleiotropic effects of cpr5 mutants remain a consideration.

      (4) Conceptual Distinction between NPC and PD: The authors correctly point out that while similarities exist, the physical assembly of NUPs at PD must differ from that at the NPC due to the presence of the desmotubule and smaller cytoplasmic sleeve width at PD. Moreover, nucleocytoplasmic transport depends on karyopherin proteins that interact with the NPC central channel to complete the transport. Yet the role of karyopherins in this case is not clear. Therefore, the proposed "PD pore complex" may bear some NPC features, but not be identical.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript presents a step towards testing the hypothesis that plasmodesmata have homology to nuclear pores. The similarities between the two structures have long been noted as both structures allow the transport of proteins and nucleic acids, and both structures are composed of curved membranes. The manuscript has identified nuclear pore proteins (NUPs) in plasmodesmal protein fractions and uses live imaging in a non-endogenous system and functional assays of a mutant to propose that this might be a bona fide association.

      The conclusions the authors seek to draw are that: NUPs are present in plasmodesmal protein fractions; NUPs localise at plasmodesmata; NUPs might form a pore-gating complex at plasmodesmata, regulating non-specific (2xGFP) and specific (SHR) transport through plasmodesmata.

      The authors then use these conclusions to propose the possibility that phase separation mediates transport through plasmodesmata. If there is phase separation at plasmodesmata or a nuclear pore-like complex, it would revolutionise the community. However, this data is insufficient to act as a cornerstone for such a discovery.

      Strengths:

      The strength of the manuscript lies in the boldness and novelty of the idea.

      Weaknesses:

      The weaknesses lie in the lack of informative controls. The authors' own assessments of their data suggest they agree with this - in their abstract alone, they point out that the transport defects they observe might be off-target effects, and suggest there is a requirement in the future to determine whether the NUPs are bona fide PD components.

      Across the proteomic and live imaging experiments, the conclusions could be stronger if they compared the NUP localisation and accumulation with ER proteins - the question of whether NUPs behave like other ER proteins is not addressed. As NUPs reside in the nuclear envelope, continuous with the ER, and the ER traverses plasmodesmata, a comparison between the NUPs and ER proteins would be extremely informative.

      Regarding the proteomic identification of NUPs in plasmodesmal fractions, the authors place significant weight on their own metric for PD enrichment, the PD score. As I understand it, this is a metric derived from the addition of two factors: a two-component enrichment score that is the difference between intensity of peptides of a given protein in the PD fraction and cell wall fraction, added to the difference between intensity of peptides of a given protein in the PD fraction and total cell fraction, and a feature score that is a factor that describes representation of protein domains contained in said given protein in the plasmodesmal fraction relative to the representation of that domain in proteins in the whole proteome. The features chosen for analysis are not indicated, and the feature factor, as I understand it, is a score common to all proteins with a given feature. While each of the factors carries a measure of meaning and information, I do not understand how adding them is mathematically or biologically meaningful.
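
      The score as described in the paragraph above can be sketched as follows. This is a minimal illustration built only from that description, not the authors' implementation: the class name, function names, and example values are hypothetical, and the intensities are assumed to be on a scale where differences are meaningful (for example, log-transformed).

```python
from dataclasses import dataclass


@dataclass
class ProteinIntensities:
    """Hypothetical peptide intensities for one protein in each fraction."""
    pd_fraction: float          # plasmodesmal fraction
    cell_wall_fraction: float   # cell wall fraction
    total_cell_fraction: float  # total cell extract


def enrichment_score(p: ProteinIntensities) -> float:
    """Two-component enrichment: (PD - cell wall) + (PD - total cell)."""
    return (p.pd_fraction - p.cell_wall_fraction) + (
        p.pd_fraction - p.total_cell_fraction
    )


def pd_score(p: ProteinIntensities, feature_score: float) -> float:
    """PD score as the plain sum of the enrichment and feature terms.

    feature_score is assumed to be shared by all proteins carrying the
    same feature (domain), as the description above suggests.
    """
    return enrichment_score(p) + feature_score


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the manuscript.
    protein_a = ProteinIntensities(10.0, 7.0, 6.0)
    protein_b = ProteinIntensities(8.0, 7.0, 6.0)
    shared_feature_score = 1.5  # same domain, therefore same feature term

    print(pd_score(protein_a, shared_feature_score))
    print(pd_score(protein_b, shared_feature_score))
```

      In this sketch, two proteins that share a domain receive the same feature term, so any difference between their PD scores comes from the enrichment term alone.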

    1. eLife Assessment

      This important study demonstrates the potential of synthetic gene circuits to detect and target aberrant RAS activity in cancer cell lines. The circuit design is novel and the evidence supporting the claims is convincing. As a proof-of-concept, this will be of broad interest to researchers in synthetic biology and therapeutics development, while future work will be required to help translate this technology toward clinical applications in cancer therapeutics and address potential limitations of the strategy.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript presents a comprehensive study on developing synthetic gene circuits targeting mutant RAS-expressing cells. The aim of this study is to use these RAS-targeting circuits as cancer cell classifiers and enable the selective expression of an output protein in correlation with RAS activity. The system is based on the bacterial two-component system NarX/NarL. A RAS-binding domain is fused to a NarX mutant defective in either the ATP-binding site (N509A) or the phosphorylation site (H399Q). Nanocluster formation of RAS-GTP reconstitutes an active histidine kinase sensor dimer that phosphorylates the response regulator NarL, thus leading to the expression of an output protein. The integration of RAS-dependent MAPK-responsive elements to express the RAS sensor components generates RAS circuits with an extended dynamic range between mutant and wild-type RAS. The selectivity of the RAS circuits is confirmed in a set of cancer cell lines expressing endogenous levels of mutant or wild-type RAS or oncogenes affecting RAS signaling upstream or downstream. Expression of the suicide gene HSV thymidine kinase as an output protein kills RAS-driven cancer cells, demonstrating the functionality of the system.

      Strengths:

      This proof-of-concept study convincingly demonstrates the potential of synthetic gene circuits to target oncogenic RAS in tumor cell lines, act as a RAS-mutant cell classifier, and induce the killing of RAS-driven cells.

      Weaknesses:

      A therapeutic strategy based on this four-plasmid system may be difficult to implement in RAS-driven solid cancers. However, potential solutions are discussed.

    3. Reviewer #2 (Public review):

      The manuscript describes an interesting approach towards designing genetic circuits to sense different RAS mutants in the context of cancer therapeutics. The authors created sensors for mutant RAS and incorporated feed-forward control that leverages endogenous RAS/MAPK signaling pathways in order to dramatically increase the circuits' dynamic range. The modularity of the system is explored through the individual screening of several RAS binding domains, transmembrane domains, and MAPK response elements, and the authors further extensively screened different combinations of circuit components. This is an impressive synthetic biology demonstration that took it all the way to cancer cell lines. However, given the sole demonstrated output in the form of fluorescent proteins, the authors' claims related to therapeutic implications require additional empirical evidence or, otherwise, expository revision.

      Major comments:

      "These therapies are limited to cancers with KRASG12C mutations" is technically accurate. However, in this fast-moving field, there are examples such as MRTX1133 which holds the promise to target the very G12D mutation that is the focus of this paper. There are broader efforts too. It would help the readers better appreciate the background if the authors could update the intro to reflect the most recent landscape of RAS-targeting drugs.

      Only KRASG12D was used as a model in the design and optimization work of the genetic circuits. Other mutations should be quite experimentally feasible and comparisons of the circuits' performances across different KRAS mutations would allow for stronger claims on the circuits' generalizability. Particularly, the cancer cell line used for circuit validation harbored a KRASG13D mutation. While the data presented do indeed support the circuit's "generalizability," the model systems would not have been consistent in the current set of data presented.

      In Figure 2a, the text claims that "inactivation of endogenous RAS with NF1 resulted in a lower YFP/RBDCRD-NarX expression," but Figure 2a does not show a statistically significant reduction in expression of SYFP (measured by "membrane-to-total signal ratio [RU]").

      The therapeutic index of the authors' systems would be better characterized by a functional payload, other than fluorescent proteins, that, for example, induces cell death, immune responses, etc.

      Regarding data presented in "Mechanism of action" (Figure 2), the observations are interesting and consistent across different fluorescent reporters. However, with regard to interpretations of the underlying molecular mechanisms, it is not clear whether the different output levels in 2b, 2c, and 2d are due to the pathway as described by the authors or simply from varied expression levels of RBDCRD-NarX itself (2a) that is nonlinearly amplified by the rest of the circuit. From a practical standpoint, this caveat is not critical with respect to the signal-to-noise ratios in later parts of the paper. From a mechanistic interpretation standpoint, the claims put forth in this section are not clearly substantiated. Some additional controls would be nice. For example, if the authors express NarXs that constitutively dimerize on the membrane, what would the RasG12D-responsiveness look like? Does RasG12D alter the input-output curve of NarL-RE? How would Figure 4f compare to a NarX constitutively dimerized control that only relies on transcriptional amplification of the Ras-dependent promoters? It's also possible that these Ras could affect protein production at the post-transcriptional or even post-translational levels, which were not adequately considered.

      The text claims that "in contrast to what we saw in HEK293 overexpressing RAS (Figure 5d), the "AND-gate" RAS-targeting circuits do not generate higher output than the EF1a-driven, binding-triggered RAS sensor in HCT-116. Instead, the improved dynamic range results from decreased leakiness in HCT-116k.o." Comparing the experiment from Figure 5d, which looks at activation in KRASG12D and KRASWT, to the experiments in Figure 6b-d, which look at activation in HCT-116WT and HCT-116KO, is misleading. In Figure 5d, cells are transfected with KRASG12D and KRASWT to emulate high levels of mutant RAS and high levels of wild-type RAS. In Figures 6b-d, HCT-116WT has endogenous levels of mutant RAS, while HCT-116KO is a knock-out cell line and does not have mutant or WT RAS. Therefore, the improved dynamic range or "decreased leakiness in HCT-116KO" in comparison to Figure 5d is more comparable to the NF1 condition from Figure 2, which deactivates endogenous RAS. While this may not be feasible, the most accurate comparison would have been an HCT-116KO line with KRASWT stably integrated.

      We couldn't locate the citation or discussion of Figure 4d in the text. Conversely, based on the text description, Figure 6g would contain exciting results. But we couldn't find Figure 6g anywhere ... unless it was a typo and the authors meant Figure 6f, in which case the cool results in Figure S8 could use more elaboration in the main text.

      Comments on revisions:

      Now that the authors have extensively addressed my comments through text and additional experiments, I am supportive of its conclusions. I thank them for the rigorous updates and congratulate them on an important piece demonstrating the potential of synthetic biology circuits.

    4. Reviewer #3 (Public review):

      Summary:

      Mutations that result in consistent RAS activation constitute a major driver of cancer. Therefore, RAS is a favorable target for cancer therapy. However, since normal RAS activity is essential for the function of normal cells, a mechanism that differentiates aberrant RAS activity from normal activity is required to avoid severe adverse effects. To this end, the authors designed and optimized a synthetic gene circuit that is induced by active RAS-GTP. The circuit components, such as RAS-GTP sensors, dimerization domains, and linkers, were optimized. To enhance the circuit selectivity and dynamic range, the authors designed a synthetic promoter comprised of MAPK-responsive elements to regulate the expression of the RAS sensors, thus generating a feed-forward loop regulating the circuit components. Circuit outputs with respect to circuit design modification were characterized in standard model cell lines using basal RAS activity, active RAS mutants, and RAS inactivation.

      This approach is interesting. The design is novel and could be implemented for other RAS-mediated applications. The data support the claims, and while this circuit may require further optimization for clinical application, it is an interesting proof of concept for targeting of aberrant RAS activity. I therefore recommend accepting this paper.

      Strengths:

      Novel circuit design, thorough optimization and characterization of the circuit components, solid data.

      Weaknesses:

      This manuscript could significantly benefit from testing the circuit performance in more realistic cell lines, such as patient-derived cells driven by RAS mutations, as well as in corresponding non-cancer cell lines with normal RAS activity. Furthermore, testing with therapeutic output proteins in vitro, and especially in vivo, would significantly strengthen the findings and claims.

      Summary:

      Given the revision made, I would recommend a minor revision that discusses the specificity limitations of this experimental setup.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The manuscript by Senn and colleagues presents a comprehensive study on developing synthetic gene circuits targeting mutant RAS-expressing cells. This study aims to exploit these RAS-targeting circuits as cancer cell classifiers, enabling the selective expression of an output protein in correlation with RAS activity. The system is based on the bacterial two-component system NarX/NarL. A RAS-binding domain, the RBDCRD domain of the RAS effector protein CRAF, is fused to the histidine kinase domain, which carries an inactivating amino acid exchange either in its ATP-binding site (N509A) or in its phosphorylation site (H399Q). Dimerization or nanocluster formation of RAS-GTP reconstitutes an active histidine kinase sensor dimer that phosphorylates the response regulator NarL. The phosphorylated DNA-binding protein NarL, fused to the transcription activator domain VP48, binds its responsive element and induces the expression of the output protein. In comparison to mutated RAS, the effect of the RAS activator SOS-1 and the RAS inhibitor NF1 on the sensing ability as well as the tunability of the RAS sensor were examined. A RAS-targeting circuit with an AND gate was designed by expressing the RAS sensor proteins under the control of defined MAPK response elements, resulting in a large increase in the dynamic range between mutant and wild-type RAS. Finally, the RAS-targeting circuits were evaluated in detail in a set of twelve cancer cell lines expressing endogenous levels of mutant or wild-type RAS or oncogenes affecting RAS signaling upstream or downstream.

      Strengths: 

      This proof-of-concept study convincingly demonstrates the potential of synthetic gene circuits to target oncogenic RAS in tumor cell lines and to function, at least in part, as a RAS-mutant cell classifier.

      Weaknesses: 

      The use of an appropriate "therapeutic gene" might revert the oncogenic properties of RAS mutant cell lines. However, a therapeutic strategy based on this four-plasmid-based system might be difficult to implement in RAS-driven solid cancers. 

      Thank you for the insightful comments. We agree that the delivery of a four-plasmid system represents a major challenge for translating RAS-targeting circuits into therapeutic applications. Reducing the number of plasmids, ideally by consolidating all components onto a single vector, will be critical for clinical implementation.

      Viral delivery is generally the most efficient strategy for DNA-based therapies, but viral vectors have limited packaging capacities, which differ by virus type[1]. The RAS_sensor_F.L.T. circuit under the EF1α promoter requires ~7.7 kb for the sensing components alone, excluding the output gene. This exceeds the packaging limit of adeno-associated virus (AAV) and is at the upper boundary for lentiviral vectors but could potentially be accommodated by larger vectors such as γ-retroviruses, poxviruses, or herpesviruses[1]. Co-transduction with dual AAVs[2] or ongoing engineering to expand packaging capacity[3] may also offer future solutions. An additional route to reduce construct size could be alternative splicing, especially given redundancy between the two NarX fusion proteins[4].

      An advantage of our current architecture is that synthetic response elements replace constitutive promoters, reducing construct size. For example, the MAPK-driven PY2_NarX&NarL circuits range between 4.9 and 5.2 kb depending on the transactivation domain, bringing them within AAV packaging limits for the sensor module[5], though co-delivery of the output gene would still be necessary. For lentiviruses, this is within the packaging capacity of 8 kb[1] and would allow for inclusion of ~3 kb output genes.
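
      The size budget behind the "~3 kb" estimate can be checked with simple arithmetic. The sketch below uses only the sizes quoted in this response; the helper name `remaining_capacity_kb` is hypothetical, and the calculation assumes that the quoted capacity is fully available to the circuit, which may not hold for a real vector backbone.

```python
# Sizes in kilobases (kb), as quoted in the response text above.
LENTIVIRAL_CAPACITY_KB = 8.0       # packaging capacity quoted for lentiviral vectors
EF1A_SENSOR_KB = 7.7               # EF1α-driven RAS sensor, sensing components only
MAPK_SENSOR_RANGE_KB = (4.9, 5.2)  # MAPK-driven PY2_NarX&NarL circuits (min, max)


def remaining_capacity_kb(capacity_kb: float, sensor_kb: float) -> float:
    """Space left for an output gene (hypothetical helper).

    Assumes the whole quoted capacity is usable by the circuit; backbone
    elements of a real vector are not modeled in this sketch.
    """
    return round(capacity_kb - sensor_kb, 1)


if __name__ == "__main__":
    smallest, largest = MAPK_SENSOR_RANGE_KB
    print("EF1α sensor:", remaining_capacity_kb(LENTIVIRAL_CAPACITY_KB, EF1A_SENSOR_KB), "kb left")
    print("MAPK circuit (largest):", remaining_capacity_kb(LENTIVIRAL_CAPACITY_KB, largest), "kb left")
    print("MAPK circuit (smallest):", remaining_capacity_kb(LENTIVIRAL_CAPACITY_KB, smallest), "kb left")
```

      Under these assumptions, the MAPK-driven circuits leave roughly 2.8 to 3.1 kb for an output gene in a lentiviral vector, whereas the EF1α-driven sensor leaves almost none.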

      Still, assembling multiple modules onto a single vector introduces new challenges, including possible crosstalk or interference between neighboring promoters[6]. For example, placing the output gene too close to MAPK response elements may trigger unwanted MAPK-dependent expression, potentially bypassing the intended AND-gate logic. Moreover, expressing three genes under separate response elements may shift expression ratios and reduce circuit functionality. Nonetheless, the absence of constitutive promoters and the RAS-dependence of MAPK response elements could provide partial robustness, since even unintended activation would still reflect RAS signaling to some extent. Further, our data (Fig. 1d) show that some deviation in component levels can be tolerated, provided all parts are sufficiently expressed. Nonetheless, assembling the circuit on a single vector will require careful design and rigorous validation to ensure optimal performance.

      While addressing this is beyond the scope of the current study, we agree that future efforts should focus on vector consolidation and delivery strategies. We now include a paragraph discussing these challenges in the revised manuscript.

      Reviewer #2 (Public review): 

      The manuscript describes an interesting approach towards designing genetic circuits to sense different RAS mutants in the context of cancer therapeutics. The authors created sensors for mutant RAS and incorporated feed-forward control that leverages endogenous RAS/MAPK signaling pathways in order to dramatically increase the circuits' dynamic range. The modularity of the system is explored through the individual screening of several RAS binding domains, transmembrane domains, and MAPK response elements, and the authors further extensively screened different combinations of circuit components. This is an impressive synthetic biology demonstration that took it all the way to cancer cell lines. However, given the sole demonstrated output in the form of fluorescent proteins, the authors' claims related to therapeutic implications require additional empirical evidence or, otherwise, expository revision.

      Thank you very much for the thoughtful evaluation, precise critique, and constructive suggestions.

      As correctly noted, our study initially focused on developing and optimizing input sensors and processing units for synthetic gene circuits targeting mutated RAS. To address the concern regarding therapeutic relevance, we have now incorporated functional validation using a clinically relevant output protein: herpes simplex virus thymidine kinase (HSV-TK), which converts ganciclovir into a cytotoxic compound. We replaced the mCerulean reporter with HSV-TK and tested the resulting RAS-targeting circuits in both RAS-mutant and wild-type cancer cell lines. The results, now presented in a new chapter (Figure 8 and Supplementary Fig. 14), demonstrate robust killing of RAS-mutant cells and support the potential therapeutic utility of these circuits.

      Major comments: 

      "These therapies are limited to cancers with KRASG12C mutations" is technically accurate. However, in this fast-moving field, there are examples such as MRTX1133 which holds the promise to target the very G12D mutation that is the focus of this paper. There are broader efforts too. It would help the readers better appreciate the background if the authors could update the intro to reflect the most recent landscape of RAS-targeting drugs. 

      Thank you for this helpful suggestion. We have updated the introduction to reflect the rapidly evolving landscape of RAS-targeting therapies, including the development of inhibitors for non-G12C mutations such as KRASG12D (e.g., MRTX1133). Given the pace and breadth of these advances, we also refer readers to a recent comprehensive review that provides an in-depth overview of current RAS-targeting strategies.

      Only KRASG12D was used as a model in the design and optimization work of the genetic circuits. Other mutations should be quite experimentally feasible and comparisons of the circuits' performances across different KRAS mutations would allow for stronger claims on the circuits' generalizability. Particularly, the cancer cell line used for circuit validation harbored a KRASG13D mutation. While the data presented do indeed support the circuit's "generalizability," the model systems would not have been consistent in the current set of data presented. 

      To further support the generalizability of our RAS sensor, we titrated plasmid doses for a panel of oncogenic RAS variants, including multiple KRAS mutants as well as HRAS<sup>G12D</sup> and NRAS<sup>G12D</sup>. Across all tested variants, we observed concentration-dependent activation of the RAS sensor. At 1.67 ng/well, the sensor output for all oncogenic RAS variants was at least as high as that for KRAS<sup>G12D</sup>, suggesting that the behavior observed in our initial design and optimization is representative of a broader set of RAS mutations.

      We also noted that high overexpression of wildtype HRAS and NRAS can lead to substantial activation of the sensor, exceeding that observed with wildtype KRAS. This underscores the importance of considering all RAS isoforms when assessing circuit specificity and avoiding potential off-target activation in healthy cells.

      In Figure 2a, the text claims that "inactivation of endogenous RAS with NF1 resulted in a lower YFP/RBDCRD-NarX expression," but Figure 2a does not show a statistically significant reduction in expression of SYFP (measured by "membrane-to-total signal ratio [RU]").

      Thank you for pointing this out. We repeated the experiment to reassess the effect of NF1 on RBDCRD-NarX-SYFP2 expression and were able to confirm statistical significance. Accordingly, we have replaced Figure 2a with updated data. To facilitate better visual comparison across conditions, we also standardized the y-axis range across all relevant flow cytometry plots.

      The therapeutic index of the authors' systems would be better characterized by a functional payload, other than fluorescent proteins, that, for example, induces cell death, immune responses, etc.

      Thank you for this insightful comment. We agree that fluorescent reporters are limited to approximating expression levels, and that a functional output protein is more appropriate for assessing therapeutic potential. To address this, we replaced mCerulean with the therapeutic suicide gene HSV-TK and tested the circuits in RAS-mutant and wild-type cancer cell lines. These experiments demonstrate that our circuits can express functional proteins and induce cell death in two RAS-mutant cell lines while showing low toxicity in a RAS wild-type cell line (new chapter including Fig. 8 and Supplementary Fig. 14).

      Comparing the confluence of cells transfected with the RAS-targeting circuits to that of cells transfected with the non-toxic GFP-output negative control or the constitutively expressed EF1α-HSV-TK positive control allowed us to estimate the killing strength of the circuits in each cell line. In RAS-mutant HCT-116, the confluence curves were similar to the positive control, indicating effective killing (Fig. 8b). At a lower DNA dose in HCT-116, or in SW620 with lower transfection efficiency, the killing of transfected RAS-driven cancer cells was less pronounced, falling approximately midway between the controls (Fig. 8g&j). In the RAS wild-type cell line, Igrov-1, cells transfected with the RAS circuits showed continued growth similar to the non-toxic negative control (Fig. 8d), suggesting low toxicity.

      While this may indicate low circuit activation in Igrov-1, an alternative explanation for the low toxicity could also be insufficient transfection efficiency. Testing in SW620, which had a transfection efficiency similar to that of Igrov-1 (Supplementary Fig. 14a), showed that this moderate transfection efficiency was sufficient for RAS-circuit-dependent killing (Fig. 8d & 8g), supporting the notion of low activation in Igrov-1 and selective cytotoxicity in RAS-driven cancer cells.

      Nonetheless, it is important to note that comparisons between the cell lines need to be interpreted cautiously because of inter-cell line differences in transfection, growth, and HSV-TK/ganciclovir (GCV)-sensitivity (Supplementary Fig. 14) and further validation will be essential. 

      A conclusive assessment will require more efficient delivery strategies, such as viral vectors (as discussed above). Efficient delivery would allow us to investigate selectivity in a more realistic setting with patient-derived RAS-mutant cancer and healthy cells, as well as testing in an in vivo model. While beyond the scope of the current study, we view it as a critical direction for future work and have therefore added a paragraph about this to our discussion.

      Regarding data presented in "Mechanism of action" (Figure 2), the observations are interesting and consistent across different fluorescent reporters. However, with regard to interpretations of the underlying molecular mechanisms, it is not clear whether the different output levels in 2b, 2c, and 2d are due to the pathway as described by the authors or simply from varied expression levels of RBDCRD-NarX itself (2a) that is nonlinearly amplified by the rest of the circuit. From a practical standpoint, this caveat is not critical with respect to the signal-to-noise ratios in later parts of the paper. From a mechanistic interpretation standpoint, the claims put forth in this section are not clearly substantiated. Some additional controls would be nice. For example, if the authors express NarXs that constitutively dimerize on the membrane, what would the RasG12D-responsiveness look like? Does RasG12D alter the input-output curve of NarL-RE? How would Figure 4f compare to a NarX constitutively dimerized control that only relies on transcriptional amplification of the Ras-dependent promoters?

      This is a great point. We agree that the observed differences in output levels (Fig. 2) could arise from non-linear amplification due to increased expression of RBDCRD-NarX, rather than RAS binding or dimerization alone. To further investigate this possibility, we performed titrations of KRAS<sup>G12D</sup> in combination with the functional RAS sensor and a series of constitutively active and inactive control constructs (Supplementary Fig. 4).

      Inactive controls lacking NarX dimerization showed only a modest increase in output expression, similar to direct mCerulean expression under the EF1α promoter. Transfection of the output plasmid alone, with NarL, or with NarL and non-RAS-binding RBD<sup>R89L</sup>CRD<sup>C168S</sup>-NarX, resulted in minimal RAS-dependent increases (Supplementary Fig. 4a). Importantly, after normalization using the EF1α-driven mCherry transfection control, these effects were fully or even slightly over-compensated (Supplementary Fig. 4b). This shows that the data presented throughout the manuscript do not include the effect of EF1α-dependent increased leakiness, but also that, because of the normalization, we potentially underestimate the dynamic range of the RAS-targeting circuits.

      In contrast, constitutively dimerizing NarX controls (both membrane-bound and cytosolic dimerized via the FKBP–FRB system) exhibited a more pronounced RAS-dependent increase in output, even after normalization, confirming the presence of non-linear amplification (up to 3–4-fold). However, this effect was still lower than that achieved with the functional RAS-binding sensor (8-fold at 1.67 ng/well KRAS<sup>G12D</sup>; 14-fold at 5–15 ng/well), indicating that the increase in expression of the sensor parts is not the full explanation of the effect we see. Instead, RAS binding and dimerization further amplify the response and are necessary for full activation (Supplementary Fig. 4b).
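
      The normalization and fold-change comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the function names and all signal values are hypothetical, and the sketch assumes that normalization means dividing the output reporter signal by the transfection-control signal from the same sample.

```python
def normalized_output(output_signal: float, control_signal: float) -> float:
    """Divide the output reporter signal by the transfection-control signal.

    Assumption: both signals come from the same sample and the control is positive.
    """
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    return output_signal / control_signal


def fold_change(with_ras: float, without_ras: float) -> float:
    """Ratio of the (normalized) output with mutant RAS to the baseline output."""
    if without_ras <= 0:
        raise ValueError("without_ras must be positive")
    return with_ras / without_ras


if __name__ == "__main__":
    # Hypothetical readings in arbitrary units: (output reporter, transfection control).
    baseline = (100.0, 1000.0)
    with_mutant_ras = (1200.0, 1500.0)  # the control also rises with mutant RAS

    raw_fold = fold_change(with_mutant_ras[0], baseline[0])
    norm_fold = fold_change(
        normalized_output(*with_mutant_ras),
        normalized_output(*baseline),
    )
    print(f"raw fold change: {raw_fold:.1f}")
    print(f"normalized fold change: {norm_fold:.1f}")
```

      With these made-up numbers, the normalized fold change is smaller than the raw one because the control signal also increases, which illustrates why normalization could lead to an underestimate of the dynamic range.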

      We also addressed the reviewer’s suggestion by testing the MAPK response elements used in Fig. 4f with constitutively dimerizing NarX. These controls generally showed lower fold changes between KRAS<sup>G12D</sup> and KRAS<sup>WT</sup> than the corresponding RAS-binding circuits (Supplementary Fig. 7), with one exception: the combination of SRE_NarX and PY2_NarL-VP48.

      Together, these data show that non-linear amplification via increased expression and dimerization contributes to output activation. However, RAS binding and induced dimerization of the NarX sensor are required for full functionality and enhanced signal strength. This underscores that integrating the MAPK response elements with the binding-based RAS sensor into RAS-targeting circuits generally improves the distinction between cells with KRAS<sup>G12D</sup> and KRAS<sup>WT</sup>, and that it was the combination that allowed us to reach maximal fold changes.

      It's also possible that these Ras could affect protein production at the post-transcriptional or even post-translational levels, which were not adequately considered. 

      Thank you for this comment. We now mention in the manuscript the potential mechanisms by which (over-)activated RAS or MAPK signaling can increase protein synthesis. We cite relevant reports of the mechanisms we found, including upregulation of translational initiation and machinery[10] and ribosomal biogenesis[11].

      The text claims that "in contrast to what we saw in HEK293 overexpressing RAS (Figure 5d), the "AND-gate" RAS-targeting circuits do not generate higher output than the EF1a-driven, binding-triggered RAS sensor in HCT-116. Instead, the improved dynamic range results from decreased leakiness in HCT-116k.o." Comparing the experiment from Figure 5d, which looks at activation in KRASG12D and KRASWT, to the experiments in Figure 6b-d, which look at activation in HCT-116WT and HCT-116KO, is misleading. In Figure 5d, cells are transfected with KRASG12D and KRASWT to emulate high levels of mutant RAS and high levels of wild-type RAS. In Figures 6b-d, HCT-116WT has endogenous levels of mutant RAS, while HCT-116KO is a knock-out cell line and does not have mutant or WT RAS. Therefore, the improved dynamic range or "decreased leakiness in HCT-116KO" in comparison to Figure 5d is more comparable to the NF1 condition from Figure 2, which deactivates endogenous RAS. While this may not be feasible, the most accurate comparison would have been an HCT-116KO line with KRASWT stably integrated.

      Thank you for this input. We understand that comparing the results from HEK293 cells transfected with KRAS<sup>G12D</sup> or KRAS<sup>WT</sup> (Fig. 5d) to those from HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> cells (Fig. 6b–d) may be misleading if interpreted as a direct comparison of RAS signaling levels. Our intent was not to compare HEK293 with KRAS<sup>WT</sup> directly to HCT-116<sup>k.o.</sup>, but rather to contrast the behavior of the EF1α-driven RAS sensor and the MAPK-responsive RAS-targeting circuits within each cell line context.

      Specifically, we observed that in HEK293 cells expressing KRAS<sup>G12D</sup>, the MAPK-based RAS-targeting circuits produced higher output than the EF1α-expressed RAS sensor. In contrast, in HCT-116<sup>WT</sup> cells, the EF1α-expressed RAS sensor resulted in higher output levels than the RAS-targeting circuits. Despite this, the MAPK-driven circuits showed an improved dynamic range compared to the EF1α-expressed RAS sensor in HCT-116, due to the reduced background expression in the HCT-116<sup>k.o.</sup> cells. We have revised the manuscript text to clarify this distinction.

      We agree that an HCT-116<sup>k.o.</sup> cell line with stable integration of KRAS<sup>WT</sup> would provide a more direct comparison. Nonetheless, HCT-116<sup>k.o.</sup> cells still express endogenous NRAS and HRAS, both of which are capable of activating the RAS sensor (as shown in Fig. 1g). Therefore, we believe that HCT-116<sup>k.o.</sup> cells are more comparable to HEK293 with KRAS<sup>WT</sup> than to the NF1 condition in Fig. 2, in which all endogenous RAS isoforms are inactivated.

      We couldn't locate the citation or discussion of Figure 4d in the text. Conversely, based on the text description, Figure 6g would contain exciting results. But we couldn't find Figure 6g anywhere ... unless it was a typo and the authors meant Figure 6f, in which case the cool results in Figure S8 could use more elaboration in the main text. 

      Thank you for this helpful observation. The figure references were indeed incorrect due to a typo. The results discussed in the text refer to Figure 6f (not 6g), which is now Figure 7a in the revised version. To further highlight these findings, we have added a new Figure 7b that better illustrates how different MAPK response elements enabled us to identify, for each RAS-mutant cell line, a RAS-targeting circuit that showed stronger activation than in all RAS wild-type lines. We have also expanded the corresponding section in the main text to elaborate on these results and their significance.

      Reviewer #3 (Public review): 

      Summary: 

      Mutations that result in consistent RAS activation constitute a major driver of cancer. Therefore, RAS is a favorable target for cancer therapy. However, since normal RAS activity is essential for the function of normal cells, a mechanism that differentiates aberrant RAS activity from normal activity is required to avoid severe adverse effects. To this end, the authors designed and optimized a synthetic gene circuit that is induced by active RAS-GTP. The circuit components, such as RAS-GTP sensors, dimerization domains, and linkers, were optimized. To enhance the circuit selectivity and dynamic range, the authors designed a synthetic promoter comprised of MAPK-responsive elements to regulate the expression of the RAS sensors, thus generating a feed-forward loop regulating the circuit components. Circuit outputs with respect to circuit design modification were characterized in standard model cell lines using basal RAS activity, active RAS mutants, and RAS inactivation.

      This approach is interesting. The design is novel and could be implemented for other RAS-mediated applications. The data support the claims, and while this circuit may require further optimization for clinical application, it is an interesting proof of concept for targeting aberrant RAS activity.

      Strengths: 

      Novel circuit design, through optimization and characterization of the circuit components, solid data. 

      Weaknesses: 

      This manuscript could significantly benefit from testing the circuit performance in more realistic cell lines, such as patient-derived cells driven by RAS mutations, as well as in corresponding non-cancer cell lines with normal RAS activity. Furthermore, testing with therapeutic output proteins in vitro, and especially in vivo, would significantly strengthen the findings and claims. 

      Thank you very much for the thoughtful and supportive comments. We fully agree with the reviewer’s suggestions for improving the translational potential of the RAS-targeting circuits.

      As a first step toward therapeutic relevance, we replaced the fluorescent reporter with HSV-TK, a clinically validated suicide gene, and demonstrated killing in RAS-mutant cancer cell lines. This is described above and in the new section of the manuscript (Figure 8).

      We also agree that testing in patient-derived cancer cells and especially healthy cells with wild-type RAS activity will be essential. However, testing in primary or patient-derived cells presents delivery challenges: transient transfection of our current four-plasmid system is unlikely to achieve sufficient expression. As discussed in our response to Reviewer #1, development of a more efficient delivery strategy, such as viral vector-based delivery, is a necessary next step.

      Once a delivery system is established, identifying relevant off-target tissues throughout the body with high physiological RAS signaling will be key to assessing selectivity. While comparative data on RAS activation across healthy tissues are scarce[12,13], recent atlases of transcription factor activity[14,15] provide insights to identify off-target cells with high activation of RAS-dependent transcription factors and may even approximate RAS activity across healthy tissue. Alternatively, our single-input sensors for RAS and MAPK pathway activity could be used in vivo to identify off-target cells based on endogenous activity.

      Once relevant target and off-target cells have been identified, patient-derived cancer and healthy cells can help select and adapt cancer-specific RAS-targeting circuits and nominate therapeutic candidates for further safety and efficacy assessment[6,8].

      Reviewer #1 (Recommendations for the authors): 

      For the most part, the data in this study are very convincing and very well presented. The cartoons make it easier to understand the complex experimental setups. 

      (1) Did the authors use wild-type Sos-1 or a constitutively active membrane-bound catalytic domain in their studies? If wild-type Sos-1 was used, how is it activated? 

      Thank you for this feedback. We used the constitutively active catalytic domain of Sos-1 (AA 564–1049; PDB ID 2II0). 

      (2) Figure 1f: In case of KRAS-G12D, it looks like the output expression does not really correlate with the RAS-GTP level. Can the authors give an explanation? 

      Thank you for this interesting question. We believe the observed discrepancy arises primarily from differences in the sensitivity and readout dynamics of the two assays. The RAS-GTP pulldown ELISA appears insufficiently sensitive to detect small changes in RAS-GTP levels at lower KRAS<sup>G12D</sup> plasmid doses (0.19, 0.56, or 1.67 ng). Only at 5 ng and 15 ng do we observe clear increases in RAS-GTP signal (25% and 700%, respectively). In contrast, the RAS sensor shows strong activation already in the 0.56–5 ng range but begins to saturate at higher doses (see Figure 1f and Figure 1e).

      Beyond the differing technical sensitivities of the ELISA (plate reader) and flow cytometry, an important conceptual distinction may further explain this behavior: the RAS sensor likely integrates RAS signaling over time. Once NarX binds RAS-GTP and dimerizes, it activates NarL, triggering mCerulean expression. If the rate of mCerulean production exceeds its degradation, signal accumulates throughout the assay duration. Thus, the flow cytometry readout reflects time-integrated signaling, allowing small differences in RAS-GTP to be amplified into measurable differences in output—especially at low input levels. This may explain why flow cytometry detects circuit activation earlier and more steeply than the pulldown assay, which provides a snapshot of RAS-GTP abundance at a single time point and saturates less readily at high input levels.

      Together, these factors likely explain the observed differences in signal dynamics: the RAS sensor exhibits steep activation followed by saturation at high plasmid doses (flow cytometry), while the ELISA shows limited sensitivity at low doses but a broader linear range at higher doses.
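
      The contrast between a time-integrating reporter and a single-time-point readout can be illustrated with a minimal sketch. It is not the authors' model: the saturating production term, the linear snapshot readout, and all parameter values (`k_max`, `half_max`, `degradation_rate`, `gain`) are assumptions chosen only for illustration.

```python
import math

def production_rate(ras_gtp, k_max=1.0, half_max=1.0):
    """Hypothetical saturating dependence of reporter production on RAS-GTP."""
    return k_max * ras_gtp / (half_max + ras_gtp)

def reporter_level(ras_gtp, hours, degradation_rate=0.02):
    """Closed-form solution of dP/dt = production - degradation_rate * P, P(0) = 0."""
    k = production_rate(ras_gtp)
    return k / degradation_rate * (1.0 - math.exp(-degradation_rate * hours))

def snapshot_signal(ras_gtp, gain=1.0):
    """Hypothetical linear endpoint readout, standing in for a pulldown assay."""
    return gain * ras_gtp

if __name__ == "__main__":
    # Arbitrary input levels; the reporter column flattens at high input,
    # while the snapshot column keeps scaling with the input.
    for dose in (0.1, 1.0, 10.0, 100.0):
        print(dose, round(reporter_level(dose, hours=48), 2), snapshot_signal(dose))
```

      Under these assumptions the accumulated reporter rises steeply at low input and flattens at high input, whereas the snapshot readout stays proportional to the input across the whole range.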

      (3) Figure 2b: It appears that even in the case of KRAS-G12D and Sos-1, only a few cells are positive. Does this result depend on low cell density, low transfection efficiency, or a wide range of the expression level? As a control, nuclear staining could be shown. 

      Thank you for this question. In the experiment shown in Figure 2b, our goal was to assess the membrane localization of the RBDCRD-NarX-SYFP2 construct, which serves as a proxy for RAS-bound sensor. To enable accurate computational segmentation and separation of membrane signal from adjacent cells, we intentionally reseeded cells at low density in glass-bottom plates for confocal imaging.

      The observed variability in signal likely reflects a combination of transient transfection and heterogeneous expression levels. While the overall transfection efficiency was approximately 70%, expression varied between individual cells. To account for this, we analyzed the membrane-to-total signal ratio per cell, which internally normalizes the membrane signal to the total cellular expression of SYFP2 and controls for differences in transfection efficiency.
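
      The per-cell normalization described above can be sketched as follows. This is a minimal illustration and not the authors' segmentation pipeline: the function name, the flat per-pixel inputs, and the example intensities are all hypothetical.

```python
def membrane_to_total_ratio(intensities, in_cell, on_membrane):
    """Return membrane signal divided by total signal for one segmented cell.

    All three arguments are flat, equally long sequences (one entry per pixel):
    the pixel intensity, whether the pixel belongs to the cell, and whether it
    lies on the cell's membrane.
    """
    total = sum(v for v, c in zip(intensities, in_cell) if c)
    membrane = sum(
        v for v, c, m in zip(intensities, in_cell, on_membrane) if c and m
    )
    if total == 0:
        return None  # no signal in this cell, so the ratio is undefined
    return membrane / total

# Two hypothetical cells with identical localization but 10x different expression.
in_cell = [True, True, True, True]
on_membrane = [False, False, True, True]
dim_cell = [1, 1, 4, 4]
bright_cell = [10, 10, 40, 40]

dim_ratio = membrane_to_total_ratio(dim_cell, in_cell, on_membrane)
bright_ratio = membrane_to_total_ratio(bright_cell, in_cell, on_membrane)
```

      Because the ratio divides by each cell's own total signal, the dim and the bright cell in this toy example yield the same value, which is the sense in which the measure controls for expression differences.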

      In response to the reviewer’s suggestion, we have updated the figure to include nuclear staining to aid interpretation. We would like to emphasize, however, that the images are intended to illustrate subcellular localization per cell, not expression frequency or intensity across the population.

      Minor points 

      (1) Figure 1b: "The third plasmid expresses NarL, .." should be changed to "The third plasmid expresses NarL-VP48, .." 

      Done

      (2) Figure 1c, right part: The orange arrow should be labeled NarX-H399Q (not N509A). 

      Done

      (3) Supplementary Table 6 and 7: [cells/wells] - should probably be [cells 10*3/well]. 

      Thank you for these points; we have updated the manuscript accordingly.

      Reviewer #2 (Recommendations for the authors): 

      Minor comments: 

      (1) N509A seems mislabeled in Figure 1b. 

      (2) It would help the readers if the authors could elaborate a bit on what is known about the RBD and CRD mutations used here. 

      Thank you for the input. We added a paragraph to the paper to expand on the effect of these commonly used mutations.

      (3) The KRASWT&Sos1 condition is not explained within the text for Figure 1f, which is the first figure with the KRASWT&Sos1 condition, but rather later on for Figure 2a. Adding a description of this condition to the discussion of Figure 1f would add clarity to this figure. 

      Thank you, we corrected this.

      (4) Citing AlphaFold2 structural predictions as having "revealed that longer linkers between the sensor's RBDCRD and NarX-derived domains could bring the NarX domains into closer proximity" is probably an overstatement. AlphaFold2 generally has low confidence in the placement of long flexible linkers, and the longer linkers in the illustration could facilitate NarX and NarL being even farther apart than they are in the original design. 

      Thank you for this input. We agree that AlphaFold2 predictions generally have low confidence in the placement of long, flexible linkers, and we did not intend to imply that the structural models were predictive of actual linker conformations. Rather, the models were used heuristically to generate the hypothesis that longer linkers might facilitate better positioning of the NarX domains for dimerization.

      As described in the Methods, we manually rotated the flexible linker regions to explore plausible conformations. These exploratory models showed that with a short (1x GGGGS) linker, it was more challenging to bring the NarX domains into close proximity, whereas longer linkers allowed greater positional flexibility. This modeling exercise provided a structural rationale for experimentally testing longer linkers. We have revised the manuscript text to clarify that the structural predictions were used to motivate linker design, not to validate or predict structural outcomes.

      (5) Figure 3b shows that the fold change (KRASG12D/KRASWT) is higher at shorter linker lengths and lower at longer linker lengths, and that the output expression of mCerulean is lower at shorter linker lengths and higher at longer linker lengths. Having a bar plot with the output expression mCerulean levels comparing KRASG12D and KRASWT next to each other would be a significantly more informative representation of this data. In particular, the readers might be interested in understanding the effect of linker length on off-target activation from the sensor, which is not clear from this figure. 

      Thank you for the suggestion. We adapted Figure 3b to better present this. 

      (6) While it is implied that the sentence "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)33, and the synthetic RAS binder K5534 showed a slightly higher or similar dynamic range." is comparing these RAS binding domains to RBDCRD, for clarity it should be noted what the point of reference is for this "slightly higher or similar dynamic range." 

      (7) Claims are made throughout the text that require supporting data, and thus require a reference to a figure, but there are a few instances where the reference is several sentences after the discussion of data and findings begins. For example, the discussion of Figure 3c begins with the claim "Among the tested binding domains, the Ras association domain (RA) of the natural RAS effector Rassf5, the RAS association domain 2 (RA2) of the phospholipase C epsilon (PLCe)33, and the synthetic RAS binder K5534 showed a slightly higher or similar dynamic range," but there is no reference to the data or figure being discussed until the end of the discussion of Figure 3c. This formatting is also present in Figure 3d and Figure 6f. 

      Thank you for mentioning these imprecisions and inconsistencies; we have addressed them in the manuscript. 

      (8) In Figures 5d and 5e, the formatting of underscores and dashes is occasionally inconsistent within the text. (ex. "PY2_NarX_FLT or PY2_NarL-FLT" on page 13.). 

      Thank you for this precise observation. The formatting differences were intentional and reflect distinct design principles. Specifically:

      An underscore (e.g., PY2_NarX_FLT) denotes that two separate proteins are expressed: here, PY2-driven RBDCRD-NarX and EF1α-driven NarL-F.L.T.

      A dash (e.g., PY2_NarL-F.L.T.) indicates a fusion protein, i.e., PY2-driven NarL-F.L.T. combined with EF1α-driven RBDCRD-NarX.

      This notation is used to distinguish expression sources and fusion constructs while avoiding redundancy with the base circuit (EF1α_NarX + EF1α_NarL-VP48). We hope the schematic diagrams included in each relevant figure help the reader interpret these combinations.

      (9) The text claims that "loss-of-function mutations in RBDCRD decreased activation. However, the dynamic range was only 3-fold" and attributes this claim to Figure 6a. For a claim about specific fold-change activation, one would expect a corresponding figure with quantitative measurements of this fluorescence to be referenced. 

      Thank you for this remark. We made a supplementary figure (Supplementary Fig. 11) to show the quantitative measurement of the 3-fold dynamic range between HCT-116<sup>WT</sup> and HCT-116<sup>k.o.</sup> when using the EF1α-expressed RAS sensor with NarL-VP48.

      (10) The claim of Figure 2d is that the effect of RAS-GTP levels on mCerulean output is amplified in comparison to Figures 2a, 2b, and 3c, representing expression, RAS binding, and dimerization respectively. While visually this might be true from the figure, the readers might be confused by the lack of significance between the control and the NF1 condition, alongside the variation between the triplicates. Could this experiment be repeated to gain clearer data and to support the claim more effectively? 

      Thank you for this important observation. To address the concern regarding variability and statistical significance in Figure 2d, we repeated the experiment using 24-well plates to increase the number of cells analyzed per condition. This improved the consistency of the data and allowed us to reduce variability across replicates. As a result, we now observe a statistically significant difference between the control and the NF1 condition. The updated results are shown in the revised Figure 2.

      (11) The readers might be less familiar with the concept of "composability" than "modularity" and it would be good to explain it if the authors did intend to use the former. 

      Thank you for this comment. We changed it to modularity to avoid confusion. 

      References

      (1) Shahryari, A., Burtscher, I., Nazari, Z. & Lickert, H. Engineering Gene Therapy: Advances and Barriers. Advanced Therapeutics 4, https://doi.org/10.1002/adtp.202100040 (2021).

      (2) McClements, M. E. & MacLaren, R. E. Adeno-Associated Virus (AAV) Dual Vector Strategies for Gene Therapy Encoding Large Transgenes. Yale Journal of Biology and Medicine 90 (2017).

      (3) Wagner, H. J., Weber, W. & Fussenegger, M. Synthetic Biology: Emerging Concepts to Design and Advance Adeno-Associated Viral Vectors for Gene Therapy. Advanced Science 8, https://doi.org/10.1002/advs.202004018 (2021).

      (4) Doshi, J., Willis, K., Madurga, A., Stelzer, C. & Benenson, Y. Multiple Alternative Promoters and Alternative Splicing Enable Universal Transcription-Based Logic Computation in Mammalian Cells. Cell Rep 33, 108437 (2020).

      (5) Wu, Z., Yang, H. & Colosi, P. Effect of genome size on AAV vector packaging. Molecular Therapy 18, 80–86 (2010).

      (6) Dastor, M. et al. A Workflow for in Vivo Evaluation of Candidate Inputs and Outputs for Cell Classifier Gene Circuits. ACS Synth Biol 7, 474–489 (2018).

      (7) Preuß, E. et al. TK.007: A novel, codon-optimized HSVtk(A168H) mutant for suicide gene therapy. Hum Gene Ther 21, 929–941 (2010).

      (8) Angelici, B., Shen, L., Schreiber, J., Abraham, A. & Benenson, Y. An AAV gene therapy computes over multiple cellular inputs to enable precise targeting of multifocal hepatocellular carcinoma in mice. Sci Transl Med 13, (2021).

      (9) Mesnil, M. & Yamasaki, H. Bystander Effect in Herpes Simplex Virus-Thymidine Kinase/Ganciclovir Cancer Gene Therapy: Role of Gap-Junctional Intercellular Communication. Cancer Research 60, http://aacrjournals.org/cancerres/articlepdf/60/15/3989/2478218/ch150003989.pdf (2000).

      (10) Proud, C. G. Ras, PI3-kinase and mTOR signaling in cardiac hypertrophy. Cardiovascular Research 63, 403–413, https://doi.org/10.1016/j.cardiores.2004.02.003 (2004).

      (11) Azman, M. S. et al. An ERK1/2-driven RNA-binding switch in nucleolin drives ribosome biogenesis and pancreatic tumorigenesis downstream of RAS oncogene. EMBO J 42, (2023).

      (12) von Lintig, F. C. et al. Ras activation in normal white blood cells and childhood acute lymphoblastic leukemia. Clin Cancer Res 6, 1804–10 (2000).

      (13) Guha, A., Feldkamp, M. M., Lau, N., Boss, G. & Pawson, A. Proliferation of human malignant astrocytomas is dependent on Ras activation. Oncogene 15, 2755–2765 (1997).

      (14) Pan, L. et al. HTCA: a database with an in-depth characterization of the single-cell human transcriptome. Nucleic Acids Res 51, D1019–D1028 (2023).

      (15) Pan, L. et al. Single Cell Atlas: a single-cell multi-omics human cell encyclopedia. Genome Biol 25, (2024).

    1. eLife Assessment

      This study constitutes a fundamental advance for the uveal melanoma research field that might be exploited to target this deadly cancer and, more generally, for targeting transcriptional dependency in cancers. This work substantially advances our understanding of pharmacological inhibition of SWI/SNF as a therapeutic approach for cancer. The study is well written and provides compelling evidence, including comprehensive datasets, compound screens, gene expression analysis, epigenetics, as well as animal studies.

    2. Reviewer #1 (Public review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well written and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

      Weaknesses:

      There are minimal weaknesses.

    3. Reviewer #2 (Public review):

      Summary:

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth inhibited by the inhibitor, making them the focus of the paper. This inhibition is correlated with loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively, the data suggest a novel treatment of uveal melanoma.

      Strengths:

      There are many strengths of the study, including the strong challenge of the on-target effect, the assays used and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

      Weaknesses:

      The authors have addressed weaknesses in the revised version.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and have pronounced effects on uveal melanoma cell proliferation. They induce apoptosis and suppress tumor growth, with no toxicity in vivo. The report provides biological significance by demonstrating that the drugs alter chromatin accessibility at lineage specific gene enhancer regions and decrease expression of lineage specific genes, including SOX10 and SOX10 target genes.

      Strengths:

      The study provides compelling evidence for the therapeutic use of these compounds and does a thorough job at elucidating the mechanisms by which the drugs work. The study will likely have a high impact on the chromatin remodeling and cancer fields. The datasets will be highly useful to these communities.

      Weaknesses:

      The authors have addressed all my concerns.

    5. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 1:

      While BAP1 mutant UM cell lines were included for some of the experiments, it seems the in-vivo data mentioned in the response to the reviewer's comment is missing? The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 is from 92.1 cells. If this data is available, then the manuscript would benefit from its addition.

      We thank the reviewer for bringing this to our attention. As the reviewer mentioned, we show the 92-1 CDX model in our manuscript. Additionally, strong tumor growth inhibition in the MP46 CDX model treated with our BAF ATPase inhibitor can be found in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      Reviewer 3:

      Supplementary Figure 2C

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation.

      We thank the reviewer for bringing this to our attention. We updated the figure legend accordingly to reflect the genotype of the mutations highlighted in the table.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well-written, and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist.

      Strengths:

      This is a comprehensive and well-written study.

      Weaknesses:

      There are minimal weaknesses.

      We thank the reviewer for the positive comments.

      Reviewer #2 (Public Review):

      Summary:

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth-inhibited by the inhibitor, making them the focus of the paper. This inhibition is correlated with the loss of promoter occupancy of key melanocyte transcription factors, e.g. SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively the data suggest a novel treatment of uveal melanoma.

      Strengths:

      There are many strengths of the study including the strong challenge of the on-target effect, the assays used, and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

      Weaknesses:

      The authors introduce the field stating that SMARCA4 inhibitors are more effective in SMARCA2 deficient cancers and the converse. Since the desirable outcome of cancer therapy would be synthetic lethality it is not clear why a dual inhibitor is desirable. Wouldn't this be associated with more side effects? It is not known how the inhibitor developed here impacts normal cells, in particular T cells which are essential for any durable response to cancer therapies in patients. Another weakness is that the UM cell lines used do not molecularly resemble metastatic UM. These UM most frequently have mutations in the BAP1 tumor suppressor gene. It is not clear if the described SMARCA2/4 inhibitor is efficacious in BAP1 mutant UM cell lines in vitro or BAP1 mutant patient-derived xenografts in vivo.

      We thank the reviewer for their insightful and constructive comments. As we demonstrate in Fig. 1d, uveal melanoma cells are selectively and deeply sensitive to BAF ATPase inhibition, which provides a therapeutic window. This is confirmed in Fig. 4a-c, where we demonstrate robust tumor growth inhibition at a dose that was well tolerated in the xenograft study. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and a manuscript describing the results of this clinical trial is currently in preparation.

      As the reviewer mentioned, BAP1 loss is a signature of metastatic uveal melanoma. MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and that work through a different mode than previously developed SMARCA4/SMARCA2 inhibitors. They also demonstrate the anti-tumor effects of the compounds on uveal melanoma cell proliferation and tumor growth. The findings indicate that the drugs exert their effects by altering chromatin accessibility at binding sites for lineage-specific transcription factors within gene enhancer regions. In uveal melanoma, altered expression of the transcription factor SOX10 and of SOX10 target genes underlies the anti-proliferative effects of the compounds. This study is significant because the discovery of new SMARCA4/SMARCA2 inhibitory compounds that can abrogate uveal melanoma tumorigenicity has therapeutic value. In addition, the findings provide evidence for the therapeutic use of these compounds in other transcription factor-dependent cancers.

      Strengths:

      The strengths of this manuscript include biochemical evidence that the new compounds are selective for SMARCA4/SMARCA2 over other ATPases and that the mode of action is distinct from a previously developed compound, BRM014, which binds the RecA lobe of SMARCA2. There is also strong evidence that FHT1015 suppresses uveal melanoma proliferation by inducing apoptosis. The in vivo suppression of tumor growth without toxicity validates the potential therapeutic utility of one of the new drugs. The conclusion that FHT1015 primarily inhibits SMARCA4 activity and thereby suppresses chromatin accessibility at lineage-specific enhancers is substantiated by ATAC-seq and ChIP-seq studies.

      Weaknesses:

      The weaknesses include a lack of more precise information on which SMARCA4/SMARCA2 residues the drugs bind. Although the I1173M/I1143M mutations are evidence that the critical residues for binding reside outside the RecA lobe, this site is conserved in CHD4, which is not affected by the compounds. Hence, this site may be necessary but not sufficient for drug binding or specifying selectivity. A more precise evaluation of the region specifying the effect of the new compounds would strengthen the evidence that they work through a novel mode and that they are selective. Another concern is that the mechanisms by which FHT1015 promotes apoptosis rather than simply cell cycle arrest are not clear. Does SOX10 or another lineage-specific transcription factor underlie the apoptotic effects of the compounds?

      We thank the reviewer for the valuable comments.

      We believe that our dual ATPase inhibitor is selective and additional insights into binding specificity and selectivity for earlier stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      The reviewer also poses a great question regarding the mechanism of apoptosis. The mechanism of apoptosis is extremely complex, but we observed a decrease in pro-survival BCL-2 protein expression in response to FHT-1015, in the experiment corresponding to Supplementary Fig. 5e. In the experiment described in Fig. 3k, we also monitored caspase 3/7 activity over time, and SOX10 overexpression rescued 92-1 cells from FHT-1015-induced apoptosis. This suggests that SOX10 is an important mediator of the response to BAF ATPase inhibition, including apoptosis induced by FHT-1015.

      Additional Reviews:

      The referees would like to draw the authors' attention to the following issues that would best benefit from additional revision. 

      The clinical relevance of the study would be strengthened by the use of uveal melanoma cell lines with BAP1 mutations that better represent metastatic uveal melanoma. The use of patient-derived xenografts would also be pertinent and would be a useful addition. Similarly, attention to the effects of the inhibitor on non-cancerous proliferative cells such as blood/T/immune cells would also strengthen the manuscript. As the study reports the administration of one of the inhibitors in mice for the xenograft experiments, it would be important to assess any potential effects on blood cell counts and better discuss the eventual toxicity or lack of toxicity and how it was assessed. 

      The authors should better explain how SOX10 over expression can rescue viability in the presence of the inhibitor. Similarly given the critical roles of BRG1, SOX10, and MITF in cutaneous melanoma some specific discussion on the sensitivity of cutaneous melanoma cells to the inhibitor should be considered, and potential differences with uveal melanoma highlighted. 

      Aside from these issues, the authors are urged to consider the other points mentioned below. 

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1d, as well as the text in the manuscript referring to this figure, would benefit from indicating specific cell lines used for UM. The same for the sentence in line 153. 

      We thank the reviewer for bringing this to our attention. We have added the cell line names and updated the manuscript accordingly.

      For any of the studies conducted, is there any link with the genetics of UM? E.g. BAP1 wildtype/BAP1 mutant? 

      As addressed above in the public review section, MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Row 191 - How were peaks classified as enhancer-occupied? 

      We used the annotatePeaks function of the HOMER package to annotate genomic locations, together with H3K27ac ChIP-seq, to annotate peaks as enhancer-occupied. We thank the reviewer for pointing this out and have updated the manuscript accordingly to include this information.
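
      As a rough illustration of overlap-based peak labeling, the sketch below marks a peak as enhancer-occupied when it overlaps an H3K27ac peak and does not overlap a promoter region. It does not reproduce the HOMER-based analysis: the promoter exclusion rule, the function names, and the toy coordinates are assumptions, since the response does not state the exact criteria.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def label_enhancer_peaks(peaks, h3k27ac_peaks, promoter_regions):
    """Map each peak to True if it is H3K27ac-marked and not at a promoter.

    The rule itself is a hypothetical stand-in for the authors' annotation.
    """
    labels = {}
    for peak in peaks:
        marked = any(overlaps(peak, k) for k in h3k27ac_peaks)
        at_promoter = any(overlaps(peak, p) for p in promoter_regions)
        labels[peak] = marked and not at_promoter
    return labels

# Toy coordinates, invented for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 1000, 1100), ("chr2", 50, 150)]
h3k27ac_peaks = [("chr1", 150, 300), ("chr1", 1050, 1200)]
promoter_regions = [("chr1", 900, 1100)]

labels = label_enhancer_peaks(peaks, h3k27ac_peaks, promoter_regions)
```

      The nested loops are quadratic in the number of intervals, which is acceptable for a toy example; a genome-scale analysis would typically rely on a dedicated interval tool instead.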

      Row 259, the two cell lines should be named, also in Figure 3i. 

      We have added the cell line names and updated the manuscript accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      As a proof of concept, this study is truly excellent and the authors should be commended. However, it is desirable that new knowledge in cancer is translated to the clinic. To this end there are a few things needed to strengthen the study. 

      I am rephrasing my statements from the public review to say that I would recommend testing the inhibitor in T cells (side effects) and BAP1 mutant cell lines (for clinical relevance). 

      As addressed in the public review section, MP38 is a BAP1 mutant uveal melanoma cell line, and we demonstrated growth inhibition and robust caspase 3/7 activity in response to FHT-1015 (Supplementary Fig. 3a and 3f). MP46 (Supplementary Fig. 3a) is a BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor.

      Regarding concerns about potential side effects on T cells, we observed an increase in both CD4 and CD8 T-cell populations in the peripheral blood and the spleen when naïve, non-tumor-bearing CD-1 mice were dosed with the SMARCA2/4 dual ATPase inhibitor FHD-286 once daily for 14 days. FHD-286 is a compound similar to FHT-1015 described in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/). In addition, FHD-286 has been tested in tumor-bearing syngeneic models. When B16F10 tumor-bearing C57BL/6 mice were dosed with FHD-286 for 10 days, we observed an increase in CD69+ activated CD8 T-cell infiltration in the tumor microenvironment (doi:10.1136/jitc-2022-SITC2022.0888).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Determine drug binding by crystal structure or generate additional SMARCA4 or SMARCA2 mutations in the region near I1173/I1143 that are not conserved in CHD4 and test them in an ATPase assay for effects on drug inhibition. For example, Q1166 in SMARCA4 and Q1136 in SMARCA2 could be changed to alanine as in CHD4. Would this abrogate drug inhibition? 

      We believe that our dual ATPase inhibitor is selective and additional insights into binding specificity and selectivity for earlier stage compounds of this series were recently published in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      (2) The finding that SOX10 can rescue the antiproliferative effects of FHT1015 suggests that SMARCA4 is primarily needed for SOX10 expression. However, the co-occupancy of SMARCA4 and SOX10 at enhancers suggests that they cooperate to promote chromatin accessibility. It is unclear how over-expression of SOX10 can promote chromatin accessibility in drug-inhibited cells since SOX10 does not have chromatin remodeling activity. ATAC-seq in cells over-expressing SOX10 and treated with the drug could identify SOX10-dependent targets that do not require SMARCA4 activity and clarify the mechanism. It would also be informative to determine if SOX10 over-expression abrogates the effects of FHT1015 on both cell cycle and apoptosis, helping to resolve whether it is a partial or complete rescue of proliferation. 

      We agree that running ATAC-seq in cells overexpressing SOX10 would clarify this mechanism. However, shifts in corporate strategy deprioritized any further experiments for this project. One potential mechanism by which SOX10 overexpression could partially rescue the BAF inhibition phenotype is localization of the overexpressed SOX10 to open chromatin regions (mostly promoters) across the genome. We know from our ATAC-seq data (Fig. 2) that BAF inhibition leads to loss of chromatin accessibility at SOX10 enhancer sites, while promoter regions are only partially affected. Therefore, we think that overexpression of SOX10 would allow upregulation of its target genes via binding to the promoter regions. In this model, the enhancer-driven SOX10 target genes are likely to remain silenced.

      (3) Although the in vivo studies indicate that the drugs are well-tolerated, additional in vitro studies to determine the effects of the drug on the proliferation/survival of non-cancerous cells would further validate their therapeutic utility.

      Author Response: The reviewer raises a critical question. FHD-286, a dual BRM/BRG1 inhibitor similar to FHT-1015 with optimized physical properties, has been evaluated in a Phase I trial in patients with metastatic uveal melanoma (NCT04879017), and it was well tolerated at continuous daily doses of up to 7.5 mg QD and at intermittent doses of up to 17.5 mg QD. A manuscript describing the results of this clinical trial is currently in preparation.

    1. eLife Assessment

      This is a valuable evaluation of a previously published simulation model on the role of heterozygote advantage in shaping MHC diversity, showing that the conclusions from this model hold only within a narrow parameter range that might be unrealistic. The author presents an alternative model, in which MHC homozygotes with duplicated MHC genes outperform heterozygotes with single genes, thereby challenging the explanation that heterozygote advantage will lead to high allelic variation at a given MHC gene. The topic is highly relevant for eco-immunology and evolutionary genetics, but several major aspects of the author's claim need to be clarified to make the model interpretable. While the work has the potential to improve our understanding of the question of how the extraordinary diversity at the MHC locus evolves, without this addition, the conclusions remain incomplete.

    2. Reviewer #1 (Public review):

      The manuscript "Heterozygote advantage cannot explain MHC diversity, but MHC diversity can explain heterozygote advantage" explores two topics. First, it is claimed that the recently published conclusion by Mattias Siljestam and Claus Rueffler (in the following referred to as [SR] for brevity) that heterozygote advantage explains MHC diversity does not withstand even a very slight change in ecological parameters. Second, a modified model that allows an expansion of the MHC gene family shows that homozygotes outperform heterozygotes. This is an important topic and could be of potential interest to readers if the conclusions are valid and non-trivial.

      Let me first comment on the second part of the manuscript that describes the fitness advantage of the 'gene family expansion'. I think this, by itself, is a totally predictable result. It appears obvious that with no or a little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc, could be more informative. Yet, as I understood the narrative of the manuscript, the expansion of the gene family serves as a mere counter-example to the disputed finding of [SR], rather than a systematic study of the eco-evolutionary consequences of this process.

      Now to the first part of the manuscript, which claims that the point made in [SR] is not robust and breaks down under a small change in the parameters. An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text. The only piece of information given in the manuscript is that, unlike in [SR], the adjustable parameter c_{max} is kept constant when the number of pathogens is changed.

      In my opinion, the information provided in the manuscript does not allow one to conclude anything about the relevance and the validity of its main claim. At the same time, the simulations done in [SR] are described with a fair amount of detail, which allows me to assume that the conclusions made in [SR] are fairly robust and, in particular, have been demonstrated not to be too sensitive to changes in the main "suspect", c_{max}. Let me briefly justify my point.

      First, it follows from Eqs (4,5) in the main text and (A12-A13) in the Appendix that c_{max} and K do not independently affect the dynamics of the model, but it's rather their ratio K/c_max that matters. It can be seen by dividing the numerator and denominator of (5) by c_max. Figure 3 shows the persistent branching for 4 values of K that cover 4 decades. As it appears from the schemes in the top row of Figure 3, those simulations are done for the same positions and widths/virulences of pathogens. So the position of x* should be the same in all 4 cases, presumably being at the center of pathogens, (x*,x*) = (0,0). According to the definition of x* given in the Appendix after Eqs (A12-A13), this means that c_max remains the same in all 4 cases. So one can interpret the 4 scenarios shown in Figure 3 as corresponding not to various K, but to various c_max that varied inversely to K. That is, the results would have been identical to those shown in Figure 3 if K were kept constant and c_max were multiplied by 0.1, 1, 10, and 100, or scaled as 1/K. This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      Naturally, most, if not all, of the dynamics will break down if one of the ecological characteristics changes by a factor of 10^43, as is reported in the submitted manuscript. As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c. In [SR], it is clearly shown where the pathogens are.

      Another argument that makes me suspicious of the utility of the conclusions made in the manuscript and speaks for the validity of [SR] is the adaptive dynamics derivation of the branching conditions. It is confirmed by numerics with sufficient accuracy, and as it stands in its simple form of the inequality between two widths, the branching condition appears to be pretty robust with respect to reasonable changes in parameters.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SR].

    3. Reviewer #2 (Public review):

      Summary:

      This study addresses the population genetic underpinnings of the extraordinary diversity of genes in the MHC, which is widespread among jawed vertebrates. This topic has been widely discussed and studied, and several hypotheses have been suggested to explain this diversity. One of them is based on the idea that heterozygote genotypes have an advantage over homozygotes. While this hypothesis lost support early on, a recent study claimed that there is good support for this idea. The current study highlights an important aspect that allows us to see results presented in the earlier published paper in a different light, strongly changing the conclusions of the earlier study, i.e., there is no support for a heterozygote advantage. This is a very important contribution to the field. Furthermore, this new study presents an alternative hypothesis to explain the maintenance of MHC diversity, which is based on the idea that gene duplications can create diversity without heterozygosity being important. This is an interesting idea, but not entirely new.

      Strengths:

      (1) A careful re-evaluation of a published model, questioning a major assumption made by a previous study.

      (2) A convincing reanalysis of a model that, in light of the re-analysis, loses all support.

      (3) A convincing suggestion for an alternative hypothesis.

      Weaknesses:

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

    4. Reviewer #3 (Public review):

      This manuscript describes a careful and thorough evaluation of an evolutionary simulation model published previously. The model and this report address the question, whether heterozygote advantage (HA) by itself as a selection mechanism can explain a substantial level of allelic diversity as it is often seen in MHC immune genes. Despite decades of research on the topic of pathogen-mediated selection for MHC diversity, it remains an open question by which specific selection mechanisms this exceptional allelic diversity is maintained.

      The previously published paper posits, in contrast to various previous studies, that HA is, in fact, able to maintain a level of allelic diversity as seen in many populations, just by itself, given certain conditions. The current manuscript now challenges this conclusion by highlighting that the previous model results only hold under very narrow parameter ranges.

      Besides criticizing some of the conceptual points of the previous paper, the author carefully rebuilt the previously published model and replicated their results, before then evaluating the robustness of the model results to reasonable variation in different parameters. From this evaluation, it becomes clear that the previously reported results hinge strongly on a certain scaling or weighing factor that is adjusted for every parameter setting and essentially counteracts the changes induced by changing the parameters. The critical impact of this one parameter is not clearly stated in the previous paper, but raises serious doubts about the generalizability of the model to explain MHC allelic variation across diverse vertebrate species.

      Given the fact that the MHC genes are among the most widely studied genes in vertebrates, and that understanding their evolution will shed light on their association with various complex diseases, the insights from this report and the general discussion of how MHC diversity evolved are of interest to at least some of the community. The manuscript is very well written and makes it easy to follow the theoretical and methodological details of the model and the arguments. I have only a few minor comments that I am detailing below. Furthermore, I would be very interested to read a response by the previous authors, especially on the relevance of this scaling/weighing factor that they introduced into their model, as it is possible that I might have missed something about its meaning.

    5. Author response:

      Reviewer #1 (Public review):

      It appears obvious that with no or a little fitness penalty, it becomes beneficial to have MHC-coding genes specific to each pathogen. A more thorough study that takes into account a realistic (most probably non-linear in gene number) fitness penalty, various numbers of pathogens that could grossly exceed the self-consistent fitness limit on the number of MHC genes, etc, could be more informative.

      The reviewer seems to be referring to the cost of excessively high presentation breadth.  Such a cost is irrelevant to the inferior fitness of a polymorphic population with heterozygote advantage compared to a monomorphic population with merely doubled gene copy number.  It is relevant to the possibility of a fitness valley separating these two states, but this issue is addressed explicitly in the manuscript.

      An addition or removal of one of the pathogens is reported to affect "the maximum condition", a key ecological characteristic of the model, by an enormous factor 10^43, naturally breaking down all the estimates and conclusions made in [SR]. This observation is not substantiated by any formulas, recipes for how to compute this number numerically, or other details, and is presented just as a self-standing number in the text.

      It is encouraging that the reviewer agrees that this observation, if correct, would cast doubt on the conclusions of Siljestam and Rueffler.  I would add that it is not the enormity of this factor per se that invalidates those conclusions, but the fact that the automatic compensatory adjustment of c<sub>max</sub> conceals the true effects of removing a pathogen, which are quite large.

      I am not sure why the reviewer doubts that this observation is correct.  The factor of 2.7∙10<sup>43</sup> was determined in a straightforward manner in the course of simulating the symmetric Gaussian model of Siljestam and Rueffler with the specified parameter values.  A simple way to determine this number is to have the simulation code print the value to which c<sub>max</sub>  is set, or would be set, by the procedure of Siljestam and Rueffler for different parameter values.  In another section of this response I will describe how to do this with the simulation code written and used by Siljestam and Rueffler; doing so confirms the value that I obtained with my own code.  Furthermore, I will now give a theoretical derivation of this factor.

      As specified by Siljestam and Rueffler, the positions of the m pathogens in (m-1)-dimensional antigenic space correspond to the vertices of a regular simplex centered at the origin, with distance between vertices equal to 1.  The squared distance from the origin to each of the m vertices of such a simplex is (m-1)/(2m) (https://polytope.miraheze.org/wiki/Simplex).  Thus, the sum of the m squared distances is (m-1)/2.  For the (0, 0) homozygote, condition is multiplied by a factor of exp(-(vr)<sup>2</sup>/2) for each pathogen, where r is the distance from the origin.  It follows that, with v=20, all the pathogens together decrease condition by a factor of exp(20<sup>2</sup>∙(m-1)/4) = exp(100∙(m-1)).  Thus, increasing or decreasing m by 1 changes this value by a factor of exp(100) = 2.7∙10<sup>43</sup>.
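
      The geometric step in this derivation can be checked numerically. The following is a minimal sketch, not part of the simulation code of either paper; the function names are hypothetical and chosen only for illustration. It builds a regular simplex with m vertices and unit edge length explicitly (embedded in m dimensions for convenience, which leaves all distances unchanged), centers it at its centroid, and sums the per-pathogen exponents (vr)<sup>2</sup>/2.

```python
import math


def simplex_sq_radii(m):
    """Squared centroid-to-vertex distances of a regular simplex
    with m vertices and unit edge length (illustrative helper)."""
    s = 1.0 / math.sqrt(2.0)
    # Scaled standard basis vectors: every pair is at distance 1.
    vertices = [[s if i == j else 0.0 for j in range(m)] for i in range(m)]
    centroid = [sum(vtx[j] for vtx in vertices) / m for j in range(m)]
    return [sum((vtx[j] - centroid[j]) ** 2 for j in range(m))
            for vtx in vertices]


def log_condition_reduction(m, v):
    """Natural log of the factor by which all m pathogens together
    reduce the condition of the homozygote at the simplex center,
    assuming a factor exp(-(v*r)**2 / 2) per pathogen."""
    return sum((v ** 2) * r2 / 2.0 for r2 in simplex_sq_radii(m))


if __name__ == "__main__":
    for m in (7, 8, 9):
        print(m, round(log_condition_reduction(m, 20.0), 6))
    # Factor associated with changing m by 1 at v = 20.
    print(math.exp(100.0))
```

      Under these assumptions the logarithm of the reduction is v<sup>2</sup>(m-1)/4, so consecutive values of m at v=20 differ by 100 in the exponent.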

      This begs the conclusion that the branching remains robust to changes in c_max that span 4 decades as well.

      That shows only that the results are not extremely sensitive to c<sub>max</sub> or K.  They are, nonetheless, exquisitely sensitive to m and v.  This difference in sensitivities is the reason that a relatively small change to m leads to such a large compensatory change in c<sub>max</sub>, a change large enough to have a major effect on the results.

      As I wrote above, there is no explanation behind this number, so I can only guess that such a number is created by the removal or addition of a pathogen that is very far away from the other pathogens. Very far in this context means being separated in the x-space by a much greater distance than 1/\nu, the width of the pathogens' gaussians. Once again, I am not totally sure if this was the case, but if it were, some basic notions of how models are set up were broken. It appears very strange that nothing is said in the manuscript about the spatial distribution of the pathogens, which is crucial to their effects on the condition c.

      I did not explicitly describe the distribution of pathogens in antigenic space because it is exactly the same as in Siljestam and Rueffler, Fig. 4: the vertices of a regular simplex, centered at the origin, with unity edge length.

      The number in question (2.7∙10<sup>43</sup>) pertains to the Gaussian model with v=20.  As specified by Siljestam and Rueffler, each pathogen lies at a distance of 1 from every other pathogen, so the distance of any pathogen from the others is indeed much greater than 1/v.  This condition holds, however, for most of the parameter space explored by Siljestam and Rueffler (their Fig. 4), and for all of the parameter space that seemingly supports their conclusions.  Thus, if this condition indicates that “basic notions of how models are set up were broken”, they must have been broken by Siljestam and Rueffler.

      Overall, I strongly suspect that an unfortunately poor setup of the model reported in the manuscript has led to the conclusions that dispute the much better-substantiated claims made in [SR].

      The reviewer seems to be suggesting that my simulations are somehow flawed and my conclusions unreliable.  I will therefore describe how my conclusions about sensitivity to parameter values can be verified using the simulation code provided by Siljestam and Rueffler themselves, with only small, easily understood modifications.  I will consider adding this description as a supplement when I revise the manuscript.

      The starting point is the Matlab file MHC_sim_Dryad.m, available at https://doi.org/10.5061/dryad.69p8cz98j.  First, we can add a line that prints the value of the variable logcmax, which represents the natural logarithm of cmax determined and used by the code.  Below line 116 (‘prework’), add the line ‘logcmax’ (with no semicolon).

      Now, at the Matlab prompt, execute MHC_sim_Dryad(false, 8, 20, 1) to run the simulation for the Gaussian model with m=8, v=20, and K=1.  The output will indicate that logcmax=700, in accord with the theoretical factor exp(100*(m-1)) derived above.  The allelic diversity, n<sub>e</sub>, will rise to a steady-state level of about 140, as in the red curve of my Fig. 2.

      Now lower m to 7, i.e., run MHC_sim_Dryad(false, 7, 20, 1).  The output will indicate that logcmax=600.  This confirms that lowering m by 1 causes the code to lower the value of c<sub>max</sub> by a factor exp(100)=2.7∙10<sup>43</sup>, which must also be the factor by which the condition of the most fit homozygote would increase without this adjustment.

      With the change of m to 7 and the compensatory change in c<sub>max</sub>, steady-state allelic diversity remains high.  But what if m changes but c<sub>max</sub> remains the same, as it would in reality?

      To find out, we can fix the value of c<sub>max</sub> to the value used with m=8 by adding the following line below the line previously added: ‘logcmax = 700’.  With this additional modification in place, executing MHC_sim_Dryad(false, 7, 20, 1) confirms that without a compensatory change to c<sub>max</sub>, lowering m from 8 to 7 mostly eliminates allelic diversity, in accord with the corresponding curve in my Fig. 2.  Similarly, raising m from 8 to 9, or changing v from 20 to 19.5 or 20.5 (executing MHC_sim_Dryad(false, 8, 19.5, 1) or MHC_sim_Dryad(false, 8, 20.5, 1)), largely eliminates diversity, confirming the other results in my Fig. 2.  Results for the bitstring model can also be confirmed, though this requires additional changes to the code.
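
      To give a sense of scale for the parameter changes listed above, the sketch below tabulates the closed-form exponent v<sup>2</sup>(m-1)/4 from the theoretical derivation for each (m, v) pair and compares it with the value fixed at 700. This is an illustration only: it assumes that the compensatory adjustment follows the closed form for every pair, which the runs described here confirm only for m=8 and m=7 at v=20, and the function name is hypothetical rather than taken from MHC_sim_Dryad.m.

```python
def closed_form_log_cmax(m, v):
    """Exponent v**2 * (m - 1) / 4 from the simplex derivation
    (assumed, not read from the Matlab code)."""
    return (v ** 2) * (m - 1) / 4.0


FIXED_LOG_CMAX = closed_form_log_cmax(8, 20.0)  # 700 for m=8, v=20

# Parameter pairs discussed in the text.
cases = [(8, 20.0), (7, 20.0), (9, 20.0), (8, 19.5), (8, 20.5)]

if __name__ == "__main__":
    for m, v in cases:
        value = closed_form_log_cmax(m, v)
        # Difference in natural-log units between the value the automatic
        # adjustment would choose and the value held fixed.
        print(m, v, value, value - FIXED_LOG_CMAX)
```

      If the closed form holds, changing v by 0.5 shifts the exponent by roughly 35 natural-log units and changing m by 1 shifts it by 100, so each of these modest parameter changes corresponds to a change in condition of many orders of magnitude when c<sub>max</sub> is held fixed.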

      Thus, the extreme sensitivity of the results of Siljestam and Rueffler to parameter values can be verified with the code that they used for their simulations, indicating that my conclusions are not consequences of my having done a “poor setup of the model”.

      Response to Reviewer #2 (Public review):

      (1) The statement that the model outcome of Siljestam and Rueffler is very sensitive to parameter values is, in this form, not correct. The sensitivity is only visible once a strong assumption by Siljestam and Rueffler is removed. This assumption is questionable, and it is well explained in the manuscript by J. Cherry why it should not be used. This may be seen as a subtle difference, but I think it is important to pin down the exact nature of the problem (see, for example, the abstract, where this is presented in a misleading way).

      I appreciate the distinction, and the importance of clearly specifying the nature of the problem.  However, Siljestam and Rueffler do not invoke the implausible assumption that changes to the number of pathogens or their virulence will be accompanied by compensatory changes to c<sub>max</sub>.  Rather, they describe the adjustment of c<sub>max</sub> (Appendix 7) as a “helpful” standardization that applies “without loss of generality”.  Indeed, my low-diversity results could be obtained, despite such adjustment, by combining the small change to m or v with a very large change to K (e.g., a factor of 2.7∙10<sup>43</sup>).  In this sense there is no loss of generality, but the automatic adjustment of c<sub>max</sub> obscures the extreme sensitivity of the results to m and v.

      (2) The title of the study is very catchy, but it needs to be explained better in the text.

      I had hoped that the final paragraph of the Discussion would make the basis for the title clear.  I will consider whether this can be clarified in a revision.

    1. eLife Assessment

      This valuable study reveals that the GSK-3 inhibitor AZD2858 inhibits the formation of TOPBP1 condensates and hence DNA damage responses in colorectal cancer cells. The evidence supporting the claims of the authors is convincing, although uncovering how this drug blocks bio-condensate formation would have strengthened the study. The work will be of interest to cancer researchers searching for synergistic drug combination strategies.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      Laura Morano and colleagues have performed a screen to identify compounds that interfere with the formation of TopBP1 condensates. TopBP1 plays a crucial role in the DNA damage response, and specifically the activation of ATR. They found that the GSK-3b inhibitor AZD2858 reduced the formation of TopBP1 condensates and activation of ATR and its downstream target CHK1 in colorectal cancer cell lines treated with the clinically relevant irinotecan active metabolite SN-38. This inhibition of TopBP1 condensates by AZD2858 was independent of its effect on GSK-3b enzymatic activity. Mechanistically, they show that AZD2858 can thus interfere with intra-S-phase checkpoint signaling, resulting in enhanced cytostatic and cytotoxic effects of SN-38 (or SN-38+fluorouracil aka FOLFIRI) in vitro in colorectal carcinoma cell lines.

      Comments on latest version:

      The requested plots are in figure S7 of the latest manuscript version, and look convincing. My last point is now adequately addressed.

    3. Reviewer #2 (Public review):

      Summary:

      In 2021 (PMID: 33503405) and 2024 (PMID: 38578830) Constantinou and colleagues published two elegant papers in which they demonstrated that the Topbp1 checkpoint adaptor protein could assemble into mesoscale phase-separated condensates that were essential to amplify activation of the PIKK, ATR, and its downstream effector kinase, Chk1, during DNA damage signalling. A key tool that made these studies possible was the use of a chimeric Topbp1 protein bearing a cryptochrome domain, Cry2, which triggered condensation of the chimeric Topbp1 protein, and thus activation of ATR and Chk1, in response to irradiation with blue light without the myriad complications associated with actually exposing cells to DNA damage.

      In this current report Morano and co-workers utilise the same optogenetic Topbp1 system to investigate a different question, namely whether Topbp1 phase-condensation can be inhibited pharmacologically to manipulate downstream ATR-Chk1 signalling. This is of interest, as the therapeutic potential of the ATR-Chk1 pathway is an area of active investigation, albeit generally using more conventional kinase inhibitor approaches.

      The starting point is a high throughput screen of 4730 existing or candidate small molecule anti-cancer drugs for compounds capable of inhibiting the condensation of the Topbp1-Cry2-mCherry reporter molecule in vivo. A surprisingly large number of putative hits (>300) were recorded, from which 131 of the most potent were selected for secondary screening using activation of Chk1 in response to DNA damage induced by SN-38, a topoisomerase inhibitor, as a surrogate marker for Topbp1 condensation. From this the 10 most potent compounds were tested for interactions with a clinically used combination of SN-38 and 5-FU (FOLFIRI) in terms of cytotoxicity in HCT116 cells. The compound that synergised most potently with FOLFIRI, the GSK3-beta inhibitor drug AZD2858, was selected for all subsequent experiments.

      AZD2858 is shown to suppress the formation of Topbp1 (endogenous) condensates in cells exposed to SN-38, and to inhibit activation of Chk1 without interfering with activation of ATM or other endpoints of damage signalling such as formation of gamma-H2AX or activation of Chk2 (generally considered to be downstream of ATM). AZD2858 therefore seems to selectively inhibit the Topbp1-ATR-Chk1 pathway without interfering with parallel branches of the DNA damage signalling system, consistent with Topbp1 condensation being the primary target. Importantly, neither siRNA depletion of GSK3-beta nor other GSK3-beta inhibitors were able to recapitulate this effect, suggesting it was a specific non-canonical effect of AZD2858 and not a consequence of GSK3-beta inhibition per se.

      To understand the basis for synergism between AZD2858 and SN-38 in terms of cell killing, the effect of AZD2858 on the replication checkpoint was assessed. This is a response, mediated via ATR-Chk1, that modulates replication origin firing and fork progression in S-phase cells under conditions of DNA damage or when replication is impeded. SN-38 treatment of HCT116 cells markedly suppresses DNA replication; however, this was partially reversed by co-treatment with AZD2858, consistent with the failure to activate ATR-Chk1 conferring a defect in replication checkpoint function.

      Figures 4 and 5 demonstrate that AZD2858 can markedly enhance the cytotoxic and cytostatic effects of SN-38 and FOLFIRI through a combination of increased apoptosis and growth arrest according to dosage and treatment conditions. Figure 6 extends this analysis to cells cultured as spheroids, sometimes considered to better represent tumor responses compared to single cell cultures.

      Significance:

      Liquid phase separation of protein complexes is increasingly recognised as a fundamental mechanism in signal transduction and other cellular processes. One recent and important example was that of Topbp1, whose condensation in response to DNA damage is required for efficient activation of the ATR-Chk1 pathway. The current study asks a related but distinct question; can protein condensation be targeted by drugs to manipulate signalling pathways which in the main rely on protein kinase cascades?

      Here, the authors identify an inhibitor of GSK3-beta as a novel inhibitor of DNA damage-induced Topbp1 condensation and thus of ATR-Chk1 signalling.

      This work will be of interest to researchers in the fields of DNA damage signalling, biophysics of protein condensation, and cancer chemotherapy.

      Comments on latest version:

      Having read the revised manuscript and rebuttal I am satisfied that the authors have resolved my various original concerns through a combination of clarification/ explanation and textual changes necessary to make the description of certain data precise. My impression is that they have also largely or completely satisfied the concerns of the other reviewers, with the possible exception of reviewer 1's point about the relative toxicity of AZD and FOLFIRI in colorectal cancer cell lines versus the untransformed CCD841 cell line. This is of course an important point with respect to the possible practical application of this combination for cancer therapy, however this seems somewhat subsidiary to the main novelty and significance of the findings, which are that protein liquid phase separation/ condensation can be manipulated pharmacologically to modify signal transduction processes and that existing drugs can be re-purposed to this end.

    4. Reviewer #3 (Public review):

      Summary:

      The authors have extended their previous research to develop TOPBP1 as a potential drug target for colorectal cancer by inhibiting its condensation. Utilizing an optogenetic approach, they identified the small molecule AZD2858, which inhibits TOPBP1 condensation and works synergistically with first-line chemotherapy to suppress colorectal cancer cell growth. The authors investigated the mechanism and discovered that disrupting TOPBP1 assembly inhibits the ATR/Chk1 signaling pathway, leading to increased DNA damage and apoptosis, even in drug-resistant colorectal cancer cell lines.

      Comments on latest version:

      This reviewer does not have further comments to the paper.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Comments on revised version: 

      I have reviewed the revised manuscript and read the rebuttal. The authors have carefully addressed my concerns. There is however one point that needs further work: 

      This follows up on my major point #1 in my initial review. I had asked the authors to demonstrate that FOLFIRI + AZD are less toxic to untransformed colorectal cells than colorectal cancer cell lines. It is good to see that the authors took my advice and show effects of the drug treatments on the untransformed colorectal cell line CCD841. It seems to be less sensitive to AZD and FOLFIRI in the figure in the rebuttal. What surprises me is that I cannot find these new figures anywhere in the revised manuscript. Also, the data seem preliminary, because I do not see any standard errors in the graphs, and I cannot find a description of the time of drug incubation. I ask the authors to make sure that the CCD841 data are reproducible, and make sure they incorporate the data in the revised manuscript.

      We thank the reviewer for this insightful comment. In the initial revised version of the manuscript, we did not include results from the untransformed colorectal cell line CCD841, as those experiments had only been performed once and were considered preliminary. However, we fully agree with the reviewer on the importance of including these data.

      To address this, we have repeated the experiments in CCD841 cells to ensure reproducibility. We now report the results from three independent experiments testing the combination of AZD2858 and FOLFIRI on healthy epithelial colon cells. These results are shown in Supplementary Figure S7, where blue matrices represent cell viability and black matrices reflect the level of synergy between AZD2858 and FOLFIRI.

      Our results confirm that, individually, each drug has little to no effect on healthy cells, and no consistent synergistic interaction was observed, except in Experiment 1, which could not be reproduced. Importantly, the drug concentrations used were identical to those applied in the cancer cell experiments, allowing for direct comparison between normal and malignant cell responses.

      Reviewer #2:

      Comments on latest version: 

      Morano et al. have revised their manuscript in response to the points raised by reviewer #3 as follows.

      (1) Fig. 2E: Correcting the previously erroneous labelling of this Fig. makes it match the textual description. 

      (2) Figs 3A and B: The revised textual description of the flow cytometry BrdU incorporation is now precise. 

      (3) Fig. 3E: Removing the suspect WB images is a pragmatic decision that does not significantly affect the overall conclusions of the paper. 

      (4) Fig. 3D: Despite its puzzling appearance this data is now described accurately in the text as "DSBs remained elevated after the combined treatment" rather than "increased after the combined treatment". A more convincing increase in the presumed damaged DNA band is evident in Fig. 4D when AZD2858 is combined with a much lower concentration of SN38 (1.5nM), which could mean that the concentration used in Fig. 3D (300nM) induced maximal damage that could not be further enhanced.

      We thank the reviewer for their thoughtful comments and constructive feedback, which have helped us improve the clarity and rigor of the manuscript.

      Reviewer #3:

      Comments on latest version: 

      The authors have addressed most of the concerns that I raised in the first round of revision and I have no further questions. I appreciate the authors' efforts in carrying out preliminary in vivo work, although, as the authors pointed out, the compound seems not to be effective in vivo. Future work is needed to address this and clarify the significance of the work.

      We thank the reviewer for acknowledging our efforts in addressing the previous concerns. We also appreciate the recognition of our preliminary in vivo work. While these results suggest limited in vivo efficacy of the compound at this stage, we agree that additional studies will be necessary to fully evaluate its therapeutic relevance. We consider this an important next step and are committed to pursuing it in future work.

    1. eLife Assessment

      This study reports the important finding that the dynamin inhibitor Dyngo-4a broadly affects lipid packing and plasma membrane dynamics, independently of its action on dynamin. While solid computational, biophysical, and cell-based evidence supports this conclusion, there is incomplete support for the authors' main claim on the role of lipid packing in caveolae internalization, as the causal relationship remains unclear and direct analyses are lacking. With stronger evidence, this work would be of significant interest to cell biologists, biophysicists, and chemists interested in membrane remodeling and drug-membrane interactions.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      The authors use Dyngo-4a, a known dynamin inhibitor, to test its influence on caveolar assembly and surface mobility. They investigate whether it incorporates into membranes with Quartz-Crystal Microbalance, and they investigate how it is organized in membranes using simulations. Finally, they use lipid-packing sensitive dyes to investigate lipid packing in the presence of Dyngo-4a, membrane stiffness using AFM and membrane undulation using fluorescence microscopy. They also use a measure they call "caveola duration time" to claim that something happens to caveolae after Dyngo-4a addition and, using this parameter, they do indeed see an increase in it in response to Dyngo-4a, which is reduced back to the baseline after addition of cholesterol.

      Overall, the authors claim: 1) Dyngo-4a inserts into the membrane and this 2) results in "a dramatic dynamin-independent inhibition of caveola scission". 3) Dyngo-4a was inserted and positioned at the level of cholesterol in the bilayer and 4) Dyngo-4a-treatment resulted in decreased lipid packing in the outer leaflet of the plasma membrane 5) but Dyngo-4a did not affect caveola morphology, caveolae-associated proteins, or the overall membrane stiffness 6) acute addition of cholesterol counteracts the block in caveola scission caused by Dyngo-4a.

      Overall, in this reviewer's opinion, claims 1, 3, 4, 5 are well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

      However, there is no convincing assay for caveolar endocytosis presented besides the "caveola duration", which, although unclearly described, seems to be the time it takes in imaging until a caveola is no longer picked up by the tracking software in TIRF microscopy.

      Since the main claim of the paper is a mechanism of caveolar endocytosis being blocked by Dyngo-4a, a true caveolar internalization assay is required to make this claim. This means either the intracellular detection of not surface connected caveolar cargo or the quantification of caveolar movement from TIRF into epifluorescence detection in the fluorescence microscope. Otherwise, the authors could remove the claim and just claim that caveolar mobility is influenced.

      Significance:

      A number of small molecule inhibitors for the GTPase dynamin exist that are commonly used tools in the investigation of endocytosis. This goes so far that the use of some of these inhibitors alone is considered in some publications as sufficient to declare a process to be dynamin-dependent. However, this is not correct, as there are considerable off-target effects, including the inhibition of caveolar internalization by a dynamin-independent mechanism. This is important, as for example the influence of dynamin small molecule inhibitors on chemotherapy resistance is currently investigated (see for example Tremblay et al., Nature Communications, 2020).

      The investigation of the true effect of small molecules discovered as and used as specific inhibitors, and of their off-target effects, is extremely important and this reviewer applauds the effort. It is important that inhibitors are not used alone, but that other means of targeting a mechanism are exploited as well in functional studies. The audience here is thus, besides membrane biophysicists interested in the immediate effect of the small molecule Dyngo-4a, also cell biologists and everyone using dynamin inhibitors to investigate cellular function.

      Comments on revised version:

      Please include the promised data on caveolar internalization and remove the above mentioned claim on membrane undulations from the text.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors probe the mechanisms by which Dyngo-4a, a dynamin inhibitor used to block endocytosis, disrupts caveolae dynamics. They provide compelling evidence that Dyngo-4a inhibits caveolae dynamics and endocytosis (as well as several other aspects of plasma membrane dynamics) by a dynamin-independent mechanism. They also provide strong computational and experimental data showing that Dyngo-4a inserts into membranes and decreases lipid packing in the outer leaflet of the plasma membrane. Finally, they demonstrate that the addition of excess cholesterol to cells reverses the effects of Dyngo-4a on caveolae dynamics, presumably by reversing lipid packing defects. Based on these findings they conclude that lipid packing regulates caveolae dynamics and endocytosis in a cholesterol-dependent manner.

      This work should be of value to cell biologists interested in plasma membrane remodeling and membrane trafficking, biophysicists that study small molecule/membrane interactions and membrane remodeling processes, and chemists interested in designing drugs to target membrane trafficking machinery and pathways.

      Strengths:

      This work addresses the important topic of how a widely used endocytic inhibitor actually works. In the process of addressing this question, the authors uncover unexpected connections between how lipids are packed in cell membranes and membrane dynamics. The methods are appropriate and many of the claims made in this work are well supported by data.

      Weaknesses:

      I appreciate that the manuscript has already gone through one round of revisions and that many of the concerns from the previous reviewers appear to have been addressed. However, as an interested reader, I would like to offer several additional comments for the authors to consider.

      (1) It is not clear based on the data presented whether the effects of Dyngo-4a on lipid packing give rise to defects in caveolae dynamics or if these effects are merely correlated. To show this more definitively, one might expect additional experimental approaches to be used to perturb lipid packing. I appreciate this is probably beyond the scope of the current study. However, it seems important for the manuscript to be clear about how far this interpretation can be pushed in the absence of additional independent lines of evidence.

      (2) On a related note, it is not obvious how changes in lipid packing in the outer leaflet could impact caveolae dynamics. It would be helpful to include a cartoon illustrating how this might work.

      (3) The authors note that Dyngo-4a inhibits several dynamic processes including generalized plasma membrane mobility (Fig 4A&B), transferrin uptake (Fig S4C), and fusion of fusogenic liposomes (Fig S4G). This clearly indicates there is a major disruption of the plasma membrane going on here that is not limited to caveolae. They go on to show that the addition of cholesterol reverses the effects of Dyngo-4a on caveolae dynamics. However, they do not discuss whether adding back cholesterol has similar effects on plasma membrane mobility and transferrin uptake. This information could help to further pinpoint whether the mechanisms of action are shared, and if the role of cholesterol is more general in controlling these events or is instead specific to caveolae.

      (4) In Fig 4C, the morphology of the neck region of the Dyngo-4a-treated caveolae structure appears to be "pinched" compared to the control. I appreciate that more EM studies are underway. It would be useful to specifically compare the morphology of the caveolae as part of those studies.

      (5) In Line 91, a statement is made that 8S complex formation requires cholesterol. This is debatable, as they appear to form in E. coli in the absence of cholesterol (reference 14).

    4. Author response:

      General Statements

      In this paper we demonstrate that the lipid packing of the plasma membrane has a huge impact on the stability of caveolae. By using interdisciplinary techniques, we show that the widely used dynamin inhibitor Dyngo-4a adsorbs to and inserts into lipid bilayers, leading to decreased lipid packing and hence reduced caveolae dynamics and internalization even in cells lacking dynamin. We have added experiments that validate that Dyngo-4a treatment does not result in fragmentation or disassembly of the caveolae. A FRAP assay of cytosolic caveolae has been employed to address questions concerning scission. Moreover, as suggested by the reviewers, we have also included new simulation data that show and expand on the fact that Dyngo-4a positions in the lipid leaflet similarly to cholesterol and preferentially associates with cholesterol clusters, affecting the spatial distribution of cholesterol in the membrane. We believe that these added data have greatly improved the paper and strengthened our conclusions that the lipid packing is a critical determinant in the balance between internalization and stable plasma membrane association of membrane vesicles.

      As requested, we have expanded the introduction to provide more detailed information about previous findings in the field. Changes and additions to the text have been highlighted in red for easier tracking.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors use Dyngo-4a, a known dynamin inhibitor, to test its influence on caveolar assembly and surface mobility. They investigate whether it incorporates into membranes with Quartz-Crystal Microbalance, and they investigate how it is organized in membranes using simulations. Finally, they use lipid-packing sensitive dyes to investigate lipid packing in the presence of Dyngo-4a, membrane stiffness using AFM and membrane undulation using fluorescence microscopy. They also use a measure they call "caveola duration time" to claim that something happens to caveolae after Dyngo-4a addition and, using this parameter, they do indeed see an increase in it in response to Dyngo-4a, which is reduced back to the baseline after addition of cholesterol.

      Overall, the authors claim: 1) Dyngo-4a inserts into the membrane and this 2) results in "a dramatic dynamin-independent inhibition of caveola scission". 3) Dyngo-4a was inserted and positioned at the level of cholesterol in the bilayer and 4) Dyngo-4a-treatment resulted in decreased lipid packing in the outer leaflet of the plasma membrane 5) but Dyngo-4a did not affect caveola morphology, caveolae-associated proteins, or the overall membrane stiffness 6) acute addition of cholesterol counteracts the block in caveola scission caused by Dyngo-4a.

      Overall, in this reviewer's opinion, claims 1, 3, 4, 5 are well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

      However, there is no convincing assay for caveolar endocytosis presented besides the "caveola duration", which, although unclearly described, seems to be the time it takes in imaging until a caveola is no longer picked up by the tracking software in TIRF microscopy.

      Since the main claim of the paper is a mechanism of caveolar endocytosis being blocked by Dyngo-4a, a true caveolar internalization assay is required to make this claim. This means either the intracellular detection of not surface connected caveolar cargo or the quantification of caveolar movement from TIRF into epifluorescence detection in the fluorescence microscope. Otherwise, the authors could remove the claim and just claim that caveolar mobility is influenced.

      We thank the reviewer for the constructive comments, and we very much appreciate the positive critique. We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. In addition, we are currently performing CTxB HRP experiments to quantify the number of caveolae at the PM using EM. For reasons beyond our control we have not managed to finish these in time; they will be included in the manuscript once they are ready, hopefully soon.

      Reviewer #1 (Significance):

      A number of small molecule inhibitors for the GTPase dynamin exist that are commonly used tools in the investigation of endocytosis. This goes so far that the use of some of these inhibitors alone is considered in some publications as sufficient to declare a process to be dynamin-dependent. However, this is not correct, as there are considerable off-target effects, including the inhibition of caveolar internalization by a dynamin-independent mechanism. This is important, as for example the influence of dynamin small molecule inhibitors on chemotherapy resistance is currently investigated (see for example Tremblay et al., Nature Communications, 2020).

      The investigation of the true effect of small molecules discovered as and used as specific inhibitors, and of their off-target effects, is extremely important and this reviewer applauds the effort. It is important that inhibitors are not used alone, but that other means of targeting a mechanism are exploited as well in functional studies. The audience here is thus, besides membrane biophysicists interested in the immediate effect of the small molecule Dyngo-4a, also cell biologists and everyone using dynamin inhibitors to investigate cellular function.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This manuscript uses the small molecule dynamin inhibitors dynasore and dyngo to show that, in dynamin triple knockout cells, these inhibitors impact lipid packing and organization in the plasma membrane. Data showing that dyngo affects caveolin dynamics using TIRF microscopy is also shown and is interpreted to reflect inhibition of caveolae scission from the membrane.

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors are not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane. Study of caveolae endocytosis is based on a TIRF assay that has inherent limitations:

      - Caveolae are defined as bright cav1-positive spots in diffraction limited TIRF and their disappearance presumed to be endocytic events. Cav1 spots are presumed to be caveolae but the authors do not consider that they may be flat non-caveolar oligomers. The diffraction limited TIRF approach interprets the large structures as caveolae but evidence to that effect is lacking.

      This is a valid comment and to address this we have now included data showing colocalization of cavin1 and EHD2 to the Cav1-GFP spots. We can however not determine if they are flat or invaginated. We do have extensive experience imaging caveolae using TIRF microscopy and carefully chose cells that display low expression of fluorescently labelled caveolin to avoid non-caveolar structures.

      - The analysis (and the diagram presented in figure 4) considers that caveolae can either diffuse laterally in the membrane or internalize and does not consider that caveolae can flatten and possibly fragment in the membrane. Is it not possible that loss of Cav1 spots is a fragmentation event and not necessarily a scission event?

      This is a good question, yet fragmentation and disassembly would result in shorter track durations, and this is not what is observed in the data. We have now also included data showing that cavin1 is persistently associated with the Cav1 spots identified as caveolae during Dyngo-4a treatment, indicating that these are caveolae. Furthermore, IF stainings showing colocalization of Cav1-GFP with cavin1 or EHD2 after Dyngo-4a treatment have also been added. We have now also expanded on the different interpretations of the data in the results section.

      - The analysis is based on overexpression of Cav1-GFP that may alter the stoichiometry between Cav1 and cavin1 such that while caveolae may be expressed, larger non-caveolar structures may accumulate.

      Yes, this is correct; we have specifically imaged cells expressing low levels of Cav1-GFP to avoid accumulated non-caveolar structures that can be spotted in cells with high expression.

      - Cav1 has been shown to be internalized via the CLIC pathway (Chaudary et al, 2014) and if dyngo is impacting clathrin then maybe it is also impacting CLIC endocytosis and thereby Cav1 endocytosis via this pathway?

      Dyngo-4a has been shown to not affect CLIC endocytosis (McCluskey et al., 2013) and in our data we do not see internalization following Dyngo-4a treatment.

      - The longer Cav1 TIRF track time and shorter displacement with dyngo is consistent with inhibition of caveolae scission. However, as the authors discuss, could not reduced membrane undulations due to dyngo's impact on membrane order be responsible for the longer tracks? Alternatively, perhaps the altered lipid packing is corralling Cav1 movement and reducing non-caveolar Cav1 endocytosis, resulting in shorter tracks of longer duration? The proposed interaction of dyngo with cholesterol could prevent scission but also stabilize large (flat?) Cav1 oligomers in the membrane, perhaps reducing Cav1 oligomer fragmentation.

      We completely agree that membrane undulations contribute to instability of the TIRF-field and therefore disruption of Cav1-GFP tracks, as we discuss in the results section and as has been described in previous work (Larsson et al., 2023). Yet, we have also shown that internalization of caveolae results in shorter tracks (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015). Furthermore, the tracked Cav1-GFP spots are persistently positive for cavin1 both with and without Dyngo-4a treatment, showing that the majority do not disassemble or become internalized by other pathways. Additionally, the added IF stainings after 30 min Dyngo-4a treatment also show that the Cav1-GFP spots remain positive for cavin1 and EHD2, just as in ctrl-treated cells.

      My point here is not to discredit the data but only to suggest that the TIRF approach used is an indirect measure of caveolae scission from the membrane that requires substantiation using other approaches.

      We appreciate these comments and have tried to address these by adding new data and discussions on the interpretation of the tracking data in the results section.

      Dyngo is certainly generally affecting lipid packing via cholesterol and thereby affecting Cav1 dynamics in the plasma membrane. The claim of caveolae scission should be qualified and alternative possibilities considered and discussed. If the authors persist in arguing that dyngo is affecting caveolae scission then the effect should be substantiated by accumulation of caveolae by quantitative EM and high spatial and temporal resolution imaging of Cav1 and cavin1 to define the endocytic events. As the latter represents a new, and potentially very challenging, line of experimentation, I would suggest that it is beyond the scope of the current study. As indicated above the additional experiments are not necessary and qualification of the claims would be sufficient.

      We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. We are also currently performing CTxB HRP experiments to quantify the number of caveolae at the PM using EM. For reasons beyond our control we have not managed to finish these in time; they will be included in the manuscript once they are ready, hopefully soon.

      Other points

      Figure 1C - Cav1 positive spots cannot be interpreted to be caveolae from diffraction limited confocal images. Same comment applies to Fig 4G - caveola? duration.

      We completely agree with this and that the claims should be qualified. We have added IF stainings showing that the Cav1-GFP structures are also positive for cavin1. We have now clarified that we cannot distinguish between flat or different curved states of caveolae using this methodology. We have also changed the labelling of Fig. 4G.

      Figure 4C - it is not clear why this EM data is not quantified - for both the number of caveolae and clathrin coated pits - as this would help clarify the interpretation of the effect reported.

      We are currently performing CTxB HRP experiments to quantify the number of caveolae using EM. For reasons beyond our control we have not managed to finish these in time; they will be included in the manuscript once they are ready, hopefully soon.

      Figure 4D - the AFM experiments should perhaps be repeated as the non-significant effect of dyngo on the Young's modulus may be a result of insufficient n values.

      We would like to clarify that to ensure the robustness of our AFM measurements, we performed the experiments with sufficient biological and technical replicates. Specifically, each data point shown in Figure 4D represents a Young’s modulus value averaged from approximately sixty force-distance curves per cell. For each condition, we collected force-distance maps on eight to nine individual cells, obtained from two separate petri dishes per day. We repeated this process on two independent days. In total, we analysed thirty-one cells for the DMSO control and thirty-three cells for the Dyngo-4a treatment. We performed the “student’s t-test with Welch’s correction” to assess the statistical significance between the two conditions, as described in the main text. We believe that the sample size and statistical approach are sufficient to support the conclusions presented. Furthermore, we also analysed cell stiffness by calculating the slope of the linear portion of the force-distance curves. This analysis also did not reveal any statistically significant differences between the conditions (data not shown), further supporting our conclusion that Dyngo-4a treatment does not significantly alter the Young’s modulus under our experimental setup (or conditions).
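
      For readers unfamiliar with the test named above, the following is a minimal sketch of how a Welch-corrected t statistic and its Welch-Satterthwaite degrees of freedom can be computed from two groups of per-cell values. It is an illustration only, not the authors' analysis script: the function name `welch_t` and all numbers are hypothetical, and obtaining a p-value would additionally require the t distribution, which is left out here.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test.

    t compares the two sample means; df is the Welch-Satterthwaite
    approximation of the degrees of freedom.
    """
    mean_a = statistics.fmean(sample_a)
    mean_b = statistics.fmean(sample_b)
    # Squared standard errors of each mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df


# Hypothetical per-cell Young's modulus values (kPa), NOT data from the study.
control_kpa = [2.1, 1.8, 2.5, 2.2, 1.9, 2.4]
treated_kpa = [2.0, 2.3, 1.7, 2.6, 2.1, 2.2]

t_stat, dof = welch_t(control_kpa, treated_kpa)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

      Unlike the classical Student's test, this form does not assume equal variances in the two groups, which is the usual reason for choosing it when group sizes differ slightly, as with thirty-one and thirty-three cells.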

      Reviewer #2 (Significance):

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors are not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Larsson et al present experimental and computational data on the role of Dyngo4a (a compound that was developed to inhibit dynamin) on the dynamics of caveolae. The manuscript mostly documents effects of Dyngo on caveolae, with one experiment to suggest a mechanism for this result. This one rather unconvincing result forms the focus of the manuscript contributing to a disconnect between the data and the presentation. Additionally, there are concerns with data interpretation. The writing could also benefit from revision to address grammar mistakes, strengthen referencing, and increase precision. Overall, the manuscript requires substantial revisions before being considered for publication. The central claim, in particular, needs stronger evidence to support the proposed mechanism.

      We thank the reviewer for the thorough review and for experimental suggestions that we believe have strengthened our data further.

      Significant issues (in approximate order of importance):

      (1) The data supporting the central mechanistic explanation appears limited. There is no evidence that Dyngo remains in one leaflet

      The simulations show that the energy barrier for moving between leaflets is very high. Furthermore, simulations of C-Laurdan have shown that it does not readily flip between membrane leaflets (Barucha-Kraszewska et al., 2013), supporting that it reports on the outer lipid leaflet when added to cells. We have, however, now changed this and state that Dyngo-4a decreased the lipid order in the plasma membrane.

      - the GP of the PM is very low compared to previous measurements,

      The absolute GP-values will vary between setups depending on what filters are used, so they are not comparable between laboratories. What is of importance is that we found a significant difference in the relative GP-values between cells treated with Dyngo-4a and control cells. It is this change that we report. We have not performed any GP-measurements on this cell type earlier, so it is unclear what previous measurements reviewer #3 is referring to.
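
      The point about setup dependence can be illustrated with the standard definition of generalized polarization, GP = (I_ordered - I_disordered) / (I_ordered + I_disordered). The sketch below is a toy model rather than the authors' pipeline: the `gain` parameter is an assumed stand-in for setup-specific differences in channel sensitivity (for example from filter choice), and the intensities are hypothetical.

```python
def generalized_polarization(i_ordered, i_disordered, gain=1.0):
    """GP = (I_o - g * I_d) / (I_o + g * I_d).

    `gain` (g) is a hypothetical factor modelling how strongly a given
    setup weights the disordered-phase channel relative to the ordered one.
    """
    weighted = gain * i_disordered
    return (i_ordered - weighted) / (i_ordered + weighted)


# Hypothetical channel intensities (arbitrary units), NOT data from the study.
control = (120.0, 100.0)
treated = (100.0, 100.0)

gp_by_gain = {}
for gain in (0.5, 1.0, 2.0):
    gp_control = generalized_polarization(*control, gain=gain)
    gp_treated = generalized_polarization(*treated, gain=gain)
    gp_by_gain[gain] = (gp_control, gp_treated)
    print(f"gain={gain}: control GP={gp_control:.3f}, treated GP={gp_treated:.3f}")
```

      In this toy model the absolute GP of the same sample shifts, and can even change sign, as the gain changes, while the ordering between control and treated samples is preserved because GP is monotonic in the intensity ratio for any positive gain.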

      - effects on other membranes are not explored,

      The order of the intracellular membranes is, as expected, lower than that of the plasma membrane. Differentiating intracellular membranes of interest, such as endocytic vesicles, from other intracellular membranes would be very difficult. More importantly, our study is focused on what is happening in the plasma membrane, where caveolae reside, so effects on intracellular membranes would be of minor interest for plasma membrane dynamics.

      - dynamin-directed effects of Dyngo are not considered,

      In the discussion section we discuss the difficulties with disentangling dynamin-direct and indirect effects.

      (2) The QCM-D measurements and claims require explanation as several aspects remains unclear. In Fig S2, the 'softness' (what does this mean?) changes by 4-fold with DMSO alone (what does this mean?), then fractionally more with Dyngo. Then fractionally more again when Dyngo is removed (why?). Then it remains somewhat higher when both Dyngo and DMSO are removed, which is somehow interpreted as Dyngo remaining in the bilayer, but not DMSO.

      We understand the confusion of the reviewer and hope our explanations provide clarity. QCM-D measurements are based on an oscillating quartz crystal sensor. Specifically, alterations in oscillation frequency (ΔF) and the rate of energy dissipation from the sensor surface (ΔD) are what is measured. This allows the measurement of: 1) materials adsorbing to the sensor surface, 2) changes in the viscoelastic properties of a solution in contact with the sensor surface, 3) changes in the material adsorbed to the sensor surface upon exposure to different solutions. The ratio of ΔD/-ΔF reports the mechanical softness or rigidity of an adsorbed material, in this case the SLB.

      A “buffer shift” is the term used when there is no adsorption to the sensor surface, but rather an effect from altering the solution above the sensor surface. One reason is that different solutions can have different densities (e.g., a DMSO-buffer mixture vs buffer alone), which affects the oscillations of the sensor. We observed that the DMSO-buffer mixture alone gave a large buffer shift in comparison to the adsorption of Dyngo-4a into the SLB, thereby muddling the data interpretation. Thus, in Fig. S2 the system was first equilibrated with the DMSO-buffer mixture prior to addition of the Dyngo-4a solution to allow clearer visualization of the two events. In QCM-D, to assess whether something has made a permanent change to the system, one changes back to the solutions used before the addition; thus, we first washed with a DMSO-buffer mixture followed by buffer alone. Control experiments were carried out in which no Dyngo-4a was added (also shown in Fig. S2). The control shows that the same “buffer shift” from the DMSO-buffer mixture occurs in both systems and that, upon returning to a buffer-only condition, there is no permanent change to the system caused by exposure to DMSO. In contrast, once the system that received Dyngo-4a is changed back to a buffer-only system, we see that mass has been added to the system (ΔF) with little change to the dissipation (ΔD), thereby resulting in a lower ΔD/-ΔF ratio, which is to say that the SLB after the adsorption of Dyngo-4a was more rigid than the SLB without Dyngo-4a.
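
      The ΔD/-ΔF comparison described above can be sketched in a few lines. This is only an illustration of the arithmetic under stated assumptions: the function name, the units, and every number below are hypothetical and are not taken from Fig. S2.

```python
def softness_ratio(delta_d, delta_f):
    """Return dD / (-dF) for an adsorbed layer.

    delta_d: change in dissipation (assumed here in units of 1e-6).
    delta_f: change in frequency in Hz (negative when mass is added).
    A lower ratio is read as a more rigid layer, a higher one as softer.
    """
    if delta_f == 0:
        raise ValueError("no frequency shift; the ratio is undefined")
    return delta_d / (-delta_f)


# Hypothetical buffer-only readings before and after exposure to the compound.
slb_before = softness_ratio(delta_d=0.50, delta_f=-25.0)
slb_after = softness_ratio(delta_d=0.52, delta_f=-29.0)

# More mass (larger frequency drop) with little change in dissipation
# lowers the ratio, which is interpreted as a more rigid bilayer.
became_more_rigid = slb_after < slb_before
```

      Comparing both readings in the same buffer-only condition is what removes the buffer shift from the comparison.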

      These interpretations are difficult to grasp, as the authors seem to be implying simple amphiphilic partitioning into the membrane, which should all be removable by efficient washing.

      Amphiphilic partitioning is not fully reversible by “efficient washing”; it depends on the partition coefficients.

      I do not doubt that this compound interacts with membranes, but the quantifications appear ambiguous. A bilayer with 16 mol% (or worse, 30% if all in one leaflet) Dyngo is very unlikely (to remain a bilayer). Even if such a bilayer was conceivable, the authors are claiming an ADDITION of Dyngo that would INCREASE the area of one leaflet by 30%, which needs explanation as it appears unlikely.

      We understand that, in our attempt to provide numbers in the results section for the amount of binding observed in QCM-D, these can easily be interpreted as the amount observed to insert into the PM. However, as discussed in the discussion, we also see aggregates of Dyngo-4a that associate with the membrane in the simulations, which could likely contribute to the binding observed in QCM-D prior to washing. The precise amount of membrane-inserted Dyngo-4a is difficult to measure, as we discuss in the text. To make this clearer, we have now moved all these details to the discussion section, where we elaborate on this. Furthermore, since Dyngo-4a, like cholesterol, intercalates between the head groups of the lipids, the area would not increase in direct proportion to the mol%.

      Also, there are no replicates shown, so unclear how reproducible these effects are?

      For clarity, only single experiments are shown. However, multiple experiments were performed and the range in measured values for 3 technical repeats can be observed in the standard deviations found in the main text (e.g., 6 ± 2 mol%).

      (3) The simulations are insufficiently described and difficult to interpret. How big are these systems? Why do the figures show the aqueous system with lateral boundaries?

      There are no explicit boundaries used in the simulations; periodic boundary conditions are applied in all three dimensions. The lateral boundaries observed in the figures correspond to the simulation box edges and are a visual artifact of 2D projections with QuickSurf representation. No artificial walls or constraints were introduced laterally. Additional technical details, including the system size and periodic boundary conditions, have now been added to the methods section.

      It seems quite important that multiple Dyngo molecules aggregate rather than partition into membranes - is this likely to occur in experiment?

      Yes, this is important, and the additional simulation experiments suggested by Reviewer #3 have clarified that the aggregates contribute a great deal to the change in lipid packing of lipid bilayers containing cholesterol. It is hard to test aggregation in the cellular system, but we believe that it happens and contributes to the effect on membranes. We have now emphasized the effect of the aggregates in the text.

      PMF simulations are strongly suggesting that Dyngo does not spontaneously cross membranes, which is inconsistent with its drug-like amphiphilicity (cLogP~2.5 is optimally suited for membrane permeation) and known effects on intracellular proteins. This suggests an artefact in these PMFs.

      As stated in the submitted version of the manuscript, logP was used to validate the topology, and the observed value was in very good agreement with cLogP. Moreover, this validation complemented the standard procedure of CHARMM-GUI ligand modelling, which provided a reasonable penalty score (around 20) for the Dyngo-4a topology. POPC and cholesterol molecules are standard in the force field and validated by numerous studies. The parameters used for the membrane simulations, and for AWH in particular, are very common for this type of study. Thus, we do not see what may cause any artifacts in the free energy profile construction. In fact, the amphiphilicity of the molecule may be one of the key reasons that the Dyngo-4a molecule remains at the aqueous interface of the membrane and does not cross the membrane spontaneously. Also, we believe that the energy barrier of 40-60 kJ/mol is not prohibitively high and Dyngo-4a molecules may still overcome the barrier eventually, though we expect the majority to reside in the upper leaflet.

      The authors should experimentally measure the permeation of Dyngo through bilayers (or lack thereof), to more robustly support their finding that Dyngo does not cross membranes spontaneously.

      We thank the reviewer for the suggestion; however, this is technically very challenging and would require the establishment of precise systems, which is beyond the scope of this manuscript.

      (4) Why not measure effect of Dyngo on lipid packing directly and more broadly in model membranes?

      With the added modelling experiments supporting the previous simulations and the calculated GP values from the C-Laurdan experiments on the cellular plasma membrane, we do not find it necessary to include more model membrane experiments than the existing ones on lipid monolayers and supported lipid bilayers.

      (5) Statistics should not be done on individual cells (n>26), but rather on independent experiments (N=3?)

      We have performed the statistics on live cell particle tracking according to previous literature on similar systems (Boucrot et al., 2011; Larsson et al., 2023; Shvets et al., 2015; Stoeber et al., 2012).
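
      The distinction raised in this point, between treating each cell as a data point and treating each independent experiment as a data point, can be illustrated with a short sketch. The data layout, names, and numbers are hypothetical and do not reproduce the authors' analysis; the sketch only shows how the sample size differs between the two approaches.

```python
from statistics import mean

# Hypothetical track durations (s) for individual cells, grouped by experiment.
control = {
    "exp1": [10.0, 12.0, 14.0],
    "exp2": [11.0, 15.0],
    "exp3": [9.0, 12.0, 15.0],
}


def pooled_cells(data):
    """Flatten all cells into one sample (n = number of cells)."""
    return [value for cells in data.values() for value in cells]


def experiment_means(data):
    """Summarise each experiment by its mean (N = number of experiments)."""
    return [mean(cells) for cells in data.values()]


per_cell = pooled_cells(control)         # 8 values, one per cell
per_experiment = experiment_means(control)  # 3 values, one per experiment
```

      Any downstream test would then be run on either `per_cell` or `per_experiment`, which changes the degrees of freedom and therefore the resulting p-value.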

      (6) Fig 1G is important but rather unclear. Firstly, these kymographs are an odd way to show that the caveolae are not moving. More importantly, caveolae in normal cells have been shown to be quite stable and immobile (eg doi: 10.1074/jbc.M117.791400), yet here they are claimed to be very mobile.

      Although this might be an unconventional way to depict dynamic processes, we believe that it is a very illustrative way to show track stability over time in bulk, rather than just a kymograph of a few structures in a cell. Furthermore, we are not claiming that caveolae are very mobile but rather the opposite: that they are very stable, in agreement with previous work (Boucrot et al., 2011; Larsson et al., 2023; Mohan et al., 2015). We have now edited the text to make this even clearer.

      Also, if Dyngo prevents caveolae scission, there should be more of them at the membrane - why no quantification like Fig 1C to show accumulation of caveolae upon Dyngo treatment? Or directly counting caveolae via EM, as in Fig 4C?

      We are currently performing CTxB HRP experiments using EM, but for reasons beyond our control we have not managed to finish these in time. They will be included in the manuscript once they are ready, hopefully soon. However, Dynasore has previously been shown by EM to increase the number of caveolae at the PM (Moren et al., 2012; Sinha et al., 2011).

      (7) The writing can be made more precise and referencing could be strengthened.

      The introduction was written in a short format, and we have now extended this and made it more precise.

      Some examples:

      (a) 'scissoned' is not a word in English,

      Thanks, we have now changed this.

      (b) what is meant by "Cav1 assembly is driven by high chol content"? There are many types of caveolin assemblies.

      We agree that this can be made more precise and have now clarified this in the introduction.

      (c) "This generates a unique membrane domain with distinct lipid packing and a very high curvature." Unclear what 'this' refers to and there is no reference here, so what is the evidence for either of these claims? Caveolin-8S oligomers are not curved. Perhaps 'this' is caveolae, but they are relatively large and also not very highly curved and I am unaware of measurements of lipid packing therein.

      Caveolae are around 50 nm, which in biology represents a very high membrane curvature. It has been extensively shown that caveolae have a distinct lipid composition highly enriched in cholesterol and sphingolipids, which will thereby also generate a unique lipid packing compared to the surrounding membrane. Yet, the reviewer is correct that lipid packing has not been measured in a caveola, owing to obvious technical challenges. Thus, we have now changed the text to “special lipid composition”.

      The sentence following that one again makes a specific, but unreferenced, claim.

      (d) intro claims that lipid packing is critical for fission, but it is unclear quite what is meant by this claim. The references do not help, as they are often about the basic biophysics of lipids, rather than how packing affects fission.

      We have now edited the text.  

      (e) intro strongly implies that caveolae remain membrane attached because of stalled scission. How strong is the evidence for this? The fact that EHD2 is at the neck is not definitive,

      We used the term stalled scission to describe that not all omega-shaped membrane invaginations undergo scission in the same automatic way as clathrin-coated vesicles. We have now changed this in the text. Caveolae are shown to be released (undergo scission) and be detected as internal caveolae if the protein EHD2 is removed. Hence, this must be interpreted as EHD2 stalling scission. The evidence includes data compiled over the last 12 years by others and us, including for example: 1) caveolae with EHD2 have a longer duration time (Larsson et al., 2023; Mohan et al., 2015; Moren et al., 2012; Stoeber et al., 2012), and knockdown of EHD2 results in more internalized caveolae as measured by CTxB HRP using EM (Moren et al., 2012) and a shorter duration time at the PM (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015; Stoeber et al., 2012); 2) EHD2 overexpression results in fewer internalized caveolae as measured by CTxB HRP using EM (Stoeber et al., 2012); and 3) overexpression or acute addition of purified EHD2 via microinjection counteracts lipid-induced scission of caveolae and hence results in caveolae stabilization at the PM (Hubert et al., 2020). It is very hard to see how the release and internalization of caveolae could result from anything other than scission. EHD2 has been found around the rim of caveolae (Matthaeus et al., 2022), and overexpression of EHD2 oligomerizing mutants has been shown to expand the caveola neck (Hoernke et al., 2017; Larsson et al., 2023).

      (f) unclear what is meant by 'lipid packing frustration' and how Dyngo supposedly induces it.

      Lipid packing frustration refers to what is usually called a lipid packing defect. However, since lipid membranes are described as a fluid system, they should not have defects, and we therefore believe that lipid packing frustration is more accurate. Nevertheless, we have now changed the text and use “decreased lipid packing” or “decreased lipid order” throughout to describe the effect on the plasma membrane.

      (8) IF of Cav1 is insufficient to claim puncta as caveolae. Co-stained puncta of caveolin with cavin are much stronger evidence. Same issue for Cav1-GFP puncta.

      We agree and have now provided IF showing cavin1 and EHD2 colocalization with Cav1GFP in untreated and Dyngo-4a-treated cells.

      (9) Fig 3E claims that "preferred position of Dyngo-4a was closer to the head groups" but the minimum looks to be in similar place as Fig 3B without cholesterol.

      We appreciate the reviewer’s observation. The PMF minima in the POPC and POPC:Chol membranes are indeed close in absolute position (~1.1–1.2 nm from the bilayer center). However, as clarified in the revised text, the presence of cholesterol leads to a slight shift of Dyngo-4a closer to the headgroup region and broadens the positional distribution. This is also evident from the added density profiles (Fig. S3A) and is now described more precisely in the manuscript.

      Critically, these results do not support the notion that Dyngo affects lipid packing sufficiently, which is not measured in the simulations (though could be).

      We thank the reviewer for the excellent suggestion. In response, we have now included a detailed analysis of Dyngo-4a’s effect on lipid packing in the simulations. As described in the revised manuscript, we measured deuterium order parameters, area per lipid (APL), and lipid–Dyngo–cholesterol spatial distributions (Figs. 3-H, S3C-E). The results demonstrate that Dyngo-4a decreases lipid order in POPC:Chol membranes. Both single molecules and clusters reduce the order parameter by up to 0.04 units, particularly in the upper leaflet, where Dyngo-4a resides. The reduction is most pronounced in the midchain region of the sn1 tail and around the double bond of the sn2 tail. These effects were accompanied by increased APL in POPC:Chol membranes and by colocalization of Dyngo-4a near cholesterol-rich regions. Together, these data confirm that Dyngo-4a perturbs membrane organization and lipid packing in a composition-dependent manner. We believe these additions directly address the concern and demonstrate that the simulations indeed support the conclusion that Dyngo-4a modulates lipid packing.
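
      As background for the order-parameter analysis mentioned above, the conventional definition S = ⟨(3 cos²θ − 1)/2⟩, where θ is the angle between a C–H (C–D) bond and the bilayer normal, can be sketched as follows. This is a generic illustration and not the authors' analysis code: the function name and the example vectors are hypothetical, and the bilayer normal is assumed to lie along z.

```python
import math


def order_parameter(bond_vectors, normal=(0.0, 0.0, 1.0)):
    """Average (3*cos^2(theta) - 1) / 2 over a set of bond vectors.

    theta is the angle between each bond vector and the bilayer normal.
    Perfect alignment with the normal gives 1.0; bonds lying in the
    membrane plane give -0.5; an isotropic distribution averages to 0.
    """
    if not bond_vectors:
        raise ValueError("at least one bond vector is required")
    nx, ny, nz = normal
    n_len = math.sqrt(nx * nx + ny * ny + nz * nz)
    total = 0.0
    for x, y, z in bond_vectors:
        b_len = math.sqrt(x * x + y * y + z * z)
        cos_theta = (x * nx + y * ny + z * nz) / (b_len * n_len)
        total += 0.5 * (3.0 * cos_theta ** 2 - 1.0)
    return total / len(bond_vectors)


# Hypothetical bond vectors: two in the membrane plane, then one along z.
in_plane = order_parameter([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
along_normal = order_parameter([(0.0, 0.0, 2.0)])
```

      In practice the average runs over all lipids and trajectory frames for each carbon position, and values are often reported as magnitudes, so a decrease of about 0.04 corresponds to a modest loss of chain order.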

      Finally, the simulation data do not show "that Dyngo-4a is competing with cholesterol"; it is unclear what 'competition' means in this context, but regardless, the data only shows that Dyngo sits at a similar location as cholesterol.

      We agree with the reviewer that “competition” was an imprecise term. We have rephrased the relevant sections to clarify that Dyngo-4a and cholesterol localize to overlapping regions and exhibit spatial coordination. As now stated in the manuscript, cholesterol appears to partially displace Dyngo-4a from its preferred depth seen in pure POPC, broadens its membrane distribution, and alters lipid packing. According to the order parameters, there is an interplay between cholesterol and Dyngo-4a, and the heatmaps show that the distribution of cholesterol in the membrane becomes less uniform in the presence of Dyngo-4a. These interactions suggest that Dyngo-4a perturbs cholesterol-rich domains.

      As new analysis routines were added to the study, we have now also added the details on those to the Methods section of the text.

      (10) AFM measures the stiffness of the cell (as correctly explained in Results section) not "overall stiffness of the PM" as stated in the Discussion.

      We thank the reviewer for pointing this out, we have now altered this in the discussion section.

      (11) Fig2A: what was the starting lipid surface pressure? How does Dyngo insertion depend on initial lipid packing?

      The starting lipid pressure was 20 mN m<sup>-1</sup>, which we have now incorporated in the figure legend. We performed several such experiments with a starting pressure ranging from 20-23 mN m<sup>-1</sup>, showing consistent results, which we describe in the materials and methods section. Given that we also performed QCM-D analysis and simulations on bilayers showing that Dyngo-4a adsorbed and inserted, respectively, we have not performed a titration of starting pressures resulting in an MIP of Dyngo-4a.

      (12) Fig 4B is a strange approach to measure membrane motion. Why not RMSD or some other displacement based method? As its shown, it implies that the area of the cell changes.

      The method that we used quantifies the area of the cell that is attached (or close) to the glass and is thereby visible in TIRF microscopy. This area indeed changes over time, which has been frequently observed and used to describe and quantify mobility and the formation of lamellipodia and filopodia, among other things. We agree that RMSD can also be used to analyze the data before and after treatments, and we have now included RMSD analysis in the manuscript.
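
      A displacement-based measure of the kind the reviewer suggests can be sketched as below. This is a generic root-mean-square calculation under stated assumptions, not the analysis added to the manuscript: the function name and coordinates are hypothetical, and the points are assumed to be matched between a reference frame and a later frame.

```python
import math


def rmsd(reference, frame):
    """Root-mean-square displacement between matched (x, y) points."""
    if not reference or len(reference) != len(frame):
        raise ValueError("both frames need the same, non-zero number of points")
    squared = sum(
        (fx - rx) ** 2 + (fy - ry) ** 2
        for (rx, ry), (fx, fy) in zip(reference, frame)
    )
    return math.sqrt(squared / len(reference))


# Hypothetical positions (e.g. in micrometres) at two time points.
first_frame = [(0.0, 0.0), (1.0, 1.0)]
later_frame = [(3.0, 4.0), (4.0, 5.0)]
displacement = rmsd(first_frame, later_frame)
```

      Unlike the adherent-area measure, this value depends only on how far the tracked points move, not on whether the visible footprint of the cell grows or shrinks.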

      Reviewer #3 (Significance):

      The title, abstract, and introduction of the manuscript are largely framed around lipid packing, but most of the data investigate other unexpected effects of treating cells with Dyngo4a. The only measurement for lipid packing (or any other membrane properties) is Fig 4E-F. Therefore, this paper is effectively an investigation of an artefact of a common reagent, which itself could be a valuable contribution. However, the mechanism to explain its effect requires stronger evidence, and its broad biological significance needs further exploration.

      Overall, the impact of documenting the effects of Dyngo4a on membranes appears modest but may be valuable to the membrane trafficking community.

      Barucha-Kraszewska, J., S. Kraszewski, and C. Ramseyer. 2013. Will C-Laurdan dethrone Laurdan in fluorescent solvent relaxation techniques for lipid membrane studies? Langmuir. 29:1174-1182.

      Boucrot, E., M.T. Howes, T. Kirchhausen, and R.G. Parton. 2011. Redistribution of caveolae during mitosis. J Cell Sci. 124:1965-1972.

      Hoernke, M., J. Mohan, E. Larsson, J. Blomberg, D. Kahra, S. Westenhoff, C. Schwieger, and R. Lundmark. 2017. EHD2 restrains dynamics of caveolae by an ATP-dependent, membrane-bound, open conformation. Proc Natl Acad Sci U S A. 114:E4360-E4369.

      Hubert, M., E. Larsson, N.V.G. Vegesna, M. Ahnlund, A.I. Johansson, L.W. Moodie, and R. Lundmark. 2020. Lipid accumulation controls the balance between surface connection and scission of caveolae. Elife. 9.

      Larsson, E., B. Moren, K.A. McMahon, R.G. Parton, and R. Lundmark. 2023. Dynamin2 functions as an accessory protein to reduce the rate of caveola internalization. J Cell Biol. 222.

      Matthaeus, C., K.A. Sochacki, A.M. Dickey, D. Puchkov, V. Haucke, M. Lehmann, and J.W. Taraska. 2022. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 13:7234.

      McCluskey, A., J.A. Daniel, G. Hadzic, N. Chau, E.L. Clayton, A. Mariana, A. Whiting, N.N. Gorgani, J. Lloyd, A. Quan, L. Moshkanbaryans, S. Krishnan, S. Perera, M. Chircop, L. von Kleist, A.B. McGeachie, M.T. Howes, R.G. Parton, M. Campbell, J.A. Sakoff, X. Wang, J.Y. Sun, M.J. Robertson, F.M. Deane, T.H. Nguyen, F.A. Meunier, M.A. Cousin, and P.J. Robinson. 2013. Building a better dynasore: the dyngo compounds potently inhibit dynamin and endocytosis. Traffic. 14:1272-1289.

      Mohan, J., B. Moren, E. Larsson, M.R. Holst, and R. Lundmark. 2015. Cavin3 interacts with cavin1 and caveolin1 to increase surface dynamics of caveolae. J Cell Sci. 128:979-991.

      Moren, B., C. Shah, M.T. Howes, N.L. Schieber, H.T. McMahon, R.G. Parton, O. Daumke, and R. Lundmark. 2012. EHD2 regulates caveolar dynamics via ATP-driven targeting and oligomerization. Mol Biol Cell. 23:1316-1329.

      Shvets, E., V. Bitsikas, G. Howard, C.G. Hansen, and B.J. Nichols. 2015. Dynamic caveolae exclude bulk membrane proteins and are required for sorting of excess glycosphingolipids. Nat Commun. 6:6867.

      Sinha, B., D. Koster, R. Ruez, P. Gonnord, M. Bastiani, D. Abankwa, R.V. Stan, G. Butler-Browne, B. Vedie, L. Johannes, N. Morone, R.G. Parton, G. Raposo, P. Sens, C. Lamaze, and P. Nassoy. 2011. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 144:402-413.

      Stoeber, M., I.K. Stoeck, C. Hanni, C.K. Bleck, G. Balistreri, and A. Helenius. 2012. Oligomers of the ATPase EHD2 confine caveolae to the plasma membrane through association with actin. EMBO J. 31:2350-2364.

    1. eLife Assessment

      This manuscript describes a novel method for determining the mechanical parameters of the kinesin, KIF1A, that uses fluorescence microscopy and does not require an optical tweezer apparatus. The length of a tethered fluorescent DNA nanospring is measured as the kinesin moves processively along the microtubule and then stalls. The work reports important findings, and (barring a few exceptions) the evidence supporting the claims is generally convincing.

    2. Reviewer #1 (Public review):

      Summary:

      This study uses a novel DNA origami nanospring to measure the stall force and other mechanical parameters of the kinesin-3 family member, KIF1A, using light microscopy. The key is to use SNAP tags to tether a defined nanospring between a motor-dead mutant of KIF5B and the KIF1A to be interrogated. The mutant KIF5B binds tightly to a subunit of the microtubule without stepping, thus creating resistance to the processive advancement of the active KIF1A. The nanospring is conjugated with 124 Cy3 dyes, which allows it to be imaged by fluorescence microscopy. Acoustic force spectroscopy was used to measure the relationship between the extension of the NS and force as a calibration. Two different fitting methods are described to measure the length of the extension of the NS from its initial diffraction-limited spot. By measuring the extension of the NS during an experiment, the authors can determine the stall force. The attachment duration of the active motor is measured from the suppression of lateral movement that occurs when the KIF1A is attached and moving. There are numerous advantages of this technology for the study of single molecules of kinesin over previous studies using optical tweezers. First, it can be done using simple fluorescence microscopy and does not require the level of sophistication and expense needed to construct an optical tweezer apparatus. Second, the force that is experienced by the moving KIF1A is parallel to the plane of the microtubule. This regime can be achieved using a dual beam optical tweezer set-up, but in the more commonly used single-beam set-up, much of the force experienced by the kinesin is perpendicular to the microtubule. Recent studies have shown markedly different mechanical behaviors of kinesin when interrogated by the two different optical tweezer configurations. The data in the current manuscript are consistent with those obtained using the dual-beam optical tweezer set-up.

      In addition, the authors study the mechanical behavior of several mutants of KIF1A that are associated with KIF1A-associated neurological disorder (KAND).

      Strengths:

      The technique should be cheaper and less technically challenging than optical tweezer microscopy to measure the mechanical parameters of molecular motors. The method is described in sufficient detail to allow its use in other labs. It should have a higher throughput than other methods.

      Weaknesses:

      The experimenter does not get a "real-time" view of the data as it is collected, which you get from the screen of an optical tweezer set-up. Rather, you have to put the data through the fitting routines to determine the length of the nanospring in order to generate the graphs of extension (force) vs time. No attempts were made to analyze the periods where the motor is actually moving to determine step-size or force-velocity relationships.

    3. Reviewer #2 (Public review):

      Summary:

      This work is important because it complements other single-molecule mechanics approaches, in particular optical trapping, which inevitably exerts off-axis loads. The nanospring method has its own weaknesses (individual steps cannot be seen), but it brings new clarity to our picture of KIF1A and will influence future thinking on the kinesins-3 and on kinesins in general.

      Strengths:

      By tethering single copies of the kinesin-3 dimer under test via a DNA nanospring to a strong binding mutant dimer of kinesin-1, the forces developed and experienced by the motor are constrained into a single axis, parallel to the microtubule axis. The method is imaging-based, which should improve accessibility. In principle, at least, several single-motor molecules can be simultaneously tested. The arrangement ensures that only single molecules can contribute. Controls establish that the DNA nanospring is not itself interacting appreciably with the microtubule. Forces are convincingly calibrated, and reading the length of the nanospring by fitting to the oblate fluorescent spot is carefully validated. The excursions of the wild-type KIF1A leucine zipper-stabilised dimer are compared with those of neuropathic KIF1A mutants. These mutants can walk to a stall plateau, but the force is much reduced. The forces from mutant/WT heterodimers are also reduced.

      Weaknesses:

      The tethered nanospring method has some weaknesses; it only allows the stall force to be measured in the case that a stall plateau is achieved, and the thermal noise means that individual steps are not apparent. The nanospring does not behave like a Hookean spring; instead, linearly increasing force is reported by exponentially smaller extensions of the nanospring under tension. The estimated stall force for KIF1A (3.8 pN) is in line with measurements made using 3-bead optical trapping, but those earlier measurements were not of a stall plateau, but rather of limiting termination (detachment) force, without a stall plateau. More confidence in the 3.7 pN stall plateau determined in the current work could be obtained by demonstrating that a stall at a higher force is obtained using the nanospring method on kinesin-1, which stalls at >7 pN in single bead optical trapping.

    4. Author response:

      Reviewer #1 (Public review):

      We greatly appreciate Reviewer #1’s accurate public review of our study on the kinesin motor using the DNA origami nanospring (NS). With respect to the strengths, we fully agree with Reviewer #1’s comments. Regarding the weakness, we would like to respond as follows.

      It is true that, unlike optical tweezers, our method does not provide real-time data display. Optical tweezers enable real-time observation and manipulation of kinesin molecules at arbitrary time points. Achieving real-time observation and manipulation is indeed an important challenge for the future development of the NS technique. On the other hand, Iwaki et al. (our co-corresponding author) have already investigated dynamic properties of motor proteins under load, such as the step size and force–velocity relationship of myosin VI, using the NS. We are now preparing high spatiotemporal resolution microscopy experiments on the KIF1A system to measure its step size and force–velocity relationship, which inherently require such resolution.

      Reviewer #2 (Public review):

      We would like to thank Reviewer #2 for providing a highly accurate assessment of the strengths of our experiments. Regarding the weaknesses, we would like to respond as follows.

      First, Iwaki et al. (our co-corresponding author) have already succeeded in observing the stepping motion of myosin VI using the nanospring (NS) in their previous work. We are also currently preparing high spatiotemporal resolution microscopy experiments to observe the stepping motion of KIF1A in our system. Second, while it is true that the NS does not follow Hooke’s law, it is possible to design and construct NSs with an appropriate dynamic range by tuning the spring constant to match the forces exerted by protein molecules. Finally, we agree that our first observation of the stall plateau in KIF1A using the NS is a meaningful achievement. However, with respect to the suggestion that “increasing validity requires also studying kinesin-1,” we have a somewhat different perspective. The validity of the NS method has already been thoroughly examined in the previous work on myosin VI by Iwaki et al., where results were compared with those obtained using optical tweezers. Moreover, the focus of this manuscript is on KAND caused by KIF1A mutations. From this perspective, although we appreciate the suggestion, we consider it important to keep the present study focused on KIF1A and its implications for KAND.

    1. eLife Assessment

      This paper presents valuable findings on how autophagosomes are positioned along microtubules for their efficient fusion with lysosomes, providing significant insights into the mechanism. The evidence supporting the conclusions is solid, with high-quality fluorescence microscopy combined with Drosophila genetics. This work will be of broad interest to cell biologists interested in autophagy and related cell biology fields.

    2. Reviewer #1 (Public review):

      Summary:

      It is well known that autophagosomes/autolysosomes move along microtubules. However, as these previous studies did not distinguish between autophagosomes and autolysosomes, it remains unknown whether autophagosomes begin to move after fusion with lysosomes or even before fusion. In this manuscript, the authors show using fusion-deficient vps16a RNAi cells that both pre-fusion autophagosomes and lysosomes can move along the microtubules towards the minus end. This was confirmed in snap29 RNAi cells. By screening motor proteins and Rabs, the authors found that autophagosomal traffic is primarily regulated by the dynein-dynactin system and can be counter-regulated by kinesins. They also show that Rab7-Epg5 and Rab39-ema interactions are important for autophagosome trafficking.

      Strengths:

      This study uses reliable Drosophila genetics and high-quality fluorescence microscopy. The data are properly quantified and statistically analyzed. It is a reasonable hypothesis that gathering pre-fusion autophagosomes and lysosomes in close proximity improves fusion efficiency.

      Weaknesses:

      (1) This study investigates the behavior of pre-fusion autophagosomes and lysosomes using fusion-incompetent cells (e.g., vps16a RNAi cells). However, the claim that these cells are truly fusion-incompetent relies on citations from previous studies. Since this is a foundational premise of the research, it should be rigorously evaluated before interpreting the data. It's particularly awkward that the crucial data for vps16a RNAi is only presented at the very end of Figure 10-S1; this should be among the first data shown (the same for SNAP29). It would be important to determine the extent to which autophagosomes and lysosomes are fusing (or tethered in close proximity), within each of these cell lines.

      (2) In the new Figures 8 and 9, the authors analyze autolysosomes without knocking down Vps16A (i.e., without inhibiting fusion). However, as this reviewer pointed out in the previous round, it is highly likely that both autophagosomes and autolysosomes are present in these cells. This is particularly relevant given that the knockdown of dynein-dynactin, Rab7, and Epg5 only partially inhibits the fusion of autophagosomes and lysosomes (Figure 10H). If the goal is to investigate the effects of fusion, it would be more appropriate to analyze autolysosomes and autophagosomes separately. The authors mention that they can differentiate these two structures based on the size of mCherry-Atg8a structures. If this is the case, they should perform separate analyses for both autophagosomes and autolysosomes.

      (3) This is also a continued issue from the previous review. The authors suggest that autophagosome movement is crucial for fusion, based on the observed decrease in fusion rates in Rab7 and Epg5 knockdown cells (Fig. 10). However, this conclusion is not well supported. It is known that Rab7 and Epg5 are directly involved in the fusion process itself. Therefore, the possibility that the observed decrease is simply due to a direct defect in fusion, rather than an impairment of movement, has not been ruled out.

      (4) The term "autolysosome maturation" appears multiple times, yet its meaning remains unclear. Does it refer to autolysosome formation (autophagosome-lysosome fusion), or does it imply a further maturation process occurring after autolysosome formation? This is not a commonly used term in the field, so it requires a clear definition.

      (5) In Figure 1-S1D, the authors state that the disappearance of the mCherry-Atg8a signal after atg8a RNAi indicates that the observed structures are not non-autophagic vacuoles. This reasoning is inappropriate. Naturally, knocking down Atg8 will abolish its signal, regardless of the nature of the vacuoles. This does not definitively distinguish autophagic from non-autophagic structures.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript by Boda et al. describes the results of a targeted RNAi screen in the background of Vps16A-depleted Drosophila larval fat body cells. In this background, lysosomal fusion is inhibited, allowing the authors to analyze the motility and localization specifically of autophagosomes, prior to their fusion with lysosomes to become autolysosomes. In this Vps16A-depleted background, mCherry-Atg8a labeled autophagosomes accumulate in the perinuclear area, through an unknown mechanism.

      The authors found that depletion of multiple subunits of the dynein/dynactin complex caused an alteration of this mCherry-Atg8a localization, moving from the perinuclear region to the cell periphery. Interactions with kinesin overexpression suggest these motor proteins may compete for autophagosome binding and transport. The authors extended these findings by examining potential upstream regulators including Rab proteins and selected effectors, and they also examined effects on lysosomal movement and autolysosome size. Altogether, the results are consistent with a model in which specific Rab/effector complexes direct movement of lysosomes and autophagosomes toward the MTOC, promoting their fusion and subsequent dispersal throughout the cell.

      Strengths:

      Although previous studies of the movement of autophagic vesicles have identified roles for microtubule-based transport, this study moves the field forward by distinguishing between effects on pre- and post-fusion autophagosomes, and by its characterization of the roles of specific Dynein, Dynactin, and Rab complexes in regulating movement of distinct vesicle types. Overall, the experiments are well controlled, appropriately analyzed, and largely support the authors' conclusions.

      Weaknesses:

      One limitation of the study is the genetic background that serves as the basis for the screen. In addition to preventing autophagosome-lysosome fusion, disruption of Vps16A has been shown to inhibit endosomal maturation and to block trafficking of components to the lysosome from both the endosome and Golgi apparatus. Additional effects previously reported by the authors include increased autophagosome production and reduced mTOR signaling. Thus Vps16A-depleted cells have a number of endosome, lysosome and autophagosome-related defects, with unknown downstream consequences. Additionally, the cause and significance of the perinuclear localization of autophagosomes in this background is unclear. Thus, interpretations of the observed reversal of this phenotype are difficult, and have the caveat that they may apply only to this condition, rather than to normal autophagosomes. Additional experiments to observe autophagosome movement or positioning in a more normal environment would improve the manuscript.

      Comments on revision:

      The revised manuscript and author responses have satisfactorily met my concerns. I have no further issues and congratulate the authors on this work.

    4. Reviewer #3 (Public review):

      Summary:

      In multicellular organisms, autophagosomes are formed throughout the cytosol, while late endosomes/lysosomes are relatively enriched in the perinuclear region. It is known that autophagosomes gain access to the lysosome-enriched region by microtubule-based trafficking. The mechanism by which autophagosomes move along microtubules remains incompletely understood. In this manuscript, Péter Lőrincz and colleagues investigated the mechanism driving the movement of nascent autophagosomes along microtubules towards the non-centrosomal microtubule organizing center (ncMTOC) using the fly fat body as a model system. The authors examined autophagosome positioning in cells where autophagosome-lysosome fusion was inhibited by knocking down the HOPS subunit Vps16A. Despite being generated at random positions in the cytosol, autophagosomes accumulate around the nucleus when Vps16A is depleted. They then performed an RNA interference screen to identify the factors involved in autophagosome positioning. They found that the dynein-dynactin complex is required for trafficking of autophagosomes toward the ncMTOC. Dynein loss leads to the peripheral relocation of autophagosomes. They further revealed that a pair of small GTPases and their effectors, Rab7-Epg5 and Rab39-ema, are required for bidirectional autophagosome transport. Knockdown of these factors in Vps16A RNAi cells causes scattering of autophagosomes throughout the cytosol.

      Strengths:

      The data presented in this study help us to understand the mechanism underlying the trafficking and positioning of autophagosomes.

      Weaknesses:

      (1) The experiments were performed in Vps16A RNAi KD cells. Vps16A knockdown blocks fusion of vesicles derived from the endolysosomal compartments such as fusion between lysosomes. The pleiotropic effect of Vps16A RNAi may complicate the interpretation.

      (2) In this study, the transport of autophagosomes is investigated in fly fat cells. In fat cells, a large number of large lipid droplets accumulate and the endomembrane systems are distinct from those in other cell types. The knowledge gained from this study may not apply to other cell types.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      (1) To distinguish autophagosomes from autolysosomes, the authors used vps16 RNAi cells, which are supposed to be fusion deficient. However, the extent to which fusion is actually inhibited by knockdown of Vps16A is not shown. The co-localization rate of Atg8 and Lamp1 should be shown (as in Figure 8). Then, after identifying pre-fusion autophagosomes and lysosomes, the localization of each should be analyzed.

      Thank you for this insightful comment. We analyzed the colocalization of 3xmCherry-Atg8a and GFP-Lamp1, which label autophagic structures and lysosomes, respectively, in Vps16A RNAi fat body cells. As expected, Vps16A silencing markedly reduced the overlap between these two signals, indicating a strong block in autophagosome–lysosome fusion. Moreover, both 3xmCherry-Atg8a and GFP-Lamp1 became more perinuclearly localized compared to the control (luciferase RNAi) cells.

      It is also possible that autophagosomes and lysosomes are tethered by factors other than HOPS (even if they are not fused). If this is the case, autophagosomal trafficking would be affected by the movement of lysosomes.  

      Thank you for raising this possibility. While we cannot fully exclude that autophagosomes might be indirectly transported via tethering to lysosomes, we consider this unlikely. We believe that in Drosophila fat cells, autophagosomes and lysosomes rapidly fuse once in close proximity. Therefore, even if alternative tethering mechanisms exist, they are unlikely to permit prolonged joint trafficking without fusion.

      (2) The authors analyze autolysosomes in Figures 6 and 7. This is based on the assumption that autophagosome-lysosome fusion takes place in cells without vps16A RNAi. However, even in the presence of Vps16A, both pre-fusion autophagosomes and autolysosomes should exist. This is also true in Figure 8H, where the fusion of autophagosomes and lysosomes is partially suppressed in knockdown cells of dynein, dynactin, Rab7, and Epg5. If the effect of fusion is to be examined, it is reasonable to distinguish between autophagosomes and autolysosomes and analyze only autolysosomes.  

      Thank you for this careful observation. The 3xmCherry-Atg8a reporter is well suited to identify both autophagosomes and autolysosomes, as the mCherry fluorophore is resistant to degradation in the acidic environment of autolysosomes. Notably, mCherry-Atg8a–positive autolysosomes appear larger and brighter than pre-fusion autophagosomes, which are typically smaller and dimmer, especially under fusion-deficient conditions (e.g., Figure 4). Therefore, we use these morphological differences as a proxy to distinguish between the two.

      To improve structural assignment, we incorporated endogenous Lamp1 staining (Figure 10) and a Lamp1-GFP reporter (Figure 10—figure supplement 1). Vesicles positive for mCherry-Atg8a but negative for Lamp1 are considered pre-fusion autophagosomes. Structures double-positive for mCherry-Atg8a and Lamp1 represent autolysosomes, while Lamp1-positive, Atg8a-negative vesicles correspond to non-autophagic lysosomes. To clarify these interpretations, we revised the Results section and explained these reporters in more detail.

      (3) In this study, only vps16a RNAi cells were used to inhibit autophagosome-lysosome fusion. However, since HOPS has many roles besides autophagosome-lysosome fusion, it would be better to confirm the conclusion by knockdown of other factors (e.g., Stx17 RNAi).  

      Thank you for this valuable suggestion. We initially considered using Syntaxin17 RNAi; however, our recent findings indicate that loss of Syx17 results in a HOPS-dependent tethering lock between autophagosomes and lysosomes (DOI: 10.1126/sciadv.adu9605). In this case, tethered vesicles would likely move together, confounding the interpretation of autophagosome-specific trafficking.

      Therefore, we turned to other SNAREs such as Vamp7 and Snap29. One Snap29 RNAi was located on the appropriate chromosome needed for our genetic experiments. We generated a transgenic fly line expressing both Snap29 RNAi and the mCherry-Atg8a reporter under a fat body-specific R4 promoter. When we tested our key trafficking hits in this background, we observed similar autophagosome localization phenotypes as in Vps16A RNAi cells. These results, now included in the revised manuscript (see Figure 6), confirm that the observed transport phenotypes are not specific to Vps16A or HOPS complex loss.

      (4) Figure 8: Rab7 and Epg5 are also known to be directly involved in autophagosome-lysosome tethering/fusion. Even if the fusion rate is reduced in the absence of Rab7 and Epg5, it may not be the result of defective autophagosome movement, but may simply indicate that these molecules are required for fusion itself. How do the authors distinguish between the two possibilities?

      Thank you for this important point. While Rab7 and Epg5 indeed participate in autophagosome–lysosome tethering and fusion, our data suggest they also contribute to autophagosome movement. This is evident from the distinct phenotypes observed upon Rab7 or Epg5 RNAi compared to Vps16A or SNARE RNAi. Depletion of Vps16A, Syx17, Vamp7, or Snap29 (factors involved specifically in fusion) results in perinuclear accumulation of autophagosomes. In contrast, Rab7 or Epg5 RNAi leads to a dispersed autophagosome pattern throughout the cytoplasm.

      These differences suggest that Rab7 and Epg5 play additional roles in positioning autophagosomes. Supporting this, our co-immunoprecipitation experiments show that Epg5 interacts with dynein motors. Therefore, we propose that Rab7 and Epg5 influence both autophagosome fusion and their microtubule-based transport.

      Reviewer #2 (Public review):  

      One limitation of the study is the genetic background that serves as the basis for the screening. In addition to preventing autophagosome-lysosome fusion, disruption of Vps16A has been shown to inhibit endosomal maturation and block the trafficking of components to the lysosome from both the endosome and Golgi apparatus. Additional effects previously reported by the authors include increased autophagosome production and reduced mTOR signaling. Thus Vps16A-depleted cells have a number of endosome, lysosome, and autophagosome-related defects, with unknown downstream consequences. Additionally, the cause and significance of the perinuclear localization of autophagosomes in this background is unclear. Thus, interpretations of the observed reversal of this phenotype are difficult, and have the caveat that they may apply only to this condition, rather than to normal autophagosomes. Additional experiments to observe autophagosome movement or positioning in a more normal environment would improve the manuscript.  

      Thank you for highlighting this limitation. We attempted time-lapse imaging of live fat body cells expressing 3xmCherry-Atg8a and GFP-Lamp1 to visualize the movement and fusion events of pre-fusion autophagosomes (3xmCherry-Atg8a positive and GFP-Lamp1 negative) and lysosomes (GFP-Lamp1 positive). Despite different experimental setups and durations of starvation, no vesicle movement was observed at all, so live imaging of larval Drosophila fat tissue will require time-consuming optimization of in vitro culture conditions. Consistent with this, we did not find any published reports in which organelle motility in fat body cells was successfully observed. Nuclear positioning in fat body cells was investigated in detail in an excellent study; however, the authors were able to observe only very little movement of the nuclei by live imaging (Zheng et al. Nat Cell Biol. 2020 Mar;22(3):297-309. doi: 10.1038/s41556-020-0470-7), further highlighting the technical difficulties of live or time-lapse imaging in this tissue type.

      Specific comments  

      (1) Several genes have been described that when depleted lead to perinuclear accumulation of Atg8-labeled vesicles. There seems to be a correlation of this phenotype with genes required for autophagosome-lysosome fusion; however, some genes required for lysosomal fusion such as Rab2 and Arl8 apparently did not affect autophagosome positioning as reported here. Thus, it is unclear whether the perinuclear positioning of autophagosomes is truly a general response to disruption of autophagosome-lysosome fusion, or may reflect additional aspects of Vps16A/HOPS function. A few things here would help. One would be an analysis of Atg8a vesicle localization in response to the depletion of a larger set of fusion-related genes. Another would be to repeat some of the key findings of this study (effects of specific dynein, dynactin, rabs, effectors) on Atg8a localization when Syx17 is depleted, rather than Vps16A. This should generate a more autophagosome-specific fusion defect.  

      Thank you for this insightful suggestion. We recently discovered that Syx17 depletion induces a HOPS-dependent tethering lock between autophagosomes and lysosomes (DOI: 10.1126/sciadv.adu9605), making it unsuitable for modeling autophagosome-specific fusion defects. In contrast, Vamp7 and Snap29 knockdowns do not appear to cause such a tethering lock. We were able to generate a suitable Drosophila line using a Snap29 RNAi transgene located on a compatible chromosome. Upon testing key hits from our screen in this background, we found that autophagosomes redistributed similarly, supporting our conclusions. These new results have been included in the revised manuscript (see Figure 6).

      Third, it would greatly strengthen the findings to monitor pre-fusion autophagosome localization without disrupting fusion. Such vesicles could be identified as Atg8a-positive Lamp-negative structures. The effects of dynein and rab depletion on the tracking of these structures in a post-induction time course would serve as an important validation of the authors' findings.  

      Thank you for this helpful suggestion. As described above, we attempted time-lapse imaging of 3xmCherry-Atg8a and GFP-Lamp1-expressing fat body cells under various conditions to identify motile pre-fusion autophagosomes. However, we did not observe any vesicle movement, regardless of the starvation duration or experimental setup. As this likely reflects technical limitations of ex vivo fat body imaging, we were unable to achieve live tracking of autophagosome dynamics without introducing perturbations. This limitation is now discussed in the revised manuscript.

      (2) The authors nicely show that depletion of Shot leads to relocalization of Atg8a to ectopic foci in Vps16A-depleted cells; they should confirm that this is a mislocalized ncMTOC by colabeling Atg8a with an MTOC component such as MSP300. The effect of Shot depletion on Atg8a localization should also be analyzed in the absence of Vps16A depletion.  

      Thank you for this positive comment. We co-labeled Atg8a with the minus-end microtubule marker Khc-nod-LacZ in both shot single knockdown and shot; vps16A double knockdown cells. Ectopic Khc-nod-LacZ-positive MTOC foci were clearly visible in both conditions, and Atg8a-positive autophagosomes accumulated around these structures. These findings confirm that Shot depletion induces ectopic MTOC formation, which correlates with autophagosome relocalization. The new data have been incorporated into the revised manuscript (see Figure 1O-S).

      (3) The authors report that depletion of Dynein subunits, either alone (Figure 6) or co-depleted with Vps16A (Figure 2), leads to redistribution of mCherry-Atg8a punctae to the "cell periphery". However, only cell clones that contact an edge of the fat body tissue are shown in these figures. Furthermore, in these cells, mCherry-Atg8a punctae appear to localize only to contact-free regions of these cells, and not to internal regions of clones that share a border with adjacent cells. Thus, these vesicles would seem to be redistributed to the periphery of the fat body itself, not to the periphery of individual cells. Microtubules emanating from the perinuclear ncMTOC have been described as having a radial organization, and thus it is unclear that this redistribution of mCherry-Atg8a punctae to the fat body edge would reflect a kinesin-dependent process as suggested by the authors.  

      Thank you for this detailed observation. We frequently observe autophagosomes accumulating in contact-free peripheral regions of dynein-depleted cells, resulting in an asymmetric distribution. While previous studies describe a radial microtubule organization in fat body cells, none of them directly labeled MT plus ends, which mark the direction of kinesin-based transport.

      To further explore this, we overexpressed an HA-tagged kinesin, Klp98A-3xHA, in both control and Vps16A RNAi backgrounds. Immunolabeling revealed that Klp98A localizes to the contact-free peripheral regions in both conditions, consistent with the distribution of autophagosomes in dynein knockdown cells. This supports our interpretation that kinesin-dependent transport drives autophagosome redistribution in the absence of dynein, and that fat body cells exhibit subtle asymmetries in MT polarity that influence this transport. These new results have been included in the revised manuscript (see Figure 3G, H).

      (4) To validate whether the mCherry-Atg8a structures in Vps16A-depleted cells were of autophagic origin, the authors depleted Atg8a and observed a loss of mCherry-Atg8a signal from the mosaic cells (Figure S1D, J). A more rigorous experiment would be to deplete other Atg genes (not Atg8a) and examine whether these structures persist.  

      Thank you for the suggestion to further validate our reporter. We depleted Atg1, a key kinase required for phagophore initiation, in the Vps16A RNAi background. This completely abolished the punctate mCherry-Atg8a distribution in knockdown cells (see Figure 1—figure supplement 1E, K), confirming that the labeled structures are indeed of autophagic origin.

      (5) The authors found that only a subset of dynein, dynactin, rab, and rab effector depletions affected mCherry-Atg8a localization, leading to their suggestion that the most important factors involved in autophagosome motility have been identified here. However, this conclusion has the caveat that depletion efficiency was not examined in this study, and thus any conclusions about negative results should be more conservative.  

      Thank you for this constructive feedback. We agree that negative results must be interpreted conservatively due to potential differences in knockdown efficiency. We have revised our conclusions accordingly, clarifying that the factors identified are key for autophagosome motility, while acknowledging the possibility of false negatives.

      Reviewer #3 (Public review):  

      Major concerns:

      (1) The localization of EPG5 should be determined. The authors showed that EPG5 colocalizes with endogenous Rab7. Rab7 labels late endosomes and lysosomes. Previous studies in mammalian cells have shown that EPG5 is targeted to late endosomes/lysosomes by interacting with Rab7. EPG5 promotes the fusion of autophagosomes with late endosomes/lysosomes by directly recognizing LC3 on autophagosomes and also by facilitating the assembly of the SNARE complex for fusion. In Figure 5I, the EPG5/Rab7-colocalized vesicles are large and they are likely to be lysosomes/autolysosomes.

      Thank you for suggesting that we improve our Epg5 localization data. We performed triple immunostaining for Atg8a, Lamp1-3xmCherry, and Epg5-9xHA in S2R+ cells. In addition to triple-positive structures, which likely represent autolysosomes, we observed Atg8a and Epg5-9xHA double-positive vesicles that lacked Lamp1-3xmCherry signal, which likely correspond to pre-fusion autophagosomes. Based on these results, we propose that in addition to arriving via the endocytic route, Epg5 may also reach lysosomes through autophagosomes. These findings have been included in the revised manuscript (see Figure 5K).

      (2) The experiments were performed in Vps16A RNAi KD cells. Vps16A knockdown blocks fusion of vesicles derived from the endolysosomal compartments such as fusion between lysosomes. The pleiotropic effect of Vps16A RNAi may complicate the interpretation. The authors need to verify their findings in Stx17 KO cells, as it has a relatively specific effect on the fusion of autophagosomes with late endosomes/lysosomes.  

      Thank you for this valuable suggestion. We initially considered Syntaxin17 for validation; however, we recently found that loss of Syx17 leads to a HOPS-dependent tethering lock between autophagosomes and lysosomes, which would confound interpretation, as autophagosomes remain tethered to lysosomes (DOI: 10.1126/sciadv.adu9605). Therefore, Syntaxin17 loss is not suitable for our purpose. Among the remaining fusion SNAREs, one RNAi line targeting Snap29 was available on a compatible chromosome for generating Drosophila lines equivalent to those used in the screen. We established this Snap29 RNAi-containing tester line and crossed it with our top hits. We observed that autophagosome motility was comparable to that in the Vps16A RNAi background, further supporting our conclusions. These results have been incorporated into the revised manuscript (see Figure 6).

      (3) Quantification should be performed in many places such as in Figure S4D for the number of FYVE-GFP labeled endosomes and in Figures S4H and S4I for the number and size of lysosomes.  

      Thank you for pointing this out. We performed the suggested quantifications and statistical analyses for FYVE-GFP labeled endosomes, as well as for the number and size of lysosomes. The updated data are now presented in the revised Figure 5—figure supplement 1.

      (4) In this study, the transport of autophagosomes is investigated in fly fat cells. In fat cells, a large number of large lipid droplets accumulate and the endomembrane systems are distinct from that in other cell types. The knowledge gained from this study may not apply to other cell types. This needs to be discussed.

      Thank you for raising this important point. We agree that our findings may not be fully generalizable to all cell types. Given that the organization of the microtubule network depends on both cell function and developmental stage, it is plausible that the molecular machinery described here operates differently elsewhere. We now mention this limitation in the Discussion.

      Minor concerns:  

      (5) Data in some panels are of low quality. For example, the mCherry-Atg8a signal in Figure 5C is hard to see; the input bands of Dhc64c in Figure 5L are smeared.  

      Thank you for pointing this out. We repeated the experiment shown in Figure 5C and replaced the panel with a clearer image. The smeared Dhc64C input bands in Figure 5L result from the unusually large size of this protein, which affects its electrophoretic migration. We mentioned this point in the corresponding figure legend.

      (6) In this study, both 3xmCherry-Atg8a and mCherry-Atg8a were used. Different reporters make it difficult to compare the results presented in different figures.  

      Thank you for this comment. Both 3xmCherry-Atg8a and mCherry-Atg8a are well-established reporters that behave similarly as autophagic markers. Nevertheless, to avoid confusion, we ensured that each figure uses only one type of reporter consistently, which is now clearly indicated in the revised manuscript.

      (7) The small autophagosomes presented in figures such as Figure 1D and 1E are not clear. Enlarged images should be presented.  

      Thank you for your suggestion. We repeated these experiments and replaced the relevant panels with higher-quality images, including enlarged insets to better visualize small autophagosomes. These updated figures are now included in the revised manuscript.

      (8) The authors showed that Epg5-9xHA coprecipitates with the endogenous dynein motor Dhc64C. Is Rab7 required for the interaction?  

      Thank you for this insightful question. We tested this by co-transfecting S2R+ cells with Epg5-9xHA and different forms of Rab7: wild-type, GTP-locked (constitutively active), and GDP-locked (dominant-negative). Our results indicate that the strength of Epg5-Dhc interaction does not change in the presence of either GTP-locked or GDP-locked Rab7. However, we believe that Epg5 and dynein are recruited to the vesicle membranes via Rab7 in vivo, so we did not include these results in the revised manuscript.

      (9) The perinuclear lysosome localization in Epg5 KD cells does not indicate that Epg5 is an autophagosome-specific adaptor.

      Thank you for this important comment. Accordingly, we have toned down our statements about Epg5 functions throughout the revised manuscript.

      Reviewer #1 (Recommendations for the authors):  

      (1) Figure 6: What do "autolysosome maturation" and "small autolysosomes" mean? Do different numbers of lysosomes fuse to a single autophagosome?

      Thank you for highlighting this point. We concluded that the formation of smaller autolysosomes—compared to controls—is likely due to a defect in autolysosome maturation, as is often the case. We had not explicitly considered whether a different number of lysosomes fuse with each autophagosome during this process. We clarified this issue in the revised manuscript.

      (2) Figure 5A shows that the localization of endogenous Atg8 requires Epg5, but the data is not as clear as for mCherry-Atg8 (Figure 4B). Why the difference?  

      Thank you for this question. The difference arises because the mCherry-Atg8a reporter strongly labels autolysosomes, as the mCherry fluorophore remains stable in acidic compartments. As a result, mCherry-Atg8a labels both autophagosomes and autolysosomes, but the strong autolysosomal signal originating from the surrounding GFP-negative, non-RNAi cells can make accumulated autophagosomes appear fainter in fusion-defective cells (as in Figure 4). In contrast, endogenous Atg8a is degraded in lysosomes, and therefore labels only autophagosomes. This means that the appearance of these two experiments can be slightly different, but since in both cases autophagosomes no longer accumulate in the perinuclear region of Vps16A, Epg5 double RNAi cells, we can conclude that Epg5 is required for autophagosome positioning. We explained this difference between the two methods in the revised manuscript where it first appears (Figure 1B and Figure 1—figure supplement 1A).

      (3) Blue letters on the black micrographs are hard to see. Some of the other letters are also small and hard to read.  

      Thank you for this suggestion. We improved the visibility and readability of the labels in the revised figures.

    1. eLife Assessment

      This is an important study that utilizes proteomic and genetic approaches to identify the glycoprotein quality control factor malectin as a pro-viral host protein involved in the replication of coronavirus. The evidence supporting this conclusion is convincing, although continued elucidation of the mechanistic basis of malectin-mediated viral replication would further strengthen these findings. This work will be of interest to cell biologists studying the molecular mechanisms of glycoprotein quality control and virologists studying the host-pathogen interactions.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors employ a combined proteomic and genetic approach to identify the glycoprotein QC factor malectin as an important protein involved in promoting coronavirus infection. Using proteomic approaches, they show that the non-structural protein NSP2 and malectin interact in the absence of viral infection, but not in the presence of viral infection. However, both NSP2 and malectin engage the OST complex during viral infection, with malectin also showing reduced interactions with other glycoprotein QC proteins. Malectin KD reduces replication of coronaviruses, including SARS-CoV-2. Collectively, these results identify malectin as a glycoprotein QC protein involved in regulating coronavirus replication that could potentially be targeted to mitigate coronavirus replication.

      In the revised manuscript, the authors have addressed many of my comments from the previous submission. Notably, they've provided some additional mechanistic data, focused primarily on the activation of different stress signaling pathways, to help define how malectin impacts viral replication, although this mostly suggests that activation of these pathways may not be the main mechanism of malectin-dependent reductions in viral replication. Regardless, I'm sure this mechanism will be the focus of continued efforts on this project. They have also addressed other concerns related to interactions between OST and malectin, as well as the curious interactions between non-structural proteins with both ER and mitochondrial proteins. Overall, the authors have been responsive to my comments and comments from other reviewers, and the manuscript has been improved. It will be a good addition to eLife.

    3. Reviewer #3 (Public review):

      Summary:

      In their revised manuscript, the authors addressed most of the reviewers' concerns. One concern was the emphasis on increased MLEC-OST interactions during infection, which the authors toned down in the revision. They clarified that MLEC interaction with OST is maintained, rather than increased, during infection, while its interaction with other QC factors decreases. They also added context and discussion of the co-localization of viral proteins with ER and mitochondrial proteins, noting that both nsp2 and MLEC localize to mitochondria-associated membranes (MAMs), providing a plausible explanation for these interactions.

      Another concern involved the effects of MLEC KD on the cellular environment. To address this, the authors analyzed stress pathway activation and glycosylation of endogenous proteins in MLEC KD cells. They found only modest upregulation of the HSF1 pathway and no changes in the UPR or other stress responses, suggesting MLEC KD does not broadly disrupt ER proteostasis. Additionally, glycopeptide profiling showed only minor changes in host protein glycosylation, supporting a more direct role for MLEC in viral replication rather than general host glycoprotein disruption.

      However, some weaknesses remain. Direct interaction between MLEC and nsp2 during infection was not detected, and the identified viral glycopeptides were limited to only five Spike sites. Furthermore, the mechanism by which MLEC promotes viral replication is still unclear.

      In summary, the authors strengthened the manuscript by addressing reviewers' concerns through additional data, clarified language, and expanded discussion. While the overall support for MLEC's pro-viral role is solid, its precise mechanism of action remains speculative. Future work will be needed to directly link MLEC's activity to specific steps in viral protein biogenesis and replication.

      Original summary: In this study, Davies and Plate set out to discover conserved host interactors of coronavirus non-structural proteins (Nsp). They used 293T cells to ectopically express flag-tagged Nsp2 and Nsp4 from five human and mouse coronaviruses, including SARS-CoV-1 and 2, and analyzed their interaction with host proteins by affinity purification mass-spectrometry (AP-MS). To confirm whether such interactors play a role in coronavirus infection, the authors measured the effects of individual knockdowns on replication of murine hepatitis virus (MHV) in mouse Delayed Brain Tumor cells. Using this approach, they identified a previously undescribed interactor of Nsp2, Malectin (Mlec), which is involved in glycoprotein processing and shows a potent pro-viral function in both MHV and SARS-CoV-2. Although the authors were unable to confirm this interaction in MHV-infected cells, they show that infection remodels many other Mlec interactions, recruiting it to the ER complex that catalyzes protein glycosylation (OST). Mlec knockdown reduced viral RNA and protein levels during MHV infection, although such effects were not limited to specific viral proteins. However, knockdown reduced the levels of five viral glycopeptides that map to Spike protein, suggesting it may be affected by Mlec.

      Strengths:

      This is an elegant study that uses a state-of-the-art quantitative proteomic approach to identify host proteins that play critical roles in viral infection. Instead of focusing on a single protein from a single virus, it compares the interactomes of two viral proteins from five related viruses, generating a high confidence dataset. The functional follow-ups using multiple live and reporter viruses, including MHV and CoV2 variants, convincingly depict a pro-viral role for Mlec, a protein not previously implicated in coronavirus biology.

      Weaknesses:

      Although a commonly used approach, AP-MS of ectopically expressed viral proteins may not accurately capture infection-related interactions. The authors observed Mlec-Nsp2 interactions in transfected 293T cells (1C) but were unable to reproduce those in mouse cells infected with MHV (3C). EIF4E2/GIGYF2, two bona fide interactors of CoV2 Nsp2 from previous studies, are listed as depleted compared to negative controls (S1D). Most other CoV2 Nsp2 interactors are also depleted by the same analysis (S1D). Previously reported MERS Nsp2 interactors, including ASCC1 and TCF25, are also not detected (S1D). Furthermore, although GIGYF2 was not identified as an interactor of MHV Nsp2/4 in human cells (S1D), its knockdown in mouse cells reduced MHV titers about 1000-fold (S4). The authors should attempt to explain these discrepancies.

      More importantly, the authors were unable to establish a direct link between Mlec and the biogenesis of any viral or host proteins, by mass-spectrometry or otherwise. Although it is clear that Mlec promotes coronavirus infection, the mechanism remains unclear. Its knockdown does not affect the proteome composition of uninfected cells (S15B), suggesting it is not required for proteome maintenance under normal conditions. The only viral glycopeptides detected during MHV infection originated from Spike (5D), although other viral proteins are also known to be glycosylated. Cells depleted for Mlec produce ~4-fold less Spike protein (4E) but no more than 2-fold less glycosylated spike peptides (5D), compounding the interpretation of Mlec effects on viral protein biogenesis. Furthermore, Spike is not essential for the pro-viral role of Mlec, given that Mlec knockdown reduces replication of SARS-CoV-2 replicons that express all viral proteins except for Spike (6A/B).

      Any of the observed effects on viral protein levels could be secondary to multiple other processes. Interventions that delay infection for any reason could lead to imbalance of viral protein levels, because Spike and other structural proteins are produced at a much higher rate than non-structural proteins due to the higher abundance of their cognate subgenomic RNAs. Similarly, the observation that Mlec depletion attenuates MHV-mediated changes to the host proteome (S15C/D) can also be attributed to indirect effects on viral replication, regardless of glycoprotein processing. In the discussion, the authors acknowledge that Mlec may indirectly affect infection through modulation of replication complex formation or ER stress, but do not offer any supporting evidence. Interestingly, plant homologs of Mlec are implicated in innate immunity, favoring a more global role for Mlec in mammalian coronavirus infections.

      Finally, the observation that both Nsp2 (3C) and Mlec (3E/F) are recruited to the OST complex during MHV infection neither supports nor refutes any of these alternate hypotheses, given that Mlec is known to interact with OST in uninfected cells and that Nsp2 may interact with OST as part of the full-length unprocessed Orf1a, as it co-translationally translocates into the ER.

      Therefore, the main claims about the role of Mlec in coronavirus protein biogenesis are only partially supported.

      Comments on revisions:

      Figure 7B should be revised to show that MLEC maintains interactions with, rather than is recruited to, the OST.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors employ a combined proteomic and genetic approach to identify the glycoprotein QC factor malectin as an important protein involved in promoting coronavirus infection. Using proteomic approaches, they show that the non-structural protein NSP2 and malectin interact in the absence of viral infection, but not in the presence of viral infection. However, both NSP2 and malectin engage the OST complex during viral infection, with malectin also showing reduced interactions with other glycoprotein QC proteins. Malectin KD reduces replication of coronaviruses, including SARS-CoV-2. Collectively, these results identify Malectin as a glycoprotein QC protein involved in regulating coronavirus replication that could potentially be targeted to mitigate coronavirus replication.

      Overall, the experiments described appear well performed and the interpretations generally reflect the results. Moreover, this work identifies Malectin as an important pro-viral protein whose activity could potentially be therapeutically targeted for the broad treatment of coronavirus infection. However, there are some weaknesses in the work that, if addressed, would improve the impact of the manuscript.

      Notably, the mechanism by which malectin regulates viral replication is not well described. It is clear from the work presented that malectin is a pro-viral protein, but the mechanistic basis of this activity is not pursued. Some potential mechanisms are proposed in the discussion, but the manuscript would be strengthened if additional insight was included. For example, is the UPR activated to higher levels in infected cells depleted of malectin? Do glycosylation patterns of viral (or non-viral) proteins change in malectin-depleted cells? Additional insight into this specific question would significantly improve the manuscript.

      We concur with the reviewer that the mechanism by which Malectin regulates viral replication is an important point to elucidate further. Our proteomics data were able to offer additional insight into the questions posed here. We examined the upregulation of protein markers of the UPR and other stress response pathways in cells depleted of MLEC (Fig. S15D). We find that the UPR pathways are moderately but insignificantly upregulated, while the Heat Shock Factor 1 (HSF1) pathway is moderately and significantly upregulated. The fold-change increases of these marker proteins are relatively small, so while upregulation of this pathway may contribute to the suppression of CoV replication, it may not fully explain the phenotype.

      In addition, to address the second question, we compared the glycosylation patterns of endogenous proteins in MLEC-KD cells (Fig. S15E-G). We found that there is a small increase in abundance of glycopeptides associated with LAMP2, SERPINH1, RDX, RPL3/5, CADM4, and ITGB1; however, these fold changes are small and not statistically significant. These results indicate there is relatively little modification of endogenous glycoproteins upon MLEC depletion. These findings support a more direct role for MLEC in regulating viral replication.

      We added the following section to the manuscript text to discuss these results:

      “In uninfected cells, MLEC KD leads to relatively few proteome-wide changes, with MLEC being the only protein significantly downregulated and no other proteins significantly upregulated, supporting the specificity of MLEC KD in MHV suppression (Fig. S15C). To determine whether MLEC KD alters general host proteostasis, we further examined the levels of protein markers of stress pathways based on previous gene pathway definitions (Davies et al., 2023; Grandjean et al., 2019; Shoulders et al., 2013) (Fig. S15D). We find that there are modest but significant increases in protein levels associated with the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response (UPR) pathways are largely unmodified.

      We also probed the effect of MLEC KD on endogenous protein glycosylation. We find that there is only a small increase in abundance of glycopeptides, including those associated with the ribosome (Rpl3, Rpl5), a cytoskeletal protein (Rdx), the integrin Itgb1, and the ER-resident chaperone Serpinh1 (Fig. S15E-G).”

      “Our proteomics data reveal that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). In addition, there are only minor increases in endogenous glycopeptide levels (Fig. S15E-G). Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      Further, the evidence for increased interactions between OST and malectin during viral infection is fairly weak, despite being a major talking point throughout the manuscript. The reduced interactions between malectin and other glycoproteostasis QC factors are evident, but the increased interactions with OST are not well supported. I'd recommend backing off on this point throughout the text and instead continuing to highlight the reduced interactions.

      We agree that the fold-change increase in OST interactions with malectin is small compared to the fold-change decrease of other glycoproteostasis factors. We have modified the text to de-emphasize this point and instead highlight the reduced interactions:

      “Further, MHV infection retains the association of MLEC with the OST complex while titrating off other interactors, potentially leading to more efficient glycoprotein biogenesis.”

      I was also curious as to why non-structural proteins, nsp2 and nsp4, showed robust interactions with host proteins localized to both the ER and mitochondria? Do these proteins localize to different organelles or do these interactions reflect some other type of dysregulation? It would be useful to provide a bit of speculation on this point.

      We also find these ER and mitochondrial protein interactions curious, which we initially reported on (Davies, Almasy et al. 2020 ACS Infectious Diseases). In this prior report, we found that when expressed in HEK293T cells, SARS-CoV-2 nsp2 and nsp4 have partial localization to mitochondrial-associated ER membranes (MAMs), as determined by subcellular fractionation. Given that malectin has also been shown to have MAMs localization (Carreras-Sureda, et al. 2019 Nature Cell Biology), we have added additional text in the Discussion to speculate on this point:

      “Additionally, MLEC has also been shown to localize to ER-mitochondria contact sites (MAMs) (Carreras-Sureda et al., 2019), which regulate mitochondrial bioenergetics. We have previously shown that SARS-CoV-2 nsp2 and nsp4 can partially localize to MAMs (Davies et al., 2020), so these viral proteins may also dysregulate MLEC and MAMs activity to promote infection.”

      Again, the overall identification of malectin as a pro-viral protein involved in the replication of multiple different coronaviruses is interesting and important, but additional insights into the mechanism of this activity would strengthen the overall impact of this work.

      Thank you for this endorsement. We hope the additional analyses and discussion points in the revised manuscript further home in on a direct mechanistic function for MLEC in modulating viral replication.

      Reviewer #2 (Public Review):

      Summary:

      A strong case is presented to establish that the endoplasmic reticulum carbohydrate-binding protein malectin is an important factor for coronavirus propagation. Malectin was identified as a coronavirus nsp2 protein interactor using quantitative proteomics and its importance in the viral life cycle was supported by using a functional genetic screen and viral assays. Malectin binds diglucosylated proteins, an early glycoform thought to transiently exist on nascent chains shortly after translation and translocation; yet a role for malectin has previously been proposed in later quality control decisions and degradation targeting. These two observations have been difficult to reconcile temporally. In agreement with results from the Locher lab, the malectin interactome shown here includes a number of subunits of the oligosaccharyltransferase complex (OST). These results place malectin in close proximity to both the co-translational (STT3A or OST-A) and post-translational (STT3B or OST-B) complexes. It follows that malectin knockdown was associated with coronavirus Spike protein hypoglycosylation.

      Strengths:

      Strengths include using multiple viruses to identify interactors of nsp2 and quantitative proteomics along with multiple viral assays to monitor the viral life cycle.

      Weaknesses:

      Malectin knockdown was shown to be associated with Spike protein hypoglycosylation. This was further supported by malectin interactions with the OSTs. However, no specific role of malectin in glycosylation was discussed or proposed.

      We have emphasized our hypotheses on this point in the discussion and added a summary figure to highlight the specific role of malectin.

      Given the likelihood that malectin plays a role in the glycosylation of heavily glycosylated proteins like Spike, it is unfortunate that only 5 glycosites on Spike were identified using the MS deamidation assay when Spike has a large number of glycans (~22 sites). The mass spec data set would also include endogenous proteins. Were any heavily glycosylated endogenous proteins hypoglycosylated in the MS analysis in Fig 5D?

      Thank you for this suggestion. We compared the glycosylation patterns of endogenous proteins in MLEC-KD cells (Fig. S15E-G). We found that there is a small increase in abundance of glycopeptides associated with LAMP2, SERPINH1, RDX, RPL3/5, CADM4, and ITGB1; however, these fold changes are small and not statistically significant. These results indicate there is relatively little modification of endogenous glycoproteins upon MLEC depletion. We added the following sections:

      “We also probed the effect of MLEC KD on endogenous protein glycosylation. We find that there is only a small increase in abundance of glycopeptides, including those associated with the ribosome (Rpl3, Rpl5), a cytoskeletal protein (Rdx), the integrin Itgb1, and the ER-resident chaperone Serpinh1 (Fig. S15E-G).”

      “Our proteomics data reveal that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). In addition, there are only minor increases in endogenous glycopeptide levels (Fig. S15E-G). Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      The inclusion of the nsp4 interactome and its partial characterization is a distraction from the storyline that focuses on malectin and nsp2.

      We believe the nsp4 comparative interactome and functional genomics data offers a rich resource for further functional investigation by others, if made public. While we found the malectin and nsp2 storyline the most compelling to pursue, we believe the inclusion of the nsp4 data strengthens the overall approach, in agreement with Reviewer #3’s comments.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Davies and Plate set out to discover conserved host interactors of coronavirus non-structural proteins (Nsp). They used 293T cells to ectopically express flag-tagged Nsp2 and Nsp4 from five human and mouse coronaviruses, including SARS-CoV-1 and 2, and analyzed their interaction with host proteins by affinity purification mass-spectrometry (AP-MS). To confirm whether such interactors play a role in coronavirus infection, the authors measured the effects of individual knockdowns on replication of murine hepatitis virus (MHV) in mouse Delayed Brain Tumor cells. Using this approach, they identified a previously undescribed interactor of Nsp2, Malectin (Mlec), which is involved in glycoprotein processing and shows a potent pro-viral function in both MHV and SARS-CoV-2. Although the authors were unable to confirm this interaction in MHV-infected cells, they show that infection remodels many other Mlec interactions, recruiting it to the ER complex that catalyzes protein glycosylation (OST). Mlec knockdown reduced viral RNA and protein levels during MHV infection, although such effects were not limited to specific viral proteins. However, knockdown reduced the levels of five viral glycopeptides that map to Spike protein, suggesting it may be affected by Mlec.

      Strengths:

      This is an elegant study that uses a state-of-the-art quantitative proteomic approach to identify host proteins that play critical roles in viral infection. Instead of focusing on a single protein from a single virus, it compares the interactomes of two viral proteins from five related viruses, generating a high confidence dataset. The functional follow-ups using multiple live and reporter viruses, including MHV and CoV2 variants, convincingly depict a pro-viral role for Mlec, a protein not previously implicated in coronavirus biology.

      Weaknesses:

      Although a commonly used approach, AP-MS of ectopically expressed viral proteins may not accurately capture infection-related interactions. The authors observed Mlec-Nsp2 interactions in transfected 293T cells (1C) but were unable to reproduce those in mouse cells infected with MHV (3C). EIF4E2/GIGYF2, two bona fide interactors of CoV2 Nsp2 from previous studies, are listed as depleted compared to negative controls (S1D). Most other CoV2 Nsp2 interactors are also depleted by the same analysis (S1D). Previously reported MERS Nsp2 interactors, including ASCC1 and TCF25, are also not detected (S1D). Furthermore, although GIGYF2 was not identified as an interactor of MHV Nsp2/4 in human cells (S1D), its knockdown in mouse cells reduced MHV titers about 1000-fold (S4). The authors should attempt to explain these discrepancies.

      We acknowledge these limitations in AP-MS from ectopically expressed viral proteins and have addressed these discrepancies with further elaboration in the text:

      “A limitation of our study is the initial overexpression of individual proteins for AP-MS, in which we find some variation between our data with other AP-MS studies. We sought to overcome these variations by focusing on conserved interactors and testing interactions in a live infection context.”

      “We also found GIGYF2-KD strongly suppressed MHV infection, despite GIGYF2 not interacting with MHV nsp2 (Fig. S1D), highlighting the importance of proteostasis factors in infection regardless of direct PPIs.”

      More importantly, the authors were unable to establish a direct link between Mlec and the biogenesis of any viral or host proteins, by mass-spectrometry or otherwise. Although it is clear that Mlec promotes coronavirus infection, the mechanism remains unclear. Its knockdown does not affect the proteome composition of uninfected cells (S15B), suggesting it is not required for proteome maintenance under normal conditions. The only viral glycopeptides detected during MHV infection originated from Spike (5D), although other viral proteins are also known to be glycosylated. Cells depleted for Mlec produce ~4-fold less Spike protein (4E) but no more than 2-fold less glycosylated spike peptides (5D), compounding the interpretation of Mlec effects on viral protein biogenesis. Furthermore, Spike is not essential for the pro-viral role of Mlec, given that Mlec knockdown reduces replication of SARS-CoV-2 replicons that express all viral proteins except for Spike (6A/B).

      Thank you, these are all important points. We have acknowledged these compounding factors in the Discussion:

      “Concurrently, knockdown of MLEC leads to impediment of nsp production and aberrant glycosylation of other viral proteins like Spike, though it should be noted that the decrease in Spike glycopeptides is compounded by the overall decrease in Spike protein. Given that MLEC is pro-viral in a SARS-CoV-2 replicon model lacking Spike (Fig. 6), MLEC can promote CoV replication independent of Spike production.”

      Any of the observed effects on viral protein levels could be secondary to multiple other processes. Interventions that delay infection for any reason could lead to an imbalance of viral protein levels because Spike and other structural proteins are produced at a much higher rate than non-structural proteins due to the higher abundance of their cognate subgenomic RNAs. Similarly, the observation that Mlec depletion attenuates MHV-mediated changes to the host proteome (S15C/D) can also be attributed to indirect effects on viral replication, regardless of glycoprotein processing. In the discussion, the authors acknowledge that Mlec may indirectly affect infection through modulation of replication complex formation or ER stress, but do not offer any supporting evidence. Interestingly, plant homologs of Mlec are implicated in innate immunity, favoring a more global role for Mlec in mammalian coronavirus infections.

      We examined the upregulation of protein markers of the UPR and other stress response pathways in cells depleted of MLEC (Fig. S15D). We find that the UPR pathways are moderately but insignificantly upregulated, while the Heat Shock Factor 1 (HSF1) pathway is moderately and significantly upregulated. The fold-change increases of these marker proteins are relatively small, so while upregulation of this pathway may contribute to the suppression of CoV replication, it may not fully explain the phenotype. Please also see similar points brought up by reviewer 1 (comment 1). We added the following section to the manuscript text to discuss these results:

      “In uninfected cells, MLEC KD leads to relatively few proteome-wide changes, with MLEC being the only protein significantly downregulated and no other proteins significantly upregulated, supporting the specificity of MLEC KD in MHV suppression (Fig. S15C). To determine whether MLEC KD alters general host proteostasis, we further examined the levels of protein markers of stress pathways based on previous gene pathway definitions (Davies et al., 2023; Grandjean et al., 2019; Shoulders et al., 2013) (Fig. S15D). We find that there are modest but significant increases in protein levels associated with the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response (UPR) pathways are largely unmodified.”

      “Our proteomics data reveal that there is only a modest increase in the Heat Shock Factor 1 (HSF1) pathway, while the Unfolded Protein Response is relatively unchanged (Fig. S15D). […] Together, these results indicate that while MLEC KD leads to some alterations in ER proteostasis and host glycosylation, these changes are modest and may not be the primary mechanism by which MLEC KD hinders CoV replication.”

      Finally, the observation that both Nsp2 (3C) and Mlec (3E/F) are recruited to the OST complex during MHV infection neither supports nor refutes any of these alternate hypotheses, given that Mlec is known to interact with OST in uninfected cells and that Nsp2 may interact with OST as part of the full-length unprocessed Orf1a, as it co-translationally translocates into the ER. Therefore, the main claims about the role of Mlec in coronavirus protein biogenesis are only partially supported.

      We have acknowledged this point in the Discussion. 

      “We find that nsp2 interacts with several OST complex members, including DDOST, STT3A, and RPN1, though whether this is as part of the uncleaved Orf1a polyprotein during co-translational ER translocation or as an individual protein is unclear.”

      Reviewer #2 (Recommendations For The Authors):

      What is the proof that MLEC is a type I membrane protein? If it is strictly sequence analysis, this conclusion should be tempered in the text.

      Our response: We have added appropriate evidence on the biochemical characterization of MLEC topology from Galli et al., 2011, and cryo-EM structural characterization by Ramírez et al., 2019.

      “As it was surprising that nsp2, a non-glycosylated, cytoplasmic protein, would interact with MLEC, an integral ER membrane protein with a short two-amino-acid cytoplasmic tail (Galli et al., 2011; Ramírez et al., 2019), we assessed a broader genetic interaction between nsp2 and MLEC.”

      Validation of some of the nsp2 and malectin interactome components by pulldowns should be included.

      Our response: The interactions between nsp2 and Ddost, Stt3A, and Rpn1 passed a stringent confidence filter in our AP-MS experiment (Fig. 3C) based on several replicates. For this reason, we do not believe additional validation by Western blotting will offer much useful information.

      NGI-1 inhibition of glycosylation looks to be very weak in Fig. 5B and Fig. S14B.

      Our response: It is important to note that the NGI-1 inhibition assay used a suboptimal NGI-1 concentration to prevent full suppression of MHV infection, which we have found previously. We have added this justification in the Methods section and associated figure legend (Fig. S14A).

      “The 5 µM NGI-1 dosage was chosen as it resulted in partial inhibition of glycosylation while not completely blocking MHV infection.”

      “This dosage and timing were chosen to partially inhibit the OST complex without fully ablating viral infection, as NGI-1 has been shown previously to be a potent positive-sense RNA virus inhibitor (Puschnik et al., 2017) (Fig. S14).”

      Summary model figure at the end would help to communicate the conclusions.

      Our response: Thank you for this suggestion. We agree and have added a summary model figure at the end as suggested.

    1. eLife Assessment

      The paper is a fundamental study examining the role of CDK12 loss in prostate cancer. While previous studies have suggested that CDK12 loss confers HRD phenotypes, clinical trials using PARPi in CDK12-altered patients have not demonstrated significant benefit. This work investigates these mechanisms in depth and provides compelling evidence. A comprehensive genomic analysis serves as an excellent resource to the field, showing that biallelic CDK12 alterations do not have genomic features of HRD. Moreover, the study explored both acute and chronic deletion of CDK12, with data suggestive of CDK12-altered cells being uniquely sensitive to CDK13 inhibition. While some minor weaknesses have been previously noted by the reviewers, the authors have adequately addressed these concerns with appropriate rigor.

    2. Reviewer #1 (Public review):

      Summary:

      The authors were attempting to identify the molecular and cellular basis for why modulators of the HR pathway, specifically PARPi, are not effective in CDK12 deleted or mutant prostate cancers and they seek to identify new therapeutic agents to treat this subset of metastatic prostate cancer patients. Overall, this is an outstanding manuscript with a number of strengths and in my opinion represents a significant advance in the field of prostate cancer biology and experimental therapeutics.

      Strengths:

      The patient data cohort size and clinical annotation from Figure 1 are compelling and comprehensive in scope. The associations between tandem duplications and amplifications of oncogenes that have been well-credentialed to be drivers of cancer development and progression are fascinating, and the authors identify that in those that have AR amplification, for example, there is evidence for AR pathway activation. The association between CDK12 inactivation and various specific gene/pathway perturbations is fascinating and is consistent with previously published studies. It would be interesting to correlate these changes with cell line-based studies in which CDK12 is specifically deleted or inhibited with small molecules to see how many pathways/gene perturbations are shared between the clinical samples and cell and mouse models with CDK12 perturbation. The short-term inhibitor studies related to changes in HRD genes and protein expression with CDK12/13 inhibition are fascinating and suggest differential pathway effects between short inhibition of CDK12/13 and long-term loss of CDK12. The in vivo studies with the inhibitor of CDK12/13 are intriguing but not definitive.

      Weaknesses:

      Given that there are different mutations identified at different CDK12 sites as illustrated in Figure 1B, it would be nice to know which ones have been functionally classified as pathogenic and for which ones the pathogenicity has not been determined. This would be especially interesting to examine in light of the differences in the LOH scores and WES data presented. Specifically, are the pathogenic mutations, compared with the mutations for which true pathogenicity is unknown, more likely to display LOH or TD? For the cell inhibition studies with the CDK12/13 inhibitor, more details characterizing the specificity of this molecule to these targets would be useful. Additionally, could the authors perform short-term depletion studies with a PROTAC to the target or short shRNA or non-selected pool CRISPR deletion studies of CDK12 in these same cell lines to complement their pharmacological studies with genetic depletion studies? Performing these same inhibitor studies in CDK12/13-deleted cells to test the specificity of the molecule would perhaps also be useful. Additionally, expanding these studies to additional prostate cancer cell lines or organoid models would strengthen the conclusions being made. More information about the dose and schedule chosen for the in vivo studies, and the rationale for choosing them, should be presented and discussed. Was there evidence of maximal inhibition of the target CDK12/13 at the dose tested, given the very modest tumor growth inhibition noted in these studies?

    3. Reviewer #2 (Public review):

      Summary:

      The study explores the functional consequence of CDK12 loss in prostate cancer. While CDK12 loss has been shown to confer homologous recombination (HR) deficiency through premature intronic polyadenylation of HR genes, the response of PARPi monotherapy has failed. This study therefore performed an in-depth analysis of genomic sequencing data from mCRPC patient tumors, and showed that tumors with CDK12 loss lack pertinent HR signatures and scars. Furthermore, functional exploration in human prostate cancer cell lines showed that while the acute inhibition of CDK12 resulted in aberrant polyadenylation of HR genes like BRCA1/2, HR-specific effects were overall modest or absent in cell lines or xenografts adapted to chronic CDK12 loss. Instead, vulnerability to genetically targeting CDK13 resulted in a synthetic lethality in tumors with CDK12 loss, as shown in vivo with SR4835, a CDK12/13 inhibitor - thus serving as a potential therapeutic avenue.

      The evidence supporting this study is based on in-depth genomic analyses of human patients, acute knockdown studies of CDK12 using the CDK12/13 inhibitor SR4835, adaptive knockout of CDK12 using LuCaP 189.4_CL and inducible re-expression of CDK12, CDK12 single clones in 22Rv1 (KO2 and KO5) and Skov3 (KO1), Tet-inducible knockdown of BRCA2 or CDK12 followed by ionizing radiation and measurement of RAD51 foci, lack of sensitivity generally to PARPi and platinum chemotherapy in cells adapted to CDK12 loss, loss of viability of CDK13 knockout in CDK12 knockout cells, and in vivo testing of SR4835 in LuCaP xenografts with intact CDK12 and with CDK12 loss.

      Strengths:

      Overall, this study is robust and of interest to the broader homologous recombination and CDK field. First, the topic is clinically relevant given the lack of PARPi response in CDK12 loss tumors. Second, the strength of the genomic analysis in CDK12 lost PCa tumors is robust with clear delineation that BRCA1/2 genes and maintenance of most genes regulating HR are intact. Specifically, the authors find that there is no mutational signature or genomic features suggestive of HR deficiency, such as those found in BRCA1/2 tumors. Lastly, novel lines are generated in this study, including de novo LuCaP 189.4_CL with CDK12 loss that can be probed for potential synthetic lethalities.

      Weakness:

      One caveat that continues to be unclear as presented, is the uncoupling of cell cycle/essentiality of CDK12/13 from HR-directed mechanisms. Is this purely a cell cycle arrest phenotype acutely with associated down-regulation of many genes?

      While the RAD51 loading ssRNA experiments are informative, the Tet-inducible knockdown of BRCA2 and CDK12 is confusing as presented in Figure 5. shBRCA2 + and -dox are clearly shown. However, were the CDK12_K02 and K05 also knocked down using inducible shRNA or a stable knockout? The importance of this statement is the difference between acute and chronic deletion of CDK12. Previously, the authors showed that acute knockdown of CDK12 led to an HR phenotype, but here it is unclear whether CDK12-K02/05 are acute knockdowns of CDK12 or have been chronically adapted after single cell cloning from CRISPR-knockout.

      Given the multitude of lines, including some single-cell clones with growth inhibitory phenotypes and ex-vivo derived xenografts, the variability of effects with SR4835, ATM, ATR, and WEE1 inhibitors in different models can be confusing to follow. Overall, the authors suggest that the cell lines differ in therapeutic susceptibility as they may have alternate and diverse susceptibilities. It may be possible that the team could present this more succinctly and move extraneous data to the supplement.

      The in-vitro data suggests that SR4835 causes growth inhibition acutely in parental lines such as 22RV1. However, in vivo, tumor attenuation appears to be observed in both CDK12 intact and deficient xenografts, LuCAP136 and LuCaP 189.4 (albeit the latter is only nominally significant). Is there an effect of PARPi inhibition specifically in either model? What about the 22RV1-K02/05? Do these engraft? Given the role of CDK12/13 in RNAP II, these data might suggest that the window of susceptibility in CDK12 tumors may not be that different from CDK12 intact tumors (or intact tissue) when using dual CDK12/13 inhibitors but rather represent more general canonical essential functions of CDK12 and CDK13 in transcription. From a therapeutic development strategy, the authors may want to comment in the discussion on the ability to target CDK13 specifically.

    4. Reviewer #3 (Public review):

      Significance:

      About 5% of metastatic castration-resistant prostate cancers (mCRPC) display genomic alterations in the transcriptional kinase CDK12. The mechanisms by which CDK12 alterations drive tumorigenesis in this molecularly-defined subset of mCRPC have remained elusive. In particular, some studies have suggested that CDK12 loss confers a homologous recombination deficiency (HRd) phenotype. However, clinical studies have not borne out the benefit of PARP inhibitors in patients with CDK12 alterations, despite the fact that these agents are typically active against tumors with HRd.

      In this study, Frank et al. reconcile these findings by showing that: (1) tumors with biallelic CDK12 alterations do not have genomic features of HRd; (2) in vitro, HR gene downregulation occurs with acute depletion of CDK12 but is far less pronounced with chronic CDK12 loss; (3) CDK12-altered cells are uniquely sensitive to genetic or pharmacologic inhibition of CDK13.

      Strengths:

      Overall, this is an important study that reconciles disparate experimental and clinical observations. The genomic analyses are comprehensive and conducted with a high degree of rigor and represent an important resource to the community regarding the features of this molecular subtype of mCRPC.

      Weaknesses:

      (1) It is generally assumed that CDK12 alterations are inactivating, but it is noteworthy that homozygous deletions are comparatively uncommon (Figure 1a). Instead many tumors show missense mutations on either one or both alleles, and many of these mutations are outside of the kinase domain (Figure 1b). It remains possible that the CDK12 alterations that occur in some tumors may retain residual CDK12 function, or may confer some other neomorphic function, and therefore may not be accurately modeled by CDK12 knockout or knockdown in vitro. This would also reconcile the observation that knockout of CDK12 is cell-essential while the human genetic data suggest that CDK12 functions as a tumor suppressor gene.

      (2) It is not entirely clear whether CDK12 altered tumors may require a co-occurring mutation to prevent loss of fitness, either in vitro or in vivo (e.g. perhaps one or more of the alterations that occur as a result of the TDP may mitigate against the essentiality of CDK12 loss).

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Given that there are different mutations identified at different CDK12 sites as illustrated in Figure 1B it would be nice to know which ones have been functionally classified as pathogenic and for which ones the pathogenicity has not been determined. This would be especially interesting to perform in light of the differences in the LOH scores and WES data presented - specifically, are the pathogenic mutations vs the mutations for which true pathogenicity is unknown more likely to display LOH or TD?

      Alterations were classified as pathogenic when they resulted in a frameshift, a nonsense change, or an amino acid change likely to alter function (according to ANNOVAR). Four patients were called CDK12<sup>BAL</sup> but were negative for TDP signatures. Three of these had CDK12 mutations downstream of the kinase domain, which may be less likely to ablate protein activity. Most functionally validated pathogenic mutations include disruption of the kinase domain (PMID: 25712099). We added a sentence to the Results section (under “Identification of genomic characteristics that associate with CDK12 loss in prostate cancer”) to highlight this caveat on pathogenic mutation calls.

      For the cell inhibition studies with the CDK12/13 inhibitor, more details characterizing the specificity of this molecule to these targets would be useful. Additionally, could the authors perform short-term depletion studies with a PROTAC to the target or short shRNA or non-selected pool CRISPR deletion studies of CDK12 in these same cell lines to complement their pharmacological studies with genetic depletion studies? Also perhaps performing these same inhibitor studies in CDK12/13 deleted cells to test the specificity of the molecule would be useful.

      We are not aware of a CDK12-specific PROTAC, and generating such a reagent is beyond the scope of the present study. Regarding the specificity of the CDK12/13 inhibitor molecules, additional information on the specificity and in vivo dose selection was added to the Results section (under “CDK13 is synthetic lethal in cells with biallelic CDK12 loss”). Cells with CDK12-KO did not tolerate CDK13-KO, so we were unable to generate double knockouts to test for CDK12/13 inhibitor non-specific effects.

      Additionally, expanding these studies to additional prostate cancer cell lines or organoid models would strengthen the conclusions being made. More information should be provided about the dose and schedule chosen, and the rationale for choosing those doses and schedules for the in vivo studies should be presented and discussed. Was there evidence for maximal inhibition of the target CDK12/13 at the dose tested given the very modest tumor growth inhibition noted in these studies?

      With respect to additional acute CDK12 loss models, our Tet-inducible shCDK12 models show only minor growth slowdown and do not appear to phenocopy the strong arrest or apoptosis seen with CDK12 KO or inhibition, respectively. Future work is ongoing to generate CDK12-degron regulated cell lines. We added a new immunoblot panel showing that acute CRISPR/sgRNA targeting of CDK12 does indeed lead to BRCA2 and ATM protein decrease (Fig. S4g), providing some orthogonal genomic targeting evidence of the acute HR gene effect.  We are continuing efforts to collect and generate additional CDK12<sup>BAL</sup> cell models, in both 2D and 3D culture systems, but none are presently available. We added a 3D culture drug dose curve with LuCaP189.4 exposed to THZ531 (Fig. S7m), which confirms heightened sensitivity vs two CDK12-intact lines. 

      Regarding assessments of CDK12 targets: as we are not aware of any unique CDK12 substrates, it is fair to ask but difficult to measure precise CDK12 inhibition by the compounds in tumors. We dosed mice using the same protocol as detailed in the original report testing SR4835 in mice (PMID: 31668947). We performed immunoblots on lysates from PDX tumors treated for 3 and 28 days and did not see any consistent decreases in pRBP1(Ser2) or ATM or increases in γH2A.X (data not shown). However, we did see increases in APA usage and downregulation of DNA repair transcripts with three-day treatment (Fig. 6k-l), as would be expected from on-target acute effects.

      Reviewer #2 (Public review):

      One caveat that continues to be unclear as presented, is the uncoupling of cell cycle/essentiality of CDK12/13 from HR-directed mechanisms. Is this purely a cell cycle arrest phenotype acutely with associated down-regulation of many genes?

      In regard to untangling the effects of cell arrest on HR gene expression, this is a difficult question given that many HR genes, including BRCA2, are S/G2 linked. We attempted to account for those effects in the acute CDK12 inhibition experiment by including a palbociclib (CDK4/6i) control, which caused cell arrest and decreased BRCA1/2 RNA expression with no apparent 5’/3’ transcript imbalance determined by qPCR (Fig. 4e,g). Though overall BRCA1 and BRCA2 mRNA levels are lower in the stable 22Rv1-CDK12-KO2 and KO5 lines, they do not show selective 3’ loss (Fig. 5c), suggesting the downregulation in these lines is mostly due to their slower growth (Fig. S4k) and not intronic polyA usage.

      While the RAD51 loading ssRNA experiments are informative, the Tet-inducible knockdown of BRCA2 and CDK12 is confusing as presented in Figure 5. shBRCA2 + and -dox are clearly shown. However, were the CDK12_K02 and K05 also knocked down using inducible shRNA or a stable knockout? The importance of this statement is the difference between acute and chronic deletion of CDK12. Previously, the authors showed that acute knockdown of CDK12 led to an HR phenotype, but here it is unclear whether CDK12-K02/05 are acute knockdowns of CDK12 or have been chronically adapted after single cell cloning from CRISPR-knockout.

      As a clarification, the 22Rv1-CDK12-KO2 and 22Rv1-CDK12-KO5 are stable CRISPR knockout clonal lines that were expanded from single cells. We added a new figure to include more validation of these lines (Fig. S5). We tried multiple times to reproduce the HRd phenotype and PARPi sensitivity with siRNA and inducible shRNA lines but were unable to see clear sensitivity differences, despite seeing the expected shifts with shBRCA2 controls (data not shown). It is possible the degree of knockdown (~80%), timing (8 days), or specific cell lines used in our experiments were not sufficient to expose the acute phenotype by this method.

      However, we were able to see acute HR gene decreases by inhibitor treatment (Fig. 4) or acute CRISPR (Fig. S4g).

      Given the multitude of lines, including some single-cell clones with growth inhibitory phenotypes and ex-vivo derived xenografts, the variability of effects with SR4835, ATM, ATR, and WEE1 inhibitors in different models can be confusing to follow. Overall, the authors suggest that the cell lines differ in therapeutic susceptibility as they may have alternate and diverse susceptibilities. It may be possible that the team could present this more succinctly and move extraneous data to the supplement.  

      We appreciate the complexity of the data and attempted to use multiple models to report consistency and variability. We are not able to ascertain what data would be extraneous, and elected to present data we view as relevant in the main figures while moving supporting data to the supplement.

      The in-vitro data suggests that SR4835 causes growth inhibition acutely in parental lines such as 22RV1. However, in vivo, tumor attenuation appears to be observed in both CDK12 intact and deficient xenografts, LuCAP136 and LuCaP 189.4 (albeit the latter is only nominally significant). Is there an effect of PARPi inhibition specifically in either model? What about the 22RV1-K02/05? Do these engraft? Given the role of CDK12/13 in RNAP II, these data might suggest that the window of susceptibility in CDK12 (mutant) tumors may not be that different from CDK12 intact tumors (or intact tissue) when using dual CDK12/13 inhibitors but rather represent more general canonical essential functions of CDK12 and CDK13 in transcription. From a therapeutic development strategy, the authors may want to comment in the discussion on the ability to target CDK13 specifically.

      Though the response of the CDK12<sup>BAL</sup> models to some compounds is variable, we believe those mixed results are important and future studies may be able to better explain why some show shifts in sensitivity while others do not. We hope future studies with additional models will help determine which sensitivities are more consistently true, and perhaps provide explanations for differences between models.

      Regarding SR4835, we find, and others have reported, a toxic (i.e. apoptotic) effect for in vitro treatment with dual CDK12/13 inhibitors (Fig. 4f, S4e,f); in fact, that may be why previous studies have used short timepoints in cell culture assays with these dual inhibitors. In mice, SR4835 was tolerated well but only LuCaP 189.4 showed statistically significant decreases in tumor volume and weight (Fig. 6j). We did not test PARPi responses in the PDX models, nor did we attempt engrafting the 22Rv1-CDK12-KO cell lines, but both would be worthwhile experiments in the future. Beyond CDK12<sup>BAL</sup> tumors, we agree that CDK12/13 inhibitors could be effective in cancer therapies more generally (e.g. triggering acute HRd, loss of RNAP2 phosphorylation). We added a line to the discussion section about ongoing efforts to combine PARPi and CDK12/13i, which we expect to be synergistic in CDK12-intact tumors due to the acute loss phenotype. We certainly agree that development of a specific CDK13 inhibitor would be the ideal therapeutic option for CDK12<sup>BAL</sup> tumors. However, CDK12 and CDK13 are 43% conserved at the protein level (PMID: 26748711), with 92% conservation in the active site (PMID: 30319007), and there are no available pharmacologic inhibitors that discriminate between CDK12 and CDK13.

      Reviewer #3 (Public review):

      It is generally assumed that CDK12 alterations are inactivating, but it is noteworthy that homozygous deletions are comparatively uncommon (Figure 1a). Instead many tumors show missense mutations on either one or both alleles, and many of these mutations are outside of the kinase domain (Figure 1b). It remains possible that the CDK12 alterations that occur in some tumors may retain residual CDK12 function, or may confer some other neomorphic function, and therefore may not be accurately modeled by CDK12 knockout or knockdown in vitro. This would also reconcile the observation that knockout of CDK12 is cell-essential while the human genetic data suggest that CDK12 functions as a tumor suppressor gene.

      Thank you for the feedback. It is a keen observation that homozygous deletions of CDK12 are not typical, though many mutations are upstream frameshifts that are expected to lead to loss of functional protein and mRNA via nonsense-mediated decay. LuCaP189.4, our only natural mutant model, has two upstream frameshifts leading to complete protein loss (Fig 5b, S4h-i). We also added a caveat previously mentioned (in response to Reviewer 1) that mutations downstream of the kinase domain may be less likely to be fully pathogenic. For upstream missense mutations, neomorphic function remains an intriguing possibility that cannot be ruled out and would not be captured in our current models. Hopefully additional models can be developed, both natural and engineered, to help dissect that question in future studies.

      It is not entirely clear whether CDK12 altered tumors may require a co-occurring mutation to prevent loss of fitness, either in vitro or in vivo (e.g. perhaps one or more of the alterations that occur as a result of the TDP may mitigate against the essentiality of CDK12 loss).

      We concur. Another caveat with the CRISPR models, beyond reliance on upstream frameshift mutations, is the simultaneous loss of alleles. In human tumors, there may be a period of single copy loss before the second hit that may provide a window for adaptation. It is possible that sequential loss is far easier for a cell to tolerate than acute bi-allelic inactivation. We agree that the question of what (if any) cooperating genetic alterations are required to tolerate CDK12 loss is an important one that we plan to further explore in future work.

      Recommendations for Authors:

      Reviewer #1 (Recommendations for Authors):

      The authors have thoroughly addressed all issues of data availability, reagents, in vivo protocols, and animal approvals associated with the studies presented in this manuscript. Specific comments and experimental suggestions that in my opinion would strengthen the conclusions of this interesting and compelling manuscript are included above.

      Reviewer #2 (Recommendations for the authors):

      The authors were thorough in their studies. As a general note, switching between the cell lines is often overwhelming in interpreting the data given cell-to-cell variability in response. If possible, consolidating the text/conclusions in results would improve the readability of the manuscript.

      The variety of cell lines and models is perhaps expansive at times, but we hope the inclusion of these different models helps support the conclusions. 

      Is it possible to knockout CDK12 acutely using a degron-based approach, instead of utilizing an inhibitor that targets both CDK12/13?

      There is a HeLa cell line made with analog-sensitive CDK12 (Bartkowiak, Yan, and Greenleaf 2016) but we were unaware of any such prostate lines at the time of this work. We are attempting to develop engineered prostate lines with specific CDK12 degradation but do not yet have them available.

      How do the authors address a lower BRCA1/2 level in for example 22RV1-K05, does this cell line have increased sensitivity to PARPi over its parental 22RV1 line? Could this be added to Figure 5h/i?

      The lower BRCA2 levels in 22Rv1-CDK12-KO5 are likely due to the slower growth rate (Fig. S4k), as BRCA2 expression is S/G2 linked. While the mRNA level of BRCA2 overall is lower in the KO5 line, we do not observe the 5’/3’ transcript imbalance (Fig. 5c). The 22Rv1-CDK12-KO lines did not show increased sensitivity to carboplatin, while inducible shBRCA2 did (Fig. S7a), so we do not believe this lower BRCA2 confers functional HRd. We did test the KO lines with olaparib (Fig. S7d) and saw a modest increased sensitivity compared to parental 22Rv1, but not to the extent measured in the BRCA1 mutant line UWB1.289.

      What is the clonality of the LuCaP 189.4 lines upon derivation? Is biallelic CDK12 loss seen in all cells?

      We do not know the exact clonality of the LuCaP 189.4 PDX or CL model, but we do see highly uniform CDK12 protein loss in these cells (quantified by IHC staining, data not shown).

      The authors state that 22RV1-K02/05 has an increased growth arrest to CDK13 inhibition. However, in Figure 6h, it appears the viability is not significantly different compared to the parental 22RV1 line. Similar aspects noted in 189.4-vec/CDK12?

      We found that 22Rv1 KO2/KO5 have growth arrest with sgCDK13 and cell death with CDK12/13 inhibitor. We did notice that SR4835 did not show the differential effects we anticipated (Fig. 6h), as was seen with THZ531 (Fig. 6i). SR4835 is a non-covalent inhibitor, while THZ531 is a covalent binder, so there are some functional differences between these compounds that might explain the lack of differential effects in the isogenic lines in a 4 day in vitro assay. We included the SR4835 in vitro data because it was used for the in vivo experiment. THZ531 is not suited for animal use.

      Could the authors comment on SR4835 response in vivo as a function of tumor growth rate?

      The in vivo SR4835 treated LuCaP189.4 did show signs of reduced proliferation with decreased Cell Cycle and DNA Replication in the RNA-seq signatures, but a more detailed investigation into cell cycle arrest vs apoptotic response has yet to be fully explored. We plan to conduct additional PDX experiments if we can obtain a selective CDK13 inhibitor. 

      Do the authors explore TDPs in their isogenic cell lines?

      We performed low coverage WGS on the 22Rv1 KO clones and added that to the paper (Fig. S5c). We did not see any obvious signs of TDP. We suspect the phenotype takes longer to accumulate and is not apparent within the ~20 passages our clones underwent in culture. This would be consistent with the tumor analysis (Fig. 2b) showing increase in TDs from primary to metastatic tumors, suggesting TDs accumulate over time.

      A future study may allow for screening synthetic lethals in the context of CDK12 loss in the presence or absence of SR4835 inhibition.

      We are actively pursuing experiments to identify new synthetic lethal targets by CRISPR and drug screens in CDK12 loss models and hope to report those in a future study.

      Reviewer #3 (Recommendations for the authors):

      As discussed above, the authors may wish to adjust their terminology to "CDK12-altered" rather than "CDK12 lost" or "CDK12-inactivated" to leave open the possibility that some tumors may retain residual CDK12 function or adopt neomorphic functions.

      Thank you for the additional comments and feedback. The possibility of neomorphic CDK12 allele function is important. As our models were all complete protein loss mutations, we decided to retain “biallelic loss” as our preferred nomenclature, but the note is well taken.

      The plots in Figures 1f-h are interesting and suggest that certain cancer genes (especially oncogenes) are recurrently gained in CDK12-altered tumors. It may be interesting to look at this on the individual level rather than the cohort level to see whether any groups of oncogenes tend to be gained together in an individual patient - this could inform whether certain combinations of cancer drivers cooperate with CDK12 alteration to drive oncogenesis.

      Thank you for the idea of looking at the patient-level for TDP-enriched oncogenes. A preliminary assessment did not identify recurrent co-gained oncogenes. We will continue these analyses as additional patient tumors with CDK12 alterations are identified. 

      The finding that AR gene or enhancer are recurrently gained with TDP is interesting and I am curious whether the authors have thoughts on whether these alterations can also be seen in the 1-2% of CDK12-altered primary prostate cancers that are treatment naïve, and where AR pathway alterations are not as frequently seen.

      We did not focus on CDK12 altered primary prostate cancers, but we did check if there is AR amplification enrichment in the 6 CDK12<sup>BAL</sup> cases of the TCGA-PRAD dataset and did not identify enrichment. However, with such small numbers we would hesitate to draw any hard conclusions. 

      It could be interesting to more comprehensively characterize some of the CDK12 KO-adapted lines in Figure 5 (e.g. by WES or WGS) to determine whether they exhibit the TDP and/or whether they have acquired any secondary mutations that allow them to adapt to CDK12 loss.

      We are planning to do further genomics characterization of the CDK12-KO lines and will hopefully include that in a future study. Genomic analyses of the 22Rv1 clones (see copy number plots in Fig. S5c) did not identify a TDP. We plan to repeat the genomic assessments over additional cell passages and we have planned additional experiments designed to understand why some cells tolerate CDK12 loss and others do not.

    1. eLife Assessment

      This useful study informs the transcriptional mechanisms that promote stem cell differentiation and prevent degeneration in the adult eye. Through inducible mouse mutagenesis, the authors uncover a dual role for a transcription factor (Sox9) in stem cell differentiation and prevention of retinal degeneration. The data at hand convincingly support the main conclusions. The study will be of general interest to the fields of neuronal development and neurodegeneration.

    2. Reviewer #1 (Public review):

      Summary:

      Hurtado et al. show that Sox9 is essential for retinal integrity, and its null mutation causes the loss of the outer nuclear layer (ONL). The authors then show that this absence of the ONL is due to apoptosis of photoreceptors and a reduction in the numbers of other retinal cell types such as ganglion cells, amacrine cells and horizontal cells. They also describe that Müller Glia undergoes reactive gliosis by upregulating the Glial Fibrillary Acidic Protein. The authors then show that Sox9+ progenitors proliferate and differentiate to generate the corneal cells through Sox9 lineage-tracing experiments. They validate Sox9 expression and characterize its dynamics in limbal stem cells using an existing single-cell RNA sequencing dataset. Finally, the authors show that Sox9 deletion causes progenitor cells to lose their clonogenic capacity by comparing the sizes of control and Sox9-null clones. Overall, Hurtado et al. underline the importance of Sox9 function in retinal cells.

      Strengths:

      The authors have characterized a myriad of striking phenotypes due to Sox9 deletion in the retina and limbal stem cells which will serve as a basis for future studies.

      Weaknesses:

      Hurtado et al. highlight the importance of Sox9 in the retina and limbal stem cells by describing several effects of Sox9 depletion in the adult eye. However, it is unclear how or where Sox9 precisely acts, as a mechanistic investigation of the transcription factor's role in this tissue is lacking.

    3. Reviewer #2 (Public review):

      Summary:

      Sox9 is a transcription factor crucial for development and tissue homeostasis, and its expression continues in various adult eye cell types, including retinal pigmented epithelium cells, Müller glial cells, and limbal and corneal basal epithelia. To investigate its functional roles in the adult eye, this study employed inducible mouse mutagenesis. Adult-specific Sox9 depletion led to severe retinal degeneration, including the loss of Müller glial cells and photoreceptors. Further, lineage tracing revealed that Sox9 is expressed in a basal limbal stem cell population that supports stem cell maintenance and homeostasis. Mosaic analysis confirmed that Sox9 is essential for the differentiation of limbal stem cells. Overall, the study highlights that Sox9 is critical for both retinal integrity and the differentiation of limbal stem cells in the adult mouse eye.

      Strengths:

      In general, inducible genetic approaches in the adult mouse nervous system are rare and difficult to carry out. Here, the authors employ tamoxifen-inducible mouse mutagenesis to uncover the functional roles of Sox9 in the adult mouse eye.

      Careful analysis suggests that two degeneration phenotypes (mild and severe) are detected in the adult mouse eye upon tamoxifen-dependent Sox9 depletion. Phenotype severity nicely correlates with the efficiency of Cre-mediated Sox9 depletion.

      Molecular marker analysis provides strong evidence of Mueller cell loss and photoreceptor degeneration.

      A clever genetic tracing strategy uncovers a critical role for Sox9 in limbal stem cell differentiation.

      Comments on revised submission:

      The revised manuscript is very much improved and has addressed all my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Hurtado et al. show that Sox9 is essential for retinal integrity, and its null mutation causes the loss of the outer nuclear layer (ONL). The authors then show that this absence of the ONL is due to apoptosis of photoreceptors and a reduction in the numbers of other retinal cell types such as ganglion cells, amacrine cells, and horizontal cells. They also describe that Müller Glia undergoes reactive gliosis by upregulating the Glial Fibrillary Acidic Protein. The authors then show that Sox9+ progenitors proliferate and differentiate to generate the corneal cells through Sox9 lineage-tracing experiments. They validate Sox9 expression and characterize its dynamics in limbal stem cells using an existing single-cell RNA sequencing dataset. Finally, the authors argue that Sox9 deletion causes progenitor cells to lose their clonogenic capacity by comparing the sizes of control and Sox9-null clones. Overall, Hurtado et al. underline the importance of Sox9 function in retinal and corneal cells.

      Strengths:

      The authors have characterized a myriad of striking phenotypes due to Sox9 deletion in the retina and limbal stem cells which will serve as a basis for future studies.

      Weaknesses:

      Hurtado et al. investigate the importance of Sox9 in the retina and limbal stem cells. However, the overall experimental narrative appears dispersed.

      (1) The authors begin by characterizing the phenotype of Sox9 deletion in the retina and show that the absence of the ON layer is due to photoreceptor apoptosis and a reduction in other retinal cell types. The authors also note that Müller glia undergoes gliosis in the Sox9 deletion condition. These striking observations are never investigated further, and instead, the authors switch to lineage-tracing experiments in the limbus that seem disconnected from the first three figures of the paper. Another example of this disconnect is the comparison of Sox9 high and Sox9 low populations using an existing scRNA-seq dataset and the subsequent GO term analysis, which does not directly tie in with the lineage-tracing data of the succeeding Sox9∆/∆ experiments.

      We thank the reviewer for their thoughtful observations. We would like to clarify the rationale behind the structure of our study and how the different parts are conceptually connected.

      Our central aim was to investigate the role of Sox9 in the adult eye. Given that Sox9 has been extensively studied during embryonic development, we specifically chose to use an inducible conditional knockout strategy (CAG-CreERTM) in order to assess its function postnatally, in the adult eye. This approach revealed a severe retinal phenotype, whereas the cornea showed no overt phenotype. A major strength of our experimental design is that it allowed us to examine the role of Sox9 specifically in the adult eye, avoiding confounding effects from embryonic development. Nevertheless, this approach entails an inherent limitation: the mosaic nature of the CAG-CreERTM system leads to substantial variability in both the extent and distribution of Sox9 inactivation among individual animals. We invested considerable effort over extended periods to obtain reliable and biologically meaningful data despite this variability. We did not proceed further because this mosaicism poses a significant limitation when attempting to dissect downstream mechanisms in a consistent and reproducible manner, making it extremely challenging to perform in-depth mechanistic studies.

      Regarding the cornea, given the absence of a clear phenotype upon Sox9 deletion, we expanded our investigation by adding lineage-tracing and transcriptomic analyses to better understand Sox9’s potential role in adult limbal epithelial stem cells. These additional experiments provided valuable insight into Sox9 function in the adult cornea, even in the absence of gross morphological changes. Thus, while the retinal and corneal data stem from different experimental approaches, they are unified by a shared goal: understanding the cell type-specific and tissue-specific functions of Sox9 in the adult eye.

      To ensure that other readers do not perceive this apparent disconnect, and to avoid overstating our conclusions, we have modified the manuscript. In the Introduction section, we have included the main findings from studies conducted to date on the role of Sox9 in the cornea and retina, and we have removed the corresponding section from the Discussion. We believe it is now clear that our study focuses on the role of Sox9 in the adult eye, in contrast to previous studies, which focused on the developing eye.

      In the Discussion section, we have added a new paragraph at the beginning and end that explicitly addresses the relationship between the retinal and limbal findings, illustrating how a single transcription factor can play distinct roles in different tissues within the same organ.

      Regarding the reviewer’s comment that the scRNA-seq analyses appear disconnected from the lineage-tracing data, we respectfully disagree. These analyses provide independent transcriptional confirmation that Sox9 is a marker of limbal stem cells, reinforcing the conclusions drawn from our in vivo experiments. These approaches are complementary and they converge on the same biological insight: Sox9 marks a population with stem-like properties in the adult limbus. Nevertheless, we acknowledge the reviewer’s concern and have moderated the tone of our statements in the revised version of the manuscript to better reflect the supporting nature of the scRNA-seq data, without overstating its functional implications.

      (2) A major concern is that a single Sox9∆/∆ limbal clone has a sufficiently large size, comparable to wild-type clones, as seen in Figure 6D. This singular result is contrary to their conclusion, which states that Sox9-deficient stem cells minimally contribute to the maintenance of the cornea.

      We thank the reviewer for this important observation.

      Ligand-independent activity of Cre-ER fusion proteins has been repeatedly reported in various mouse models (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009). This basal recombinase activity is thought to arise from inappropriate nuclear translocation or proteolysis of the Cre-ER fusion protein, leading to low-level recombination even in the absence of tamoxifen. Consistent with this, prior studies using the same CAGG-CreERTM; R26R-LacZ system for clonal analysis in the cornea have observed sparse reporter expression before tamoxifen administration (Dorà et al., 2015).

      In line with these findings, we also detected minimal background LacZ staining in Sox9Δ/Δ-LacZ corneas (mean surface area: 0.85%; n = 8 eyes). This low-level staining likely reflects recombination events in transient amplifying or more differentiated cells, which are not expected to generate long-lived clones. However, in the rare instance of a large clone, as shown in Figure 6D, we believe that a spontaneous recombination event may have occurred in a bona fide limbal stem cell, giving rise to a sustained contribution. To rigorously address this potential artefact and assess the true contribution of Sox9-deficient stem cells, we conducted a comparative analysis of 8 control (Sox9Δ/+-LacZ) and 5 mutant (Sox9Δ/Δ-LacZ) corneas. This analysis revealed a highly significant 8-fold reduction in the LacZ-positive surface area in mutant samples (Sox9Δ/+-LacZ: 6.65 ± 1.77%; Sox9Δ/Δ-LacZ: 0.85 ± 0.85%; paired t-test, p = 0.00017; Figs. 6E and F; Table S12).

      We chose to include the image of the large clone in the main figure precisely because it does not align with our working hypothesis. We believe that showing such exceptions transparently is scientifically important and may be valuable for other researchers using similar inducible systems. Nonetheless, based on previous literature, the number of samples analyzed, and the statistically significant reduction in clonal contribution, we maintain that the observed phenotype reflects a true biological effect of Sox9 loss, supporting our conclusion that Sox9-deficient stem cells contribute minimally to corneal maintenance. To make that point clearer, we have introduced the following sentence in lines 462-464 of the revised version of the manuscript.

      “A possible explanation for this clone may be that spontaneous ligand-independent activity of Cre-ER fusion may have occurred in a bona fide limbal stem cell, as previously reported (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009, Dorà et al., 2015).”

      Reviewer #2 (Public review):

      Sox9 is a transcription factor crucial for development and tissue homeostasis, and its expression continues in various adult eye cell types, including retinal pigmented epithelium cells, Müller glial cells, and limbal and corneal basal epithelia. To investigate its functional roles in the adult eye, this study employed inducible mouse mutagenesis. Adult-specific Sox9 depletion led to severe retinal degeneration, including the loss of Müller glial cells and photoreceptors. Further, lineage tracing revealed that Sox9 is expressed in a basal limbal stem cell population that supports stem cell maintenance and homeostasis. Mosaic analysis confirmed that Sox9 is essential for the differentiation of limbal stem cells. Overall, the study highlights that Sox9 is critical for both retinal integrity and the differentiation of limbal stem cells in the adult mouse eye.

      Strengths:

      In general, inducible genetic approaches in the adult mouse nervous system are rare and difficult to carry out. Here, the authors employ tamoxifen-inducible mouse mutagenesis to uncover the functional roles of Sox9 in the adult mouse eye.

      Careful analysis suggests that two degeneration phenotypes (mild and severe) are detected in the adult mouse eye upon tamoxifen-dependent Sox9 depletion. Phenotype severity nicely correlates with the efficiency of Cre-mediated Sox9 depletion.

      Molecular marker analysis provides strong evidence of Mueller cell loss and photoreceptor degeneration.

      A clever genetic tracing strategy uncovers a critical role for Sox9 in limbal stem cell differentiation.

      Weaknesses:

      (1) The Introduction can be improved by explaining clearly what was previously known about Sox9 in the eye. A lot of this info is mentioned in a single, 3-page long paragraph in the Discussion. However, the current study's significance and novelty would become clearer if the authors articulated in more detail in the Introduction what was already known about Sox9 in retina cell types (in vitro and in vivo).

      We appreciate this insightful comment. Following the reviewer’s suggestion, we have reorganized the manuscript to provide a clearer scientific context in the Introduction. Specifically, we have moved the relevant background information on Sox9 in different retinal cell types—previously included in a single, extended paragraph in the Discussion—into the Introduction. This helps to better frame our study within the context of existing knowledge.

      Additionally, we have emphasized more explicitly that our work does not focus on embryonic development, as most previous studies on Sox9 have done, but instead investigates its role in the adult mouse retina and limbus/cornea. We believe this represents an important and novel aspect of our study, as the mechanisms of retinal maintenance and limbal stem cell differentiation in the adult have been less extensively studied.

      (2) Because a ubiquitous tamoxifen-inducible CreER line is employed, non-cell autonomous mechanisms possibly contribute to the observed retina degeneration. There is precedence for this in the literature. For example, RPE-specific ablation of Otx2 results in photoreceptor degeneration (PMID: 23761884). Have the authors considered the possibility of non-cell autonomous effects upon ubiquitous Sox9 deletion?

      Given the similar phenotypes between animals lacking Otx2 and Sox9 in specific cell types of the eye, the authors are encouraged to evaluate Otx2 expression in the tamoxifen-induced Sox9 adult retina.

      We appreciate the insightful comment of the reviewer regarding the potential contribution of non-cell autonomous mechanisms to the retinal degeneration observed upon ubiquitous Sox9 deletion. We agree that this is an important consideration, particularly in the context of findings showing that RPE-specific deletion of Otx2 results in secondary photoreceptor degeneration.

      However, we would like to emphasize that RPE-specific deletion of Sox9 does not lead to photoreceptor loss or retinal degeneration, as previously shown (Masuda et al., 2014; Goto et al., 2018; Cohen-Tayar et al., 2018) [PMID: 24634209; PMID: 29609731; PMID: 29986868]. In addition, it was shown that Sox9 deletion in the RPE caused downregulation of visual cycle genes but did not compromise photoreceptor integrity or survival. Interestingly, Otx2 expression was found to be upregulated in the absence of Sox9, further supporting the view that Sox9 is not a simple upstream regulator of Otx2 in the adult RPE (Masuda et al., 2014). These findings suggest that RPE dysfunction alone cannot account for the severe retinal phenotype we observe in our model.

      In our study, we observed that photoreceptor degeneration correlates strongly with the depletion of Sox9 in Müller glial cells. Given the well-established supportive and neuroprotective roles of Müller glia, we interpret the retinal degeneration in our model to be primarily a consequence of Müller cell dysfunction (confirmed by the loss of Müller glia markers, such as SOX8 and S100). This interpretation is further supported by previous studies showing that selective ablation of Müller glia can lead to photoreceptor degeneration through cell-autonomous mechanisms (Shen et al., 2012) [PMID: 23136411].

      Nevertheless, we agree that this possibility deserves further investigation, and we have acknowledged it in the following paragraph that has been added to the Discussion section (lines 511-523 of the revised ms):

      “An important consideration in our model is the potential contribution of non-cell autonomous mechanisms to photoreceptor degeneration. Sox9 is expressed in both MG and RPE cells, and both cell types are known to support photoreceptor viability (Poché et al., 2008; Masuda et al., 2014). Notably, Sox9 and Otx2 cooperate to regulate visual cycle gene expression in the RPE (Masuda et al., 2014), and loss of Otx2 specifically in the adult RPE leads to secondary photoreceptor degeneration through non-cell autonomous mechanisms (Housset et al., 2013). However, RPE-specific deletion of Sox9 does not induce retinal degeneration and in fact results in Otx2 upregulation (Masuda et al., 2014; Goto et al., 2018; Cohen-Tayar et al., 2018), suggesting that Sox9 is not an upstream regulator of Otx2 in this context. Further investigation into the molecular and cellular interactions between MG, RPE, and photoreceptors may help to clarify the indirect pathways contributing to degeneration in the absence of Sox9.”

      Consistent with the above, a new citation has been included:

      Housset M, Samuel A, Ettaiche M, Bemelmans A, Béby F, Billon N, Lamonerie T. 2013. Loss of Otx2 in the adult retina disrupts retinal pigment epithelium function, causing photoreceptor degeneration. J Neurosci 33:9890–904. doi:10.1523/JNEUROSCI.1099-13.2013.

      (3) The most parsimonious explanation for the dual role of Sox9 in retinal cell types and limbal stem cells is that the cell context is different. For example, Sox9 may cooperate with TF1 in photoreceptors, TF2 in Mueller cells, and TF3 in limbal stem cells, and such cell type-specific cooperation may result in different outcomes (retinal integrity, stem cell differentiation). The authors are encouraged to add a paragraph to the discussion and share their thoughts on the dual role of Sox9.

      We thank the reviewer for this thoughtful and constructive suggestion. We have added a paragraph at the end of the Discussion addressing the potential dual role of Sox9 in the cornea and retina. In this new section, we discuss how Sox9 might exert distinct functions depending on the cellular context, possibly through interactions with different transcriptional partners in specific cell types. This may help explain the contrasting roles of Sox9 in maintaining retinal integrity versus regulating stem cell differentiation in the limbal epithelium.

      (4) One more molecular marker for Mueller glial cells would strengthen the conclusion that these cells are lost upon Sox9 deletion.

      We thank the reviewer for this constructive suggestion. To reinforce our conclusion that most Müller glial cells are lost following Sox9 deletion, we analysed the expression of S100, a well-established cytoplasmic marker of Müller glia. As S100 is primarily localized to the innermost Müller cell processes and not restricted to cell bodies, direct cell counting was not feasible. Instead, we quantified the S100+ signal intensity across defined retinal surface areas. This analysis revealed a statistically significant reduction in S100 signal in Sox9<sup>Δ/Δ</sup> retinas compared to controls. These new data, included in the revised Figure 1 (panels F and G), support and extend our previous observations using SOX8, further confirming the loss of Müller glial cells in Sox9-deficient retinas.

      We have also modified the manuscript based on this new evidence as follows:

      In the Results section, lines 168-177 of the revised ms, we have added the following paragraph: “To independently validate the loss of MG cells in Sox9-deficient retinas, we examined the expression of S100, a cytoplasmic marker that labels the processes of adult Müller cells. In control retinas, strong S100 immunoreactivity was observed across the inner retina, outlining the typical radial projections of Müller glia (Fig. 1F). In contrast, Sox9Δ/Δ retinas with an extreme phenotype exhibited a marked reduction in S100 signal (Fig. 1G). Given the diffuse cytoplasmic localization of S100, we quantified its expression by measuring the fluorescence signal within a defined surface area of the retina. This analysis revealed a statistically significant reduction in S100 signal intensity in mutant samples (including both mild and extreme phenotypes) compared to controls (Fig. 1G; Table S4), further supporting the loss of MG cells upon Sox9 deletion.”

      In Methods, line 684 of the revised ms, the anti-S100 antibody reference and its working dilution have been added.

      (5) Using opsins as markers, the authors conclude that the photoreceptors are lost upon Sox9 deletion. However, an alternate possibility is that the photoreceptors are still present and that Sox9 is required for the transcription of opsin genes. In that case, Sox9 (like Otx2) may act as a terminal selector in photoreceptor cells. This point is particularly important because vertebrate terminal selectors (e.g., Nurr1, Otx2, Brn3a) initially affect neuron type identity and eventually lead to cell loss.

      We fully understand the reviewer’s point. However, we believe that the possibility that Sox9 regulates opsin gene expression without affecting photoreceptor survival is very unlikely in our model. The primary evidence comes from the histological analysis shown in Figure 1B, where hematoxylin and eosin staining clearly demonstrates the complete loss of the ONL in Sox9<sup>Δ/Δ</sup> retinas exhibiting the extreme phenotype. Similarly, DAPI counterstaining also shows the absence of the ONL in many of our immunofluorescence images of these samples. This morphological disappearance of the ONL strongly supports the conclusion that photoreceptor cells are not merely transcriptionally silent but are physically absent.

      Furthermore, TUNEL assays in two retinas with a mild phenotype revealed extensive apoptosis within the ONL, suggesting a progressive degeneration process rather than a selective transcriptional effect. While we acknowledge that transcriptional regulation of opsin genes by Sox9 cannot be entirely ruled out, the observed phenotype is more consistent with a structural loss of photoreceptors rather than a change in their molecular identity alone. Therefore, our data support the interpretation that Sox9 is required for photoreceptor survival, likely through non-cell autonomous mechanisms related to Müller glia dysfunction, rather than acting as a terminal selector within photoreceptor cells themselves.

      (6) Quantification is needed for the TUNEL and GFAP analysis in Figure 3.

      We have quantified the GFAP immunofluorescence signal across defined surface areas of the retina and found a statistically significant increase in GFAP expression in Sox9<sup>Δ/Δ</sup> mutants compared to controls (Mann-Whitney U test, P = 0.0240; n = 4 controls, 10 mutants). These quantification data are now included in the revised Figure 3.
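      As an illustration of the rank-based comparison named above, the following is a minimal sketch of how the Mann-Whitney U statistic is computed for two independent groups. The function names and the intensity values are hypothetical and are not data from the study; a P value would normally be obtained from a statistics package rather than from this sketch.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the U statistics (U_a, U_b) for two independent samples."""
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical area-normalized intensities (arbitrary units), NOT study data.
controls = [1.0, 1.2, 0.9, 1.1]
mutants = [1.5, 2.0, 1.8, 1.3, 2.2, 1.7, 1.9, 1.05, 2.4, 1.6]
u_controls, u_mutants = mann_whitney_u(controls, mutants)
print(u_controls, u_mutants)
```

      A small U for one group means that its values rank almost entirely below those of the other group, which is the pattern a significant test result reflects.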

      Regarding the TUNEL assay, although extensive apoptosis was clearly observed in two Sox9<sup>Δ/Δ</sup> retinas with a mild phenotype (as shown in Figure 3A), this pattern was not consistent across the full study mouse cohort. Out of 15 mutant samples analyzed (5 of them previously analyzed and 10 additional ones that have been newly analyzed), only two exhibited this pronounced apoptotic pattern. However, in the remaining 13 mutants, we did observe a small but statistically significant increase in the number of TUNEL+ cells compared to controls (zero-inflated Poisson test, P = 0.028, n = 5 controls, 13 mutants). These results are now included in Figure 3 and in Tables S7 and S8.

      This pattern likely reflects the transient nature of apoptosis in the degenerative process, which may occur rapidly and thus be difficult to capture consistently at a single time point. Nevertheless, the quantification supports our conclusion that Sox9 loss is associated with increased photoreceptor cell death.

      Based on the above, we have included the following paragraphs in the Results section of the manuscript:

      In lines 224-252 of the revised ms, the final version of the paragraph is as follows: “Since photoreceptors are absent in severely affected Sox9-mutant retinas, we conducted TUNEL assays to study the role of cell death in the process of retinal degeneration. In control samples (n=5), almost no TUNEL signal was observed in the retina. In contrast, Sox9<sup>Δ/Δ</sup> mice (n=15) showed numerous TUNEL+ cells, mainly located in the persisting ONL, indicating that photoreceptor cells were dying (Fig. 3A). Although extensive TUNEL staining in the ONL was clearly observed in two Sox9<sup>Δ/Δ</sup> retinas with mild phenotypes, this pattern was not consistently present across the full cohort. In the remaining 13 mutant retinas, we observed a modest but noticeable increase in the number of apoptotic cells compared to controls (Fig. 3B; Table S7). Despite a high frequency of zero counts (particularly among controls), the difference between groups reached statistical significance when analyzed using a zero-inflated Poisson model (P = 0.028; n = 5 controls, 13 mutants). These findings suggest that photoreceptor apoptosis following Sox9 deletion may occur acutely and within a narrow temporal window, making it challenging to capture the full degenerative process at a single time point”.

      Lines 263-269 of the revised ms: “To support these observations quantitatively, we measured GFAP fluorescence intensity across defined retinal surface areas in control and Sox9<sup>Δ/Δ</sup> mice (Fig. 3D; Table S8). This analysis revealed a statistically significant increase in GFAP signal in mutant retinas compared to controls (Mann-Whitney U test, P = 0.0240; n = 4 controls, 10 mutants). These results are consistent with a progressive gliotic response following Sox9 deletion and provide further evidence that MG cells become reactive in the absence of Sox9”.

      Similarly, the section “Estimation of the percentage of tamoxifen-induced, Cre-mediated recombination” has been expanded as follows:

      Lines 660-665 of the revised ms: “In parallel, to quantify GFAP expression as a measure of MG reactivity, we analyzed GFAP immunofluorescence intensity across defined retinal surface areas. Given the cytoplasmic distribution of GFAP within glial processes, direct cell counting was not feasible. Instead, fluorescence intensity was measured using ImageJ, within full-thickness retinal regions in 20x microphotographs of retinal sections stained for GFAP. The total GFAP signal was normalized to the measured area for each section”.
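      The normalization described in this passage can be sketched as follows. This is only an illustration under stated assumptions, not the authors' ImageJ procedure: the function name, the pixel-grid representation, and the example values are hypothetical, and the area is expressed in pixels rather than calibrated units.

```python
def area_normalized_signal(image, mask):
    """Sum pixel intensities inside a region of interest and divide by its area.

    image: 2D list of pixel intensities (hypothetical representation).
    mask:  2D list of 0/1 flags of the same shape; 1 marks pixels inside the
           full-thickness retinal region being measured.
    """
    total = 0.0
    area = 0
    for image_row, mask_row in zip(image, mask):
        for pixel, inside in zip(image_row, mask_row):
            if inside:
                total += pixel
                area += 1
    if area == 0:
        raise ValueError("region of interest is empty")
    return total / area


# Hypothetical 3x3 image and region of interest, NOT data from the study.
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = [[0, 1, 1],
        [0, 1, 1],
        [0, 0, 0]]
normalized = area_normalized_signal(image, mask)
print(normalized)
```

      Dividing by the measured area makes sections with regions of different sizes comparable, which is why the total signal alone is not used.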

      (7) Line 269-320: The authors examined available scRNA-Seq data on adult retina. This data provides evidence for Sox9 expression in distinct cell types. However, the dataset does not inform about the functional role of Sox9 because Sox9 mutant cells were not analyzed with RNA-Seq. Hence, all the data that claim that this experiment provides insights into possible Sox9 functional roles must be removed. This includes panels F, G, and H in Figure 5. In general, this section of the paper (Lines 269-320) needs a major revision. Similarly, lines 442-454 in the Discussion should be removed.

      We thank the reviewer for this important observation. We agree that the scRNA-Seq dataset used in this section does not include Sox9 mutant cells and therefore does not allow us to assess the consequences of Sox9 loss-of-function. However, we believe that this analysis still provides valuable complementary information. Specifically, it confirms that Sox9 is expressed in a distinct population of limbal stem cells, and that its expression dynamically changes along differentiation trajectories. Although we do not infer causality or phenotypic consequences, the ability to observe how gene expression programs shift as Sox9 is downregulated offers insights into potential transcriptional programs associated with Sox9 activity.

      We have carefully revised Lines 269–320 to remove any overinterpretations, and eliminated the corresponding lines in the Discussion (Lines 442–454). However, we have retained Panels G and H in Figure 5 with updated text that reflects the descriptive nature of these findings, specifically to illustrate that the Sox9-positive cell signature is consistent with a stem cell genetic program, and that when Sox9 is downregulated some gene pathways involved in stem cell differentiation are upregulated.

      Reviewer #1 (Recommendations for the authors):

      Major points

      (1) Figure 1C shows the proportions of Sox9+cells that express Sox8 in control, mild and extreme phenotypes. However, as no quantitative classification of mild and extreme phenotypes is reported along with Figure 1A, the large standard deviation for Sox9∆/∆ mild retina might be due to a misclassification of the sample. Therefore, the authors must ascribe each sample to "mild" or "extreme" based on a quantitative metric.

      We appreciate the reviewer’s suggestion to clarify the classification criteria used to distinguish “mild” and “extreme” phenotypes in Sox9<sup>Δ/Δ</sup> retinas. As noted, our classification was based on a qualitative, phenotypic assessment of retinal morphology in hematoxylin/eosin-stained sections. Specifically, retinas were classified as “extreme” when the outer nuclear layer (ONL) was completely absent, and as “mild” when the ONL was present, although often reduced in thickness. This classification reflects the observable structural depletion of the ONL and aligns well with the extent of Sox9 loss in Müller glial cells, as shown in Figure 1. We acknowledge that some variability exists within the “mild” group, likely due to differences in recombination efficiency and the mosaic nature of tamoxifen-induced deletion.

      The phenotypic classification of each individual sample is explicitly provided in Supplementary Table S1. We have also added a statement in the Results section clarifying that this classification was based on qualitative histological criteria rather than a numerical threshold.

      Lines 104-113 of the revised ms: “We categorized Sox9<sup>Δ/Δ</sup> retinas into “mild” and “extreme” phenotypes in order to facilitate interpretation of our data. Classification was based on a qualitative assessment of ONL integrity in histological sections. Specifically, samples were classified as “extreme” when the ONL was completely depleted, and as “mild” when the ONL persisted, albeit variably reduced in thickness. This phenotypic classification reflects observable structural differences rather than a fixed quantitative threshold. Some variability exists within the “mild” group, likely due to differences in recombination efficiency and the mosaic nature of tamoxifen-induced Cre-mediated Sox9 deletion”.

      (2) The authors infer Sox9 high and Sox9 low groups of limbal stem cells using an existing scRNA-seq dataset. However, an immunohistology-based validation of this difference is missing. Given that limbal stem cells express Sox9, the authors must examine the heterogeneity in Sox9 levels within the Sox8+ population to demonstrate their claim: "...Sox9 expression decreases as transiently amplifying progenitors undergo progressive differentiation from limbal to peripheral corneal cells." in Line 292. Ideally, this must be further validated using differentiation markers corresponding to CB and ILB populations that show lower Sox9 expression according to the pseudotime graph.

      To validate the Sox9 expression results obtained with scRNA-seq, we performed double immunofluorescence for Sox9 and P63, the latter expressed by the basal cells of the limbal epithelium, but not by transient amplifying cells covering the corneal surface (Pellegrini et al., 2001, https://www.pnas.org/doi/abs/10.1073/pnas.061032098). These results can be observed in the new panel 5F. Accordingly, we have included a new paragraph in lines 369-396 of the revised version of the ms:

      “To validate these results, we decided to closely examine Sox9 expression in the limbus using immunofluorescence. Previous analyses revealed that the outer limbus is approximately 100 μm wide, while the inner limbus is wider, around 240 μm (Altshuler 2021). We observed that in the region corresponding to the OLB, most cells showed strong Sox9 expression. In the area corresponding to the ILB, this immunoreactivity appeared weaker in the basal layer (corresponding to the ILB proper), and no expression was detected in the suprabasal layers (flattened cells; Fig 5F left). Double immunofluorescence for SOX9 and P63, which is expressed in basal cells of the limbal epithelium, but not by transient amplifying cells covering the corneal surface (Pellegrini et al., 2001) revealed that Sox9 expression was restricted to P63-positive cells (Fig 5F right). These observations confirm that Sox9 is expressed in a basal cell population within both the OLB and ILB, and that its expression decreases in differentiated transient amplifying cells. ”

      We have also deleted “This expression pattern is consistent with our immunofluorescence observations” from line 356 of the revised ms.

      (3) The authors' claim of "...Sox9-null cells cannot survive or proliferate as well as their wildtype neighbors, and are hence outcompeted over time, leading to an essentially wild-type cornea" does not seem very convincing in the light of Fig.6D and S3B where Sox9 deletion can still allow for a large LacZ+ clone. Their claim of wild-type cornea due to out-competing neighbors must be validated by increasing the number of Sox9-null progenitors, which can be tested by administering tamoxifen for a significantly longer duration, leading to a majority Sox9 deficient progenitor population, and then examining limbal and corneal defects.

      As previously discussed, we observed only one instance of a large LacZ+ clone across 8 Sox9<sup>Δ/Δ</sup>-LacZ eyes. Based on prior reports of ligand-independent Cre activity (Vooijs et al., 2001; Kemp et al., 2004; Haldar et al., 2009; Dorà et al., 2015), we believe this rare event likely resulted from spontaneous recombination in a bona fide limbal stem cell, independent of tamoxifen administration. For this reason, we do not expect that increasing the dose or duration of tamoxifen would eliminate such rare events. Furthermore, due to the mosaic and highly variable recombination efficiency of the CAGG-CreERTM system in the adult eye (see McMahon et al., 2008), attempting to increase the TX dosage would likely lead to systemic toxicity or lethality, without guaranteeing full inactivation of the gene in the limbus. Thus, this system is not well-suited for generating a fully Sox9-deficient limbal epithelium. To overcome this limitation, we crossed our mice with the R26R-LacZ reporter line to track the clonal behavior of Sox9-deficient cells. In control animals (Sox9Δ/+-LacZ), LacZ+ stripes originating from limbal stem cells are readily observed. In contrast, in Sox9Δ/Δ-LacZ mutants, these clones are either absent or drastically reduced. This suggests that Sox9-null cells have a severely impaired ability to form and sustain clones. To rigorously quantify this effect, we compared 8 control and 5 mutant corneas, revealing a highly significant 8-fold reduction in LacZ-positive area in the mutants (6.65 ± 1.77% vs. 0.85 ± 0.85%; p = 0.00017; Fig. 6F; Table S12; Supp. Fig. X???), supporting our claim that Sox9-null cells cannot survive or proliferate as well as their wild-type neighbors, and are hence outcompeted over time, leading to an essentially wild-type cornea.

      Minor points

      (1) Quantification for Figure 2C and 2D is missing.

      We have now included quantification of BRN3A+ retinal ganglion cells (Figure 2E) across control and Sox9<sup>Δ/Δ</sup> retinas. Cell counts were performed on matched retinal sections, and the difference between groups was found to be statistically significant through Mann–Whitney U test (Table S5).

      Regarding PAX6/AP2a, we quantified inner retinal neurons by analyzing AP2α+ amacrine cells and PAX6+/AP2α- horizontal cells as distinct subpopulations, rather than simply comparing total PAX6 or AP2α immunoreactivity. This approach allowed us to better resolve specific neuronal subtype changes. Both populations showed a statistically significant reduction in Sox9-deficient retinas relative to controls. The quantification for these analyses has now been incorporated into the revised Figure 2F and G (Table S6).

      In line with the above, the following paragraph of the Results section (line 210 of the revised ms):

      “We also studied the status of other retinal cell types. The transcription factor BRN3A was used to identify ganglion cells (Nadal-Nicolás et al., 2009), which were shown to decrease in number in the mutant retinas, compared to control ones (Fig. 2C). Similarly, double immunodetection of the transcription factors PAX6 and AP2A was used to identify both amacrine and horizontal cells, as previously described (Marquardt et al., 2001; Barnstable et al., 1985; Edqvist and Hallböök, 2004), showing a similar reduction in both cell types in degenerated retinas (Fig. 2D).”

      has been modified as follows:

      “We also studied the status of other retinal cell types. The transcription factor BRN3A was used to identify ganglion cells (Nadal-Nicolás et al., 2009), which were shown to decrease in number in the mutant retinas, compared to control ones (Figs. 2C and 2D and Table S5; n = 5 controls, n = 12 mutants; Mann-Whitney U test, P = 3 × 10<sup>-4</sup>). Similarly, double immunodetection of the transcription factors PAX6 and AP2A was used to identify both amacrine and horizontal cells (Fig. 2E), as previously described (Marquardt et al., 2001; Barnstable et al., 1985; Edqvist and Hallböök, 2004), showing a similar reduction in both cell types in degenerated retinas (Figs. 2F and 2G and Table S6; AP2α+ amacrine cells: n = 3 controls, n = 8 mutants; two-sample t-test, P = 0.029; PAX6+/AP2α− horizontal cells: n = 3 controls, n = 8 mutants; Mann-Whitney U test, P = 0.021). These findings indicate that the loss of Sox9 in the adult retina ultimately leads to the degeneration of multiple inner retinal neuronal populations, beyond the previously described effects on photoreceptors and Müller glia.”

      (2) Figure 4G & H: The authors must mention that the dashed lines enclose the limbal area.

      Done

      (3) The authors infer from an existing scRNA-seq dataset that OLB cells have high Sox9 expression as compared to ILB and corneal populations. However, Figures 4A and B do not indicate the anatomical positions of these cell types. The authors must label these for the reader's reference as they state that "[Sox9] expression pattern is consistent with our immunofluorescence observations" in Line 280.

      As previously indicated, we have generated a new panel 5F and a corresponding paragraph to illustrate Sox9 expression pattern in the limbus. Accordingly, we have removed the sentence from line 280.

      (4) Quantification for Figures 6A and 6B is missing.

      We have quantified the number of Sox9 and P63 positive cells in the limbus between mutant and control corneas and found no difference in the number of positive cells. We have included these data in new panel 6C and Table S11.

      Reviewer #2 (Recommendations for the authors):

      Line 24: "synapsis" should be "synapses".

      Done

      (1) Consider starting a new paragraph after line 30.

      Done

      (2) Lines 42-48: make clear that this paragraph provides information only for HUMAN SOX9.

      We now distinguish which studies were conducted in humans and which in mice.

      (3) Line 55: explain to the non-expert reader what the "visual cycle" is.

      Done (lines 64-65 of the revised ms)

      (4) Line 66: consider "inactivate" instead of "suppress".

      We substituted “suppress” with “inactivate”

      (5) Line 90-92: ONLY PCR for the cGMP will provide formal evidence that this is not present in the mouse line.

      We agree with the reviewer that PCR genotyping is the most straightforward method to exclude the presence of the Pde6<sup>brd1</sup> allele. Although retinal degeneration was never observed in untreated or control animals in our study, we have now removed the term “formal possibility” from the text to better reflect this limitation.

      We have modified the following paragraph (page 116 in the revised version of the manuscript): “Retinal degeneration was never observed in mice that had not been tamoxifen-treated, nor any other controls, eliminating the formal possibility that the retinal degeneration allele of photoreceptor cGMP phosphodiesterase 6b (Pde6brd1) was present in our mice (Bowes et al., 1990).”

      As follows: “Retinal degeneration was never observed in mice that had not been tamoxifen-treated, nor any other control groups, making the presence of the retinal degeneration allele of photoreceptor cGMP phosphodiesterase 6b (Pde6<sup>brd1</sup>) unlikely in our mice (Bowes et al., 1990). However, we acknowledge that definitive exclusion of this possibility would require PCR-based genotyping.”

      (6) Line 160-166: This paragraph needs a conclusion.

      We agree with the reviewer and have added the following sentence at the end of the paragraph:

      “These findings indicate that the loss of Sox9 in the adult retina ultimately leads to the degeneration of multiple inner retinal neuronal populations, beyond the previously described effects on photoreceptors and Müller glia”

      (7) Line: 240-265: This paragraph ends without a conclusion.

      We have included the following conclusion:

      “Thus, Sox9 is expressed in a basal limbal stem cell population with the ability to form two types of long-lived cell clones involved in stem cell maintenance and homeostasis.”

      (8) In Results, it needs to be specified when exactly in adulthood the tamoxifen treatment started. This information is only provided in the Methods.

      We have specified the age of the mice at the onset of tamoxifen treatment (two months) and included it in the schemes of Figs 1A, 4C, 4H, 6E.

      (9) Line 250: Because live imaging is not conducted, the word "dynamics" is not suitable.

      We substituted “dynamics” with “contribution”

      (10) Panel C in Figure 6 is nice and helpful. Consider adding a similar panel in Figure 1.

      Done.

      (11) Line 420: is this the human Sox9 enhancer?

      Yes. It is a human enhancer. We have indicated it in the text.

      (12) Line 459: typo "detected tissue".

      Corrected

      (13) Line 448 and 468: citations are needed.

      Line 448 is deleted in the revised version of the ms.

      (14) 479: typo "clones clones".

      Corrected.

    1. eLife Assessment

      Shen et al. present a computational account of individual differences in mouse exploration when faced with a novel object in an open field from a previously published study (Akiti et al.) that relates subject-specific intrinsic exploration and caution about potential hazards to the spectrum of behaviors observed in this setting. Overall, this computational study is an important contribution that leverages a very general modeling framework (a Bayes Adaptive Markov Decision Process) to quantify and interrogate distinct drivers of exploratory behavior under potential threat. Given their assumptions, the modeling results are convincing: the authors are able to describe a substantial amount of the behavioral features and idiosyncrasies in this dataset, and their model affords a normative interpretation related to inherent risk aversion and predation hazard "flexibility" of individual animals. The work should be of broad interest to researchers working to understand open-ended exploratory behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      This work computationally characterized the threat-reward learning behavior of mice in a recent study (Akiti et al.), which had prominent individual differences. The authors constructed a Bayes-adaptive Markov decision process model, and fitted the behavioral data by the model. The model assumed (i) hazard function starting from a prior (with free mean and SD parameters) and updated in a Bayesian manner through experience (actually no real threat or reward was given in the experiment), (ii) risk-sensitive evaluation of future outcomes (calculating lower 𝛼 quantile of outcomes with free 𝛼 parameter), and (iii) heuristic exploration bonus. The authors found that (i) brave animals had more widespread hazard priors than timid animals and thereby quickly learned that there was in fact little real threat, (ii) brave animals may also be less risk-aversive than timid animals in future outcome evaluation, and (iii) the exploration bonus could explain the observed behavioral features, including the transition of behavior from the peak to steady-state frequency of bout. Overall, this work is a novel interesting analysis of threat-reward learning, and provides useful insights for future experimental and theoretical work. However, there are several issues that I think need to be addressed.

      Strengths:

      - This work provides a normative Bayesian account for individual differences in braveness/timidity in reward-threat learning behavior, which complements the analysis by Akiti et al. based on model-free threat reinforcement learning.

      - Specifically, the individual differences were characterized by (i) the difference in the variance of hazard prior and potentially also (ii) the difference in the risk-sensitivity in evaluation of future returns.

      Weakness:

      - Theoretically the effect of prior is diluted over experience whereas the effect of biased (risk-aversive) evaluation persists, but these two effects could not be teased apart in the fitting analysis of the current data.

      - It is currently unclear how (whether) the proposed model corresponds to neurobiological (rather than behavioral) findings, different from the analysis by Akiti et al.

      Comments on revisions:

      The authors have adequately replied to all the concerns that I raised in my review of the original manuscript. I do not have any remaining concern, and I am now more convinced that this work provides novel important insights and stimulates future experimental and theoretical examinations.

    3. Reviewer #3 (Public review):

      Summary:

      The manuscript presents computational modelling of the behaviour of mice during encounters with novel and familiar objects, originally reported in Akiti et al. (Neuron 110, 2022). Mice typically perform short bouts of approach followed by retreat to a safe distance, presumably to balance exploration to discover possible reward with the potential risk of predation. However, there is considerable heterogeneity in this exploratory behaviour, both across time as an individual subject becomes more confident in approaching the object, and across subjects; with some mice rapidly becoming confident to closely explore the object, while other timid mice never become fully confident that the object is safe. The current work aims to explain both the dynamics of adaptation of individual animals over time, and the quantitative and qualitative differences in behaviour between subjects, by modelling their behaviour as arising from model-based planning in a Bayes adaptive Markov Decision Process (BAMDP) framework, in which the subjects maintain and update probabilistic estimates of the uncertain hazard presented by the object, and rationally balance the potential reward from exploring the object with the potential risk of predation it presents.

      In order to fit these complex models to the behaviour the authors necessarily make substantial simplifying assumptions, including coarse-graining the exploratory behaviour into phases quantified by a set of summary statistics related to the approach bouts of the animal. Inter-individual variation between subjects is modelled both by differences in their prior beliefs about the possible hazard presented by the object, and by differences in their risk preference, modelled using a conditional value at risk (CVaR) objective, which focuses the subject's evaluation on different quantiles of the expected distribution of outcomes. Interestingly, these two conceptually different possible sources of inter-subject variation in brave vs timid exploratory behaviour turn out not to be dissociable in the current dataset as they can largely compensate for each other in their effects on the measured behaviour. Nonetheless, the modelling captures a wide range of quantitative and qualitative differences between subjects in the dynamics of how they explore the object, essentially through differences in how subject's beliefs about the potential risk and reward presented by the object evolve over the course of exploration, and are combined to drive behaviour.

      Exploration in the face of risk is a ubiquitous feature of the decision-making problem faced by organisms, with strong clinical relevance, yet remains poorly understood and under-studied, making this work a timely and welcome addition to the literature.

      Strengths:

      - Individual differences in exploratory behaviour are an interesting, important, and under-studied topic.

      - Application of cutting-edge modelling methods to a rich behavioural dataset, successfully accounting for diverse quantitative and qualitative features of the data in a normative framework.

      - Thoughtful discussion of the results in the context of prior literature.

      Limitations:

      - The model-fitting approach used of coarse-graining the behaviour into phases and fitting to their summary statistics may not be applicable to exploratory behaviours in more complex environments where coarse-graining is less straightforward.

      Comments on revisions:

      All recommendations to authors from the first review were addressed in the revised manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This work computationally characterized the threat-reward learning behavior of mice in a recent study (Akiti et al.), which had prominent individual differences. The authors constructed a Bayes-adaptive Markov decision process model and fitted the behavioral data by the model. The model assumed (i) hazard function starting from a prior (with free mean and SD parameters) and updated in a Bayesian manner through experience (actually no real threat or reward was given in the experiment), (ii) risk-sensitive evaluation of future outcomes (calculating lower 𝛼 quantile of outcomes with free 𝛼 parameter), and (iii) heuristic exploration bonus. The authors found that (i) brave animals had more widespread hazard priors than timid animals and thereby quickly learned that there was in fact little real threat, (ii) brave animals may also be less risk-aversive than timid animals in future outcome evaluation, and (iii) the exploration bonus could explain the observed behavioral features, including the transition of behavior from the peak to steady-state frequency of bout. Overall, this work is a novel interesting analysis of threat-reward learning, and provides useful insights for future experimental and theoretical work. However, there are several issues that I think need to be addressed.

      Strengths:

      (1) This work provides a normative Bayesian account for individual differences in  braveness/timidity in reward-threat learning behavior, which complements the analysis by  Akiti et al. based on model-free threat reinforcement learning.

      (2) Specifically, the individual differences were characterized by (i) the difference in the  variance of hazard prior and potentially also (ii) the difference in the risk-sensitivity in the  evaluation of future returns.

      Weakness:

      (1) Theoretically the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, but these two effects could not be teased apart in the  fitting analysis of the current data.

      (2) It is currently unclear how (whether) the proposed model corresponds to neurobiological (rather than behavioral) findings, different from the analysis by Akiti et al.

      We thank reviewer #1 for their useful feedback which we’ve used to improve the discussion, formatting and clarity of the paper, and for highlighting important questions for future extensions of our work.

      Major points:

      (1) Line 219

      It was assumed that the exploration bonus was replenished at a steady rate when the animal  was at the nest. An alternative way would be assuming that the exploration bonus slowly  degraded over time or experience, and if doing so, there appears to be a possibility that the  transition of the bout rate from peak to steady-state could be at least partially explained by  such a decrease in the exploration bonus.

      Section 2.2.3 explains the mechanism of the exploration bonus which motivates approach.  We think that the mechanism suggested by the reviewer is, in essence, what is happening in  the model. The exploration pool is indeed depleted over time or bouts of experience at the  object. In the peak confident phase for brave animals and the peak cautious phase for timid  animals, the rate of depletion exceeds the rate of regeneration, since the agent spends only  a single turn at the nest between bouts. In the steady-state phase, the exploration pool has  depleted so much previously that the agent must wait multiple turns at the nest for the pool  to regenerate to a sufficiently high value to justify approaching the object again.

      We have updated section 2.2.3 to explain that agents spend one turn at the nest during peak  phase but multiple turns during steady-state phase. Hopefully, this makes our mechanism  clear:

      “In simulations, when 𝐺(𝑡) is high, the agent has a high motivation to explore the object, spending only a single turn in the nest state between bouts. In other words, the depletion from 𝐺0 substantially influences the time point at which approach makes a transition from peak to steady-state; the steady-state time then depends on the dynamics of depletion (when at the object) and replenishment (when at the nest). In particular, in the steady-state phases, the agent must wait multiple turns at the nest for 𝐺(𝑡) to regenerate so that informational reward once again exceeds the potential cost of hazard.”
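
      The depletion and replenishment mechanism described above can be sketched as a toy simulation. This is only an illustration under stated assumptions, not the authors' model: the fixed approach threshold `cost`, the fractional depletion rule, and all parameter names and values are hypothetical, whereas the actual agent plans within the BAMDP rather than applying a threshold rule.

```python
def simulate_nest_waits(g0=10.0, deplete_frac=0.5, replenish=0.2,
                        cost=2.0, n_turns=40):
    """Return the number of nest turns that precede each approach bout.

    Every name and value here is an illustrative assumption.
    """
    g = g0        # exploration pool G(t), starting at G0
    waits = []    # nest turns spent before each bout
    wait = 0
    for _ in range(n_turns):
        wait += 1                  # one more turn at the nest
        g += replenish             # pool regenerates at a steady rate
        if g > cost:               # informational reward exceeds hazard cost
            waits.append(wait)
            g -= deplete_frac * g  # a bout at the object depletes the pool
            wait = 0
    return waits


waits = simulate_nest_waits()
print(waits)
```

      With these hypothetical settings, the early bouts each follow a single nest turn (the peak phase), while later bouts follow several nest turns (the steady-state phase), which mirrors the qualitative transition the response describes.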

      (2) Line 237- (Section 2.2.6, 2.2.7, Figures 7, 9)

      I was confused by the descriptions about nCVaR. I looked at the cited original literature  Gagne & Dayan 2022, and understood that nCVaR is a risk-sensitive version of expected  future returns (equation 4) with parameter α (α-bar) (ranging from 0 to 1) representing risk  preference. Line 269-271 and Section 4.2 of the present manuscript described (in my  understanding) that α was a parameter of the model. Then, isn't it more natural to report  estimated values of α, rather than nCVaR, for individual animals in Section 2.2.6, 2.2.7,  Figures 7, 9 (even though nCVaR monotonically depends on α)? In Figures 7 and 9, nCVaR  appears to be upper-bounded to 1. The upper limit of α is 1 by definition, but I have no idea why nCVaR was also bounded by 1. So I would like to ask the authors to add more detailed  explanations on nCVaR. Currently, CVaR is explained in Lines 237-243, but actually, there is  no explanation about nCVaR rather than its formal name 'nested conditional value at risk' in  Line 237.

      Thank you for pointing out this error. We have corrected the paper to use nCVaR to refer to the objective and nCVaR's α, or sometimes just α, to refer to the risk sensitivity parameter and thus the degree of risk sensitivity.
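
      To make the distinction between the objective and its parameter concrete, the following minimal sketch computes a one-step CVaR for a set of equally likely outcomes: the mean of the worst α-fraction of the distribution. It is an assumption-laden illustration only; the function name and the example outcomes are hypothetical, and it does not implement the nested (sequential) form used in the manuscript.

```python
def cvar_lower(outcomes, alpha):
    """Mean of the worst alpha-fraction of equally likely outcomes.

    alpha = 1 recovers the plain mean (risk neutral); smaller alpha
    concentrates the evaluation on the lower tail (risk averse).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    xs = sorted(outcomes)
    mass_left = alpha * len(xs)   # tail mass, in units of one sample
    total = 0.0
    for x in xs:
        w = min(1.0, mass_left)   # the last sample may count only partly
        total += w * x
        mass_left -= w
        if mass_left <= 0.0:
            break
    return total / (alpha * len(xs))


outcomes = [-8.0, 1.0, 2.0, 3.0]            # hypothetical returns
risk_neutral = cvar_lower(outcomes, 1.0)    # mean of all outcomes
risk_averse = cvar_lower(outcomes, 0.5)     # mean of the worst half
```

      Here α is the free parameter, bounded between 0 and 1, while the value returned is the risk-sensitive objective, which is why reporting α per animal is the natural summary of risk sensitivity.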

      (3) Line 333 (and Abstract)

      Given that animals' behaviors could be equally well fitted by the model having both nCVaR (free α) and hazard prior and the alternative model having only hazard prior (with α = 1), may it be difficult to confidently claim that brave (/timid) animals had risk-neutral (/risk-aversive) preference in addition to widespread (/low-variance) hazard prior? Then, it might be good to somewhat weaken the corresponding expression in the Abstract (e.g., add 'potentially also' to the result for risk sensitivity) or mention the inseparability of risk sensitivity and prior belief pessimism (e.g., "... although risk sensitivity and prior belief pessimism could not be teased apart").

      Thank you for this suggestion. We have duly weakened the wording in the Abstract to say “potentially more risk neutral”:

      “Some animals begin with cautious exploration, and quickly transition to confident approach to maximize exploration for reward; we classify them as potentially more risk neutral, and enjoying a flexible hazard prior. By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      Reviewer #2 (Public Review):

      Shen and Dayan build a Bayes adaptive Markov decision process model with three key components: an adaptive hazard function capturing potential predation, an intrinsic reward function providing the urge to explore, and a conditional value at risk (CVaR, closely related to probability distortion explanations of risk traits). The model itself is very interesting and has many strengths including considering different sources of risk preference in generating behavior under uncertainty. I think this model will be useful to consider for those studying approach/avoid behaviors in dynamic contexts.

      The authors argue that the model explains behavior in a very simple and unconstrained behavioral task in which animals are shown novel objects and retreat from them in various manners (different body postures and patterns of motor chunks/syllables). The model itself does capture lots of the key mouse behavioral variability (at least on average on a mouse-by-mouse basis) which is interesting and potentially useful. However, the variables in the model - and the internal states it implies the mice have during the behavior - are relatively unconstrained given the wide range of explanations one can offer for the mouse behavior in the original study (Akiti et al). This reviewer commends the authors on an original and innovative expansion of existing models of animal behaviour, but recommends that the authors revise their study to reflect the obvious challenges. I would also recommend a reduction in claiming that this exercise gives a normative-like or at least quantitative account of mental disorders.

      We thank reviewer #2 for highlighting some of the strengths of our paper as well as pointing  out important limitations of Akiti et al’s original study which we’ve inherited as well as some  limitations of our own method. We address their concerns below.

      We have added a paragraph to the discussion discussing the limitations of the state  representation we adopted from Akiti’s study.

      (Reviewer #1 had the same concern, see above) “Motivated by tail-behind versus  tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious  and confident approach states [...]”

      We have reduced the suggestion that our model provides an account of mental disorders in  the abstract.

      Before:

      “On the other hand, “timid” animals, characterized by risk aversion and high and inflexible  hazard priors, display self-censoring that leads to the sort of asymptotic maladaptive  behavior that is often associated with psychiatric illnesses such as anxiety and depression.”

      After:

      “By contrast, other animals only ever approach in a cautious manner and display a form of self-censoring; they are characterized by potential risk aversion and high and inflexible hazard priors.”

      My main comment is that this paper is a very nice model creation that can characterize the heterogeneity of rodent behavior in a very simple approach/avoid context (Akiti et al; when a novel object is placed in an arena) that itself can be interpreted in a multitude of ways. The use of terms like "exploration", "brave", etc in this context is tricky because the task does not allow the original authors (Akiti et al) to quantify these "internal states" or "traits" with the appropriate level of quantitative detail to say whether this model is correct or not in capturing the internal states that result in the rodent behavior. That said, the original behavioral setup is so simple that one could imagine capturing the behavioral variability in multiple ways (potentially without invoking complex computations that the original authors never showed the mouse brain performs). I would recommend reframing the paper as a new model that proposes a set of internal states that could give rise to the behavioral heterogeneity observed in Akiti et al, but nonetheless is at this time only a hypothesis. Furthermore, an explanation of what would be really required to test this would be appreciated to make the point clearer.

      We thought very hard about using terms that might be considered to be anthropomorphic such as ‘timid’ and ‘brave’. We are, of course, aware of the concerns articulated by investigators such as LeDoux about this. However, we think that, provided that we are clear on the first appearance (using ‘scare’ quotes) that we are using them as labels for latent characteristics that capture correlations in various aspects of behaviour, they are more helpful than harmful in making our descriptions understandable.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript presents computational modelling of the behaviour of mice during encounters with novel and familiar objects, originally reported by Akiti et al. (Neuron 110, 2022). Mice typically perform short bouts of approach followed by a retreat to a safe distance, presumably to balance exploration to discover possible rewards with the potential risk of predation. However, there is considerable heterogeneity in this exploratory behaviour, both across time as an individual subject becomes more confident in approaching the object, and across subjects; with some mice rapidly becoming confident to closely explore the object, while other timid mice never become fully confident that the object is safe. The current work aims to explain both the dynamics of adaptation of individual animals over time, and the quantitative and qualitative differences in behaviour between subjects, by modelling their behaviour as arising from model-based planning in a Bayes adaptive Markov Decision Process (BAMDP) framework, in which the subjects maintain and update probabilistic estimates of the uncertain hazard presented by the object, and rationally balance the potential reward from exploring the object with the potential risk of predation it presents.

      In order to fit these complex models to the behaviour the authors necessarily make  substantial simplifying assumptions, including coarse-graining the exploratory behaviour into  phases quantified by a set of summary statistics related to the approach bouts of the animal.  Inter-individual variation between subjects is modelled both by differences in their prior  beliefs about the possible hazard presented by the object and by differences in their risk  preference, modelled using a conditional value at risk (CVaR) objective, which focuses the  subject's evaluation on different quantiles of the expected distribution of outcomes.  Interestingly these two conceptually different possible sources of inter-subject variation in  brave vs timid exploratory behaviour turn out not to be dissociable in the current dataset as  they can largely compensate for each other in their effects on the measured behaviour.  Nonetheless, the modelling captures a wide range of quantitative and qualitative differences  between subjects in the dynamics of how they explore the object, essentially through  differences in how subject's beliefs about the potential risk and reward presented by the  object evolve over the course of exploration, and are combined to drive behaviour.

      Exploration in the face of risk is a ubiquitous feature of the decision-making problem faced  by organisms, with strong clinical relevance, yet remains poorly understood and  under-studied, making this work a timely and welcome addition to the literature.

      Strengths:

      (1) Individual differences in exploratory behaviour are an interesting, important, and  under-studied topic.

      (2) Application of cutting-edge modelling methods to a rich behavioural dataset, successfully accounting for diverse quantitative and qualitative features of the data in a normative framework.

      (3) Thoughtful discussion of the results in the context of prior literature.

      Limitations:

      (1) The model-fitting approach used of coarse-graining the behaviour into phases and fitting  to their summary statistics may not be applicable to exploratory behaviours in more complex  environments where coarse-graining is less straightforward.

      (2) Some aspects of the work could be more usefully clarified within the manuscript.

      We thank reviewer #3 for their positive feedback and helping us to improve the clarity of our  paper. We have added discussion they thought was missing.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25-28

      This part of the Abstract might give an impression that timidity (but not braveness) is  potentially associated with psychiatric illness and even that timidity is thus inferior to  braveness. However, even though extreme timidity might indeed be associated with anxiety  or depression, extreme braveness could also be associated with other psychiatric or  behavioral problems. Moreover, as a population, the existence of both timid and brave  individuals could be advantageous, and it could be a reason why both types of individuals  evolutionarily survived in the case of wild animals (although Akiti et al. used mice, which may  have no or very limited genetic varieties, and so things may be different). So I would like to  encourage the authors to elaborate on the expression of this part of the Abstract and/or  enrich the related discussion in the Discussion.

      This is an important point. We note on line 38 that excessive novelty seeking (potentially  caused by excessive braveness) could also be maladaptive.

      Additionally, we have added a paragraph to the discussion discussing heterogeneity in risk  sensitivity within a population.

      “Our data show that there is substantial variation in the degrees of risk sensitivity across the mice. Previous works have reported substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which depend on gender, age, socioeconomic status, personality characteristics, wealth and culture (Rieger et al., 2015; Frey et al., 2017). Despite the normative appeal of 𝛼 = 1, it is possible that a population may benefit from including individuals with 𝛼 different from 1.0 or highly negative priors. For example, more cautious individuals could learn from merely observing the risky behavior of less cautious individuals. Furthermore, we have only considered risk-sensitivity under epistemic uncertainty in our work. Risk averse individuals, for instance with 𝛼 < 1, may be more successful than risk-neutral agents in environments where there are unexpected dangers (unknown unknowns). Risk-aversion is thus a temperament of ecological and evolutionary significance (Réale et al., 2007).”

      (2) Line 149

      Section 2.2 consists of eight subsections. I think this organization may not be very  appealing, because there are a bit too many subsections, and their relations are not  immediately clear to readers. So I would like to encourage the authors to make an  elaboration. For example, since 2.2.1 - 2.2.5 describes a summary of model construction  and model fitting whereas 2.2.6-2.2.8 shows the results, it could be good to divide these into  separate sections (2.2.1 - 2.2.5 and 2.3.1 - 2.3.3).

      Thank you for pointing this out. We’ve renumbered the sections as you’ve suggested.

      (3) Line 347-8

      Theoretically, the effect of prior is diluted over experience whereas the effect of biased  (risk-aversive) evaluation persists, as the authors mentioned in Lines 393-394. Then isn't it  possible to consider environments/conditions in which the two effects can be separated?

      We appreciate this suggestion. Indeed, our original thought in modeling this experiment was  that this would be exactly the case here - with epistemic uncertainty reducing as the object  became more familiar. However, proving to an animal that a single environment is  completely stationary/fixed is hard - reflected in our conclusion here that the exploration  bonus pool replenishes. Thus, we argued in the discussion that a series of environments  would be necessary to separate risk sensitivity from priors.

      (4) Line 407

      It would be nice to add a brief phrase explaining how (in what sense) this model's  assumption was consistent with the reported behavior. Also, should the assumption of  having two discrete approach states (cautious and confident) itself be regarded as a  limitation of the model? If the tail-behind and tail-exposure approaches were not merely  operationally categorized but were indicated to be two qualitatively distinct behaviors in the  experiment by Akiti et al., it is reasonable to model them as two discrete states, but  otherwise, the assumption of two discrete states would need to be mentioned as a  simplification/limitation.

      We have now removed line 407 and added a paragraph to the discussion on the limitations of the tail-behind and tail-exposed state representation:

      “Motivated by tail-behind versus tail-exposed in Akiti et al. (2022), we model approach using a dichotomy between cautious and confident approach states. This is likely a crude approximation to the continuous and multifaceted nature of animal approach behavior. For example, during approach animals likely adjust their levels of vigilance continuously (or discretely; Lloyd and Dayan (2018)) to monitor threat, and choose different velocities for movement, and different attentional strategies for inspecting the novel object. We hope future works will model these additional behavioral complexities, perhaps with additional internal states, and corroborate these states with neurobiological data.”

      (5) Line 418

      The authors contrasted their model-based analyses with the model-free analyses of Akiti et  al. Another aspect of differences between the authors' model and the model of Akiti et al. is  whether it is normative or mechanistic: while how the model of Akiti et al. can be biologically  implemented appears to be clear (TS dopamine represents threat TD error, and TS  dopamine-dependent cortico-striatal plasticity implements TD error-based update of  model-free threat prediction), biological implementation of the authors' model seems more  elusive. Given this, it might be a fruitful direction to explore how these two models can be  integrated in the future.

      We enthusiastically agree that it would be most interesting in the future to explore the integration of the two models, and in the discussion (Lines 537-548, 454-461) we point to some first steps that might be fruitful along these lines. There are two separate considerations here: one is that our account is mostly computational and algorithmic, whereas Akiti’s model is mostly algorithmic and implementational; the second is, as noted by the reviewer, that our account is model-based, whereas Akiti’s model is model-free (in the sense of reinforcement learning; RL). These are related: thanks in no small part to the work from the group including Akiti, we know a lot more about the implementation of model-free than model-based RL. However, our model-based account does capture additional features of behavior not captured in Akiti et al.’s model, such as bout duration, frequency, and approach type. Hence the appeal of unifying the two.

      (6) Line 426

      Related to the previous point, it would be nice to more specifically describe what variable TS  dopamine can represent in the authors' model if possible.

      In the discussion (Lines 454-461), we speculate that TS dopamine could still respond to the physical salience of the novel object and affect choices by determining the potential cost of the encountered threat or the prior on the hazard function. For example, perhaps ablating TS dopamine reduces the hazard priors which leads to faster transition from cautious to confident approach and longer bout durations, consistent with the optogenetics behavioral data reported in Akiti et al.

      Reviewer #2 (Recommendations for the authors):

      My guess is that simpler versions of the model would not fit the data well. But this does not mean, for example, that the mice have probability distortions (CVaR), or even that probabilistic reasoning and the internal models necessary to support it are acting in the behavioral context studied by Akiti. So, related to the above, I would ask what other models would fit and would not fit the data? And what does this mean?

      These are good points. Our model provides an approximately normative account of the  animals’ behavior  in terms of what it achieves relative to a utility function. In practice, the  animals could deploy a precompiled model-free policy (which does not rely on probabilistic  computations) that is exactly equivalent to our model-based policy. With the current  experiment, we cannot conclude whether or not the animals are performing the prospective  calculations in an online manner. Of course, the extent to which animals or humans are  performing probabilistic computations online and have internal models are on-going  questions of study.

      Model comparison is difficult because we do not currently know of any other risk-sensitive exploration models. We cannot directly compare to the model in Akiti et al. since our model explains additional features of behavior: bout duration, frequency, and approach type. Indeed, our model is as simple as it can be, in the sense that, with the exception of nCVaR, removing any of the other parameters makes it difficult to fit some animals in our dataset. In the future, our model could be used to fit other datasets of risk-sensitive exploration and, ideally, be compared to other models.

      Explaining why animals avoid the novel object in what the authors call a benign environment is a very tricky issue. In Akiti et al., readers are not yet convinced that the mice know that this environment is benign. Being placed in an arena with a novel object presents mice with great uncertainty, and we do not know whether they treat this as benign. Therefore, the alternative explanations in this study need to be carefully discussed in light of the limitations of the initial study.

      It is certainly true that it is unclear if the arena is  completely  benign to the animals. However,  the amount of time the animal spends in the center of the arena decreases significantly from  habituation to novelty days. This suggests that the animals avoid the novel object largely  because of the object itself, rather than the potential danger associated with the arena.  Furthermore, the animals are not reported as exhibiting more extreme behaviours such as  freezing. In any case, our account is relative in the sense that we are comparing the time the  animal spends at the object versus elsewhere in the environment, driven by the relative  novelty and relative risk of the environment versus the object. Trying to get more absolute  measures of these quantities would require a richer experimental set-up, for instance with  different degree of habituation or experience of the occurrence of (other) novel objects, in  general.

      We added a short note to the discussion to explain this:

      “Fourth, we modeled the relative amount of time the animal spends at the object versus  elsewhere in the environment which depends on the differential risk in the two states.  However, it is likely the animals avoid the novel object largely because of the object itself,  rather than the potential danger associated with the arena since they spend much less time  at the center of the arena during novelty than habituation days.”

      Figure 2 - how confident are the authors that each mouse differs from y=1? Related to this,  the behavior in Akiti is very noisy and changes across time. I am not sure if the authors fully  describe at what levels their model captures the behavior vs not in a detailed enough  fashion.

      We have performed a random permutation test on the minute-to-minute data. We have updated Figure 2 so that brave animals for which y>1 remains significant under the Benjamini–Hochberg procedure at level q=0.05 are shown as solid green dots, and animals that do not pass are shown as hollow dots. Eight of the 11 brave animals passed the Benjamini–Hochberg procedure.
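
      The two steps named in this response can be sketched as follows. This is only a minimal illustration, not the authors' released code: the test statistic (difference in mean per-minute approach time between an early and a late window), the one-sided direction, and the function names `permutation_pvalue` and `benjamini_hochberg` are all assumptions made for the example.

```python
import random


def permutation_pvalue(early, late, n_perm=10_000, seed=0):
    """One-sided permutation p-value for mean(early) > mean(late).

    `early` and `late` are hypothetical per-minute approach times for one
    animal; the choice of statistic is an assumption, not taken from the paper.
    """
    rng = random.Random(seed)
    n_early = len(early)
    pooled = list(early) + list(late)
    n_late = len(pooled) - n_early
    observed = sum(early) / n_early - sum(late) / n_late
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign minutes to the two windows
        diff = sum(pooled[:n_early]) / n_early - sum(pooled[n_early:]) / n_late
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, q=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Step-up rule: find the largest rank k with p_(k) <= k * q / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

      In this sketch, one p-value would be computed per animal and the full list passed to `benjamini_hochberg`, so that the solid versus hollow markers correspond to `True` versus `False` entries.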

      Reviewer #3 (Recommendations for the authors):

      (1) I could not find information in the preprint about code availability. Please consider making  the code public to help others apply these modelling methods.

      We have released code and included the url in the paper in the Methods section.

      (2) Though the manuscript was generally clearly written, there were a number of places  where some additional information or clarification would be useful:

      a) Please define and explain the terms 'tail-behind' and 'tail-exposed' (used to describe  approach bout types) when first used.

      We have added definitions when we first mention these terms:

      “[...] 'tail-behind' (bouts where the animal's nose was closer to the object than the tail for the  entire bout) and 'tail-exposed' (bouts where the animal's tail is closer to the object than the  nose at some point during the bout), associated respectively with cautious risk-assessment  and engagement”

      b) At lines 57-58 when contrasting the 'model-free' account of Akiti et al with the 'model-based' account of the current work, it would be worth clarifying that these terms are  being used in the RL sense rather than e.g. a model-based analysis of the data.  

      We have updated the relevant lines to say “model-free/based reinforcement learning”.

      c) Line 61, the phrase 'the significant long-run approach of timid animals despite having  reached the "avoid" state' is unclear as the 'avoid' state has not been defined.

      We updated the terminology to “avoidance behavior” to be consistent with Akiti et al.  Avoidance refers to the animal routinely avoiding the object and therefore being unable to  learn whether it is safe.

      d) It was not completely clear to me how the coarse-graining of the behaviour was  implemented. Specifically, how were animals assigned to the brave, intermediate, or timid  group, and how were the parameters of the resulting behavioural phases fit?

      Sorry that this was not clear. Section 2.1 explains how the minute-to-minute behavioral data  was coarse-grained and how animal groups were assigned. We have added further  explanation of Figure 2 to the main text:

      “Fig 2 summarizes our categorization of the animals into the three groups: brave,  intermediate, and timid based on the phases identified in the animal's exploratory  trajectories. Timid animals spend no time in confident approach and are plotted in orange at  the origin of Fig 2. Brave animals differ from intermediate animals in that their approach time  during the first ten minutes of the confident phase is greater than the last ten minutes ( steady-state phase). Brave animals are plotted in green above and intermediate animals  are plotted in black below the y=1 line in Fig 2.”
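
      The grouping rule described in this passage can be written as a short decision function. This is a sketch under stated assumptions: the argument names are hypothetical, the comparison of the two ten-minute windows is taken to be what the y=1 line represents, and the treatment of exact ties (assigned to "intermediate") is a choice made only for the example.

```python
def classify_animal(confident_time, early_approach, steady_approach):
    """Assign an animal to 'timid', 'intermediate', or 'brave'.

    confident_time:  total time spent in confident approach (hypothetical input)
    early_approach:  approach time in the first ten minutes of the confident phase
    steady_approach: approach time in the last ten minutes (steady-state phase)
    """
    if confident_time == 0:
        # Timid animals never enter confident approach.
        return "timid"
    if early_approach > steady_approach:
        # Early approach exceeds steady state, i.e. above the y=1 line.
        return "brave"
    # Ties are treated as intermediate here; this is an assumption.
    return "intermediate"
```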

      We also added extra information to outline the goal, and methodology of coarse-graining and  animal grouping:

      “We sought to capture these qualitative differences (cautious versus confident) as well as aspects of the quantitative changes in bout durations and frequencies as the animal learns about their environment. To make this readily possible, we abstracted the data in two ways: averaging bout statistics over time, and clustering the animals into three groups with operationally distinct behaviors.”

      e) What purpose does the 'retreat' state serve in the BAMDP model (as opposed to  transitioning directly from 'object' to 'nest' states), and why do subjects not pass through it  following 'detect' states?

      Thank you for pointing this out. We have updated Figure 3 to note that the two “detected  states” also point to the “retreat” state. The reviewer is correct that there could be alternative  versions of the state diagram, and the ‘retreat’ state could indeed have been eliminated.  However, we thought that it was helpful to structure the animal’s progress through state  space.

      f) Why was the hazard function parameterised via the mean and SD at each time step rather  than with a parametric form of the mean and SD as a function of time?

      Since the agent can only spend 2, 3, or 4 turns at the object states, we didn’t see a need to  parameterize the mean and SD as a function of time. Doing so is a good solution to scaling  up the hazard function to more time-steps.

      (3) There were also a couple of points that could potentially be usefully touched on in the  discussion:

      a) What, if any, is the relationship between the CVaR objective and distributional RL? They  seem potentially related due to both focussing on quantiles of the outcome distribution.

      We have added a paragraph to the discussion discussing the connection between  distributional RL and CVaR:

      “CVaR is known to come in different flavors in the case of temporally-extended behavior. Gagne and Dayan (2021) introduce two alternative time-consistent formulations of CVaR: nested CVaR (nCVaR) and precommitted CVaR (pCVaR). nCVaR and pCVaR both enjoy Bellman equations which make it possible to compute approximately optimal policies without directly computing whole distributions of the outcomes. We use nCVaR in this study for its computational efficiency. There is, of course, great current interest in distributional reinforcement learning (Bellemare et al., 2023b) which does acquire such whole distributions, not the least because of prominent observations linking non-linearities in the response functions of dopamine neurons to methods for learning distributions of outcomes (Dabney et al., 2020; Masset et al., 2023; Sousa et al., 2023). One functional motivation for considering entire outcome distributions is the possibility of using them to determine risk-sensitive policies (Gagne and Dayan, 2021).

      While it is possible to compute CVaR directly from return distributions, Gagne and Dayan  (2021) showed that this can lead to temporally inconsistent policies where the agent  deviates from its original plans (the authors called this the fixed CVaR or fCVaR measure).

      Rather further removed from our model-based methods is work from Antonov and Dayan  (2023), who consider a model-free exploration strategy which exploits full return distributions  to compute the value of perfect information which is used as a heuristic for trying actions  with uncertain consequences. Future works can examine risk-sensitive versions of Antonov  and Dayan (2023)'s computationally efficient model-free algorithm as one solution to the  burdensome computations in our model-based method.”
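
      To make the remark about computing CVaR directly from a return distribution concrete, here is a minimal sketch for an equally weighted sample of returns. It illustrates only the static, single-step quantity, not the nested (nCVaR) or precommitted (pCVaR) formulations used for temporally extended behavior; the function name `cvar` and the sample-based estimator are assumptions of the example. The convention follows the response above: 𝛼 = 1 recovers the risk-neutral mean, and smaller 𝛼 focuses on the worst outcomes.

```python
def cvar(returns, alpha):
    """Mean of the worst alpha-fraction of equally weighted sample returns.

    alpha = 1 gives the ordinary mean; alpha < 1 is risk averse.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(returns)           # worst outcomes first
    remaining = alpha * len(ordered)    # tail mass, in units of samples
    total = 0.0
    for r in ordered:
        weight = min(1.0, remaining)    # the last sample may count fractionally
        total += weight * r
        remaining -= weight
        if remaining <= 0:
            break
    return total / (alpha * len(ordered))
```

      For example, with returns `[-10, 0, 0, 5, 5]`, the risk-neutral value is the mean, whereas `alpha=0.2` considers only the single worst outcome.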

      b) Why normatively might subjects have non-neutral risk preference as captured by the  CVaR?

      We also added a paragraph to the discussion discussing the advantage of heterogeneity in  risk sensitivity within a population:

      (Reviewer #1 had the same question, see above) “Our data show that there is substantial  variation in the degrees of risk sensitivity across the mice.  Previous works have reported  substantial interpopulation and intrapopulation differences in risk-sensitivity in humans which  depend on gender, age, socioeconomic status, personality characteristics, wealth and culture [...]”

      c) Relevance of the current modelling work to clinical conditions characterised by  dysregulation of risk assessment (e.g. anxiety or PTSD).

      We’ve added a paragraph to the discussion:

      “Inter-individual differences in risk sensitivity are also of critical importance in psychiatry, reflected in a panoply of anxiety disorders (Butler and Mathews, 1983; Giorgetta et al., 2012; Maner et al., 2007; Charpentier et al., 2017), along with worry and rumination (Gagne and Dayan, 2022). Understanding the spectrum of extreme priors and extreme values of 𝛼 could have therapeutic implications, adding significance to the search for tasks that can more cleanly separate them.”

      d) Is it surprising to see differences in risk preference (nCVaR) between the familiar object  and novel object condition, given that risk preference might be conceptualised as a trait  rather than a state variable?

      Thank you for raising this point. You are right that we expected risk sensitivity (nCVaR alpha)  to be the same between FONC and UONC animals on average. It is difficult to know if alpha  is higher for FONC than UONC animals due to the non-identifiability between alpha and  hazard priors. We have added this discussion to the paper:

      “This is surprising if we interpret 𝛼 as a trait that is stable through time. Unfortunately, due to  the non-identifiability between 𝛼 and hazard priors, we cannot verify whether 𝛼 is actually  higher for FONC animals than UONC animals.”

    1. eLife Assessment

      The manuscript by Li and coworkers analyzed astrocytic differentiation of midbrain floor plate-patterned neural cells originating from human iPS cells, with a LMX1A reporter. This valuable work identifies transcriptomic differences at the single-cell level, between astrocytes generated from LMX1A reporter positive or negative cells, as well as non-patterned astrocytes and neurons. The evidence is solid, but the paper can be strengthened by further analyses of the transcriptomic data, and astrocytic morphology; also, searching for some of the differentially expressed genes by immunohistochemistry in different regions of the mammalian brain, or in human specimens, would be very informative.

    2. Editorial note: To ensure a thorough evaluation of the revised manuscript, we invited a third reviewer to assess whether the authors had sufficiently addressed the concerns raised in the initial round of peer review. This additional reviewer confirmed that the authors responded partially to the original reviewers' requests. While they also provided a set of new comments, these do not alter the original assessment or editorial decision regarding the manuscript. For transparency and completeness, the additional comments are included below.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Li and coworkers present experiments generated with human induced pluripotent stem cells (iPSCs) differentiated to astrocytes through a three-step protocol consisting of neural induction/midbrain patterning, switch to expansion of astrocytic progenitors, and terminal differentiation to astroglial cells. They used lineage tracing with a LMX1A-Cre/AAVS1-BFP iPSCs line, where the initial expression of LMX1A and Cre allows the long-lasting expression of BFP, yielding BFP+ and BFP- populations, that were sorted when in the astrocytic progenitor expansion. BFP+ showed significantly higher number of cells positive to NFIA and SOX9 than BFP- cells, at 45 and 98 DIV. However, no significant differences in other markers such as AQP4, EAAT2, GFAP (which show a proportion of less than 10% in all cases) and S100B were found between BFP-positive or -negative, at these differentiation times. Intriguingly, non-patterned astrocytes produced higher proportions of GFAP positive cells than the midbrain-induced and then sorted populations. BFP+ cells have enhanced calcium responses after ATP addition, compared to BFP- cells. Single-cell RNA-seq of early and late cells from BFP- and BFP+ populations were compared to non-patterned astrocytes and neurons differentiated from iPSCs. Bioinformatic analyses of the transcriptomes resulted in 9 astrocyte clusters, 2 precursor clusters and one neuronal cluster. DEG analysis between BFP+ and BFP- populations showed some genes enriched in each population, which were subject to GO analysis, resulting in biological processes that are different for BFP+ or BFP- cells.

      Strengths:

      The manuscript tries to tackle an important question in neuroscience, namely the importance of patterning in astrocytes. Regionalization is crucial for neuronal differentiation, and the presented experiments constitute a tractable system for analyzing both the transcriptional identities and the functionality of astrocytes.

      Weaknesses:

      The presented results have several fundamental issues, to be resolved, as listed in the following major points:

      (1) It is very intriguing that GFAP is not expressed in late BFP- nor in BFP+ cultures, when authors designated them as mature astrocytes.

      (2) In Fig. 2D, authors need to change the designation "% of positive nuclei".

      (3) In Fig. 2E, the text describes a decrease caused by 2APB on the rise elicited by ATP, but the graph shows an increase with ATP+2APB. However, in Fig. 2F, the peak amplitude for BFP+ cells is higher in ATP than in ATP+2APB, which is mentioned in the text, but this is inconsistent with the graph in 2E.

      (4) The description of Results in the single-cell section is confusing, particularly in the sorted CD49 and unsorted cultures. Where do these cells come from? Are they BFP-, BFP+, unsorted for BFP, or non-patterned? Which are the "all three astrocyte populations"? A more complete description of the "iPSC-derived neurons" is required in this section to allow the reader to understand the type and maturation stage of neurons, and if they are patterned or not.

      (5) A puzzling fact is that both BFP- and BFP+ cells have similar levels of LMX1A, as shown in Fig. S6F. How do authors explain this observation?

      (6) In Fig. 3B, the non-patterned cells cluster away from the BFP+ and BFP-; on the other hand, early and late BFP- are close and the same is true for early and late BFP+. A possible interpretation of these results is that patterned astrocytes have different paths for differentiation, compared to non-patterned cells. If that can be implied from these data, authors should discuss the alternative ways for astrocytes to differentiate.

      (7) Fig. 3D shows that cluster 9 is the only one with detectable and coincident expression of both S100B and GFAP. Please discuss why these widely-accepted astrocyte transcripts are not found in the other astrocyte clusters. Also, Sox9 is expressed in neurons, astrocyte precursors and astrocytes. Why is that?

      (8) Line 337: Why did the authors select a log2 fold change of 0.25? Typically, 1 or a higher number is used to ensure at least a 2-fold increase, or a 50% decrease. A volcano plot generated by the comparison of BFP+ with BFP- cells would be appropriate. The validation of differences by immunocytochemistry, between BFP+ and BFP-, is inconclusive. The staining is blurry in the images presented in Fig. S8C. Quantification of the positive cells, without significant background signal, in both populations is required.

      (9) Lines 349-351: BFP+ cells did not show higher levels of transcripts for LMX1A nor FOXA2. This fact jeopardizes the claim that these cells are still patterned. In the same line, there are not significant differences with cortical astrocytes, indicating a wider repertoire of the initially patterned cells, that seems to lose the midbrain phenotype. Furthermore, common DEGs shared by BFP- and BFP+ cells when compared to non-patterned cells indicate that after culture, the pre-pattern in BFP+ cells is somehow lost, and coincides with the progression of BFP- cells.

      (10) For the GO analyses, how did authors select 1153 genes? The previous section mentioned 287 genes unique for BFP+ cells. The Results section should include a rationale for performing a wider search for the enriched processes.

      (11) For Fig. 4C and 4D, both p values and the number of genes should be indicated in the graph. I would advise selecting the 10 or 15 most significant categories, as these panels are very difficult to read. Whereas the listed processes for BFP+ have a relation to Parkinson disease, the ones detected for BFP- cells are related to extracellular matrix and tissue development. Does it mean that BFP+ cells have impaired formation of this matrix, or defective tissue development? This is in contradiction with the enhanced calcium responses of BFP+ cells compared to BFP- cells.

      (12) Both the comparison between midbrain and cortical astrocytes in Fig. S8A, and the volcano plot in S8B do not show consistent changes. For example, RCAN2 in Fig. S8A has the same intensity for cortical and midbrain cells, but is marked as an enriched gene in midbrain in the p vs log2FC graph in Fig. S8B.
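
      The arithmetic behind point (8) can be checked with a few lines of Python. This is an illustrative sketch only; the helper name `fold_change` is hypothetical, and the two thresholds are simply the values mentioned in the comment.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


for threshold in (0.25, 1.0):
    up = fold_change(threshold)      # minimum increase passing the cutoff
    down = fold_change(-threshold)   # corresponding decrease
    print(f"|log2FC| >= {threshold}: up >= {up:.2f}x, down <= {down:.2f}x")
```

      A cutoff of 0.25 therefore admits genes that change by as little as roughly 19% upward or 16% downward, whereas a cutoff of 1 requires at least a doubling or a halving.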

    3. Author response:

      The following is the authors’ response to the current reviews.

      Response to Reviewer #3:

      We thank reviewer 3 for spending their valuable time on commenting on our revised paper.

      We would like to reiterate the central conclusion of this work, which appears to have been missed by Reviewer 3. Using a BFP-expressing lineage tracer hPSC line for tracking LMX1A+ midbrain-patterned neural progenitors and their differentiated progeny, we discovered a loss of the LMX1A lineage during pluripotent stem cell differentiation into astrocytes, even though BFP+ neural progenitors were the dominant population at the onset of astrocyte induction.

      Hence, the take-home message of this study is, as summarized in the abstract, ‘the lineage composition of iPSC-derived astrocytes may not accurately recapitulate the founder progenitor population’ and that one should not take for granted that in vitro/stem cell-derived astrocytes are the descendants of the dominant starting neural progenitors (which is a general assumption in PSC publications as described in the paper and our response to reviewers).

      Please find below our point-by-point response to reviewer comments. We have re-ordered the points according to their relative importance to our main conclusions.

      …. They used lineage tracing with a LMX1A-Cre/AAVS1-BFP iPSCs line, where the initial expression of LMX1A and Cre allows the long-lasting expression of BFP, yielding BFP+ and BFP- populations, that were sorted when in the astrocytic progenitor expansion. BFP+ showed significantly higher number of cells positive to NFIA and SOX9 than BFP- cells …

      This is a misunderstanding by reviewer 3. As indicated in the first sentence of the second section, the BFP- populations used for functional and transcriptomic analysis were not sorted BFP- cells, but were derived from unsorted, BFP+ enriched populations. Our scRNAseq analysis indicated that they were transcriptomically aligned to human midbrain astrocytes. This finding is consistent with the fact that they are derived from midbrain-patterned neural progenitors, presumably minority LMX1A- progenitors.

      Reviewer 3’s comments indicate that they misunderstood the primary aims of our study as a mere functional and transcriptomic comparison of the two astrocyte populations.

      (9) BFP+ cells did not show higher levels of transcripts for LMX1A nor FOXA2. This fact jeopardizes the claim that these cells are still patterned. In the same line, there are not significant differences with cortical astrocytes, indicating a wider repertoire of the initially patterned cells, that seems to lose the midbrain phenotype. Furthermore, common DGE shared by BFP- and BFP+ cells when compared to non-patterned cells indicate that after culture, the pre-pattern in BFP+ cells is somehow lost, and coincides with the progression of BFP- cells.

      The reviewer seems to assume that astrocytes derived from LMX1A+ ventral midbrain progenitors must retain LMX1A expression. We do not take this view and do not claim this in this study. Moreover, we have discussed in the paper that, due to a lack of transcriptomic studies that track regional progenitors (such as LMX1A+ progenitors) in vivo, it remains unknown whether and to what extent patterning gene expression is maintained in astrocytes of different brain regions.

      Our findings on the lack of LMX1A and FOXA2 in BFP+ astrocytes are supported by several published single-cell transcriptomic studies of human midbrain astrocytes (La Manno et al. 2016; Agarwal et al. 2020; Kamath et al. 2022). We have a paragraph of discussion on this topic in both the original and updated versions of the paper with the relevant publications cited.

      Other points raised by reviewer 3

      (1) It is very intriguing that GFAP is not expressed in late BFP- nor in BFP+ cultures, when authors designated them as mature astrocytes.

      We did not designate our cells as ‘mature’ astrocytes but ‘astrocytes’ based on their global gene expression with the human fetal and adult brain astrocytes as references.

      Moreover, ‘mature’ only appeared once in the paper indicating that our cells lie in between the fetal and adult astrocytes in maturity.

      (2) In Fig. 2D, authors need to change the designation "% of positive nuclei".

      To be corrected in the version of record.

      (3) In Fig. 2E, the text describes a decrease caused by 2APB on the rise elicited by ATP, but the graph shows an increase with ATP+2APB. However, in Fig. 2F, the peak amplitude for BFP+ cells is higher in ATP than in ATP+2APB, which is mentioned in the text, but this is inconsistent with the graph in 2E.

      To be corrected in the version of record.

      (4) The description of Results in the single-cell section is confusing, particularly in the sorted CD49 and unsorted cultures. Where do these cells come from? Are they BFP-, BFP+, unsorted for BFP, or non-patterned? Which are the "all three astrocyte populations"? A more complete description of the "iPSC-derived neurons" is required in this section to allow the reader to understand the type and maturation stage of neurons, and if they are patterned or not.

      As previously reported in the reference cited, CD49f is a novel human astrocyte marker, and its expression is independent of BFP expression. For all three astrocyte populations studied here (BFP+, BFP-, and non-patterned astrocytes), we included both CD49f+ sorted and unsorted samples to account for selection bias caused by FACS. iPSC-derived neurons were included in the sequencing study to provide a reference for cell-type annotation. They were generated following a GABAergic neuron differentiation protocol. However, their maturation stages and/or regional characteristics are not relevant to astrocytes.

      (5) A puzzling fact is that both BFP+ and BFP- cells have similar levels of LMX1A, as shown in Fig. S6F. How do authors explain this observation?

      This figure panel shows that LMX1A, LMX1B and FOXA2 are essentially NOT expressed in these astrocytes.

      (6) In Fig. 3B, the non-patterned cells cluster away from the BFP+ and BFP-; on the other hand, early and late BFP- are close and the same is true for early and late BFP+. A possible interpretation of these results is that patterned astrocytes have different paths for differentiation, compared to non-patterned cells. If that can be implied from these data, authors should discuss the alternative ways for astrocytes to differentiate.

      Both BFP+ and BFP- astrocytes are derived from ventral midbrain-patterned neural progenitors, while non-patterned neural progenitors are more akin to those of the forebrain. The clustering in Figure 3B is therefore expected and confirms the patterning effect.

      (7) Fig. 3D shows that cluster 9 is the only one with detectable and coincident expression of both S100B and GFAP expression. Please discuss why these widely-accepted astrocyte transcripts are not found in the other astrocytes clusters. Also, Sox9 is expressed in neurons, astrocyte precursors and astrocytes. Why is that?

      S100B and GFAP are classic astrocyte markers in certain states. We are not relying only on two markers but the genome-wide expression profile as the criteria for astrocytes. As shown in the unbiased reference mapping to multiple human brain astrocyte scRNA-seq datasets, all our astrocyte clusters were mapped with high confidence to human astrocytes.

      SOX9 is an important regulator for astrogenesis, so its expression is expected in precursors (doi.org/10.1016/j.neuron.2012.01.024). In addition, recent studies have reported SOX9 expression in foetal striatal projection neurons and early postnatal cortical neurons, where SOX9 regulates neuronal synaptogenesis and morphogenesis (dois:10.1016/j.fmre.2024.02.019; 10.1016/j.neuron.2018.10.008). Therefore, the expression of SOX9 in multiple cell types was expected. Instead of using a few selected markers for cell-type annotation, we employed a genome-wide approach that combines unbiased reference mapping with various markers to ascertain our annotation results.

      (8) Line 337, why did the authors select a log2 change of 0.25? Typically, 1 or a higher number is used to ensure at least a 2-fold increase, or a 50% decrease. A volcano plot generated by the comparison of BFP+ with BFP- cells would be appropriate. The validation of differences by immunocytochemistry, between BFP+ and BFP-, is inconclusive. The staining is blurry in the images presented in Fig. S8C. Quantification of the positive cells, without significant background signal, in both populations is required.

      We used a lenient threshold owing to the following considerations: 1) A high FC does not necessarily mean biological relevance, as gene expression does not necessarily translate to protein expression. Therefore, a smaller FC value could also be biologically meaningful. 2) A threshold must balance noise against biological differences, and any threshold is to some extent arbitrary. 3) We are identifying a trend rather than pinpointing a specific set of

      The quality was unfortunately reduced due to restrictions on file size upon submission. A high resolution Fig. S8C is available.
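
      As a point of reference for the threshold discussed in point (8), a log2 fold change converts to a linear fold change as 2 raised to that value. The sketch below is illustrative only; the helper name `log2fc_to_fold` is hypothetical and is not taken from the authors' analysis pipeline.

```python
def log2fc_to_fold(log2fc: float) -> float:
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc


# Compare the lenient cutoff (0.25) with the conventional cutoff (1.0),
# in both the up- and down-regulated directions.
for value in (0.25, -0.25, 1.0, -1.0):
    fold = log2fc_to_fold(value)
    percent = (fold - 1.0) * 100.0
    print(f"log2FC {value:+.2f} -> fold {fold:.3f} ({percent:+.1f}%)")
```

      Under this conversion, a cutoff of 0.25 corresponds to roughly a 1.19-fold increase (or a decrease to about 0.84-fold), whereas a cutoff of 1 corresponds to a 2-fold increase or a 50% decrease.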

      (10) For the GO analyses, How did authors select 1153 genes? The previous section mentioned 287 genes unique for BFP+ cells. The Results section should include a rationale for performing a wider search for the enriched processes.

      GO enrichment using unique DEGs may not capture the wider landscape of the transcriptomic characteristics of BFP+ astrocytes. The 287 unique genes were only differentially expressed in BFP+ astrocytes. However, apart from these 287 genes, other genes among the 1187 DEGs were differentially expressed in BFP+ astrocytes and in one other population.

      (11) For Fig. 4C and 4D, both p values and the number of genes should be indicated in the graph. I would advise to select the 10 or 15 most significant categories, these panels are very difficult to read. Whereas the listed processes for BFP+ have a relation to Parkinson disease, the ones detected for BFP- cells are related to extracellular matrix and tissue development. Does it mean that BFP+ cells have impaired formation of this matrix, or defective tissue development? This is in contradiction of enhanced calcium responses of BFP+ cells compared to BFP- cells.

      Information on all DEGs, including p values and numbers, is provided in Supplementary data 1-5.

      BFP+ astrocytes do have enrichment for GO terms related to extracellular matrix and tissue development, although not as obvious as BFP- astrocytes. Previous work has shown that both in vitro and in vivo derived astrocytes are functionally heterogeneous, containing functionally distinct subtypes exhibiting different GO enrichment profiles (doi: 10.1016/j.ygeno.2021.01.008; 10.1038/s41598-024-74732-7).

      (12) Both the comparison between midbrain and cortical astrocytes in Fig. S8A, and the volcano plot in S8B do not show consistent changes. For example, RCAN2 in Fig. S8A has the same intensity for cortical and midbrain cells, but is marked as an enriched gene in midbrain in the p vs log2FC graph in Fig. S8B.

      These are integrated analyses of published human datasets. S8A and S8B show the same data in different formats. The differences are better shown in the volcano plot and are more easily detected by eye there. RCAN2 had a higher average expression in the midbrain than in the telencephalon, albeit by a small margin, and the difference was statistically significant (as shown in the volcano plot).


      The following is the authors’ response to the original reviews

      Reviewer 1:

      In vitro nature of this work being the fundamental weakness of this paper

      We disagree with this statement. As explained in the provisional response, the aim of this study was to test the validity of a general concept applied in pluripotent stem cell research that pluripotent stem cell-derived astrocytes faithfully represent the lineage heterogeneity of their ancestral neural progenitors and hence preserve the regionality of such progenitors. Our genetic lineage study is justified for addressing this in vitro-driven question. However, we have highlighted the rationale where appropriate in the revised paper.

      If regional identity is not maintained, so what? Don't we already know that this can happen? The authors acknowledge that this is known in the discussion.

      Importance of regional identity: Growing evidence demonstrates the functional heterogeneity of brain astrocytes in health and disease. Therefore, for in vitro disease modeling, it is believed that one should use astrocytes that represent the anatomy of disease pathology; for example, midbrain astrocytes for studying dopamine neurodegeneration and Parkinson’s disease. Understanding the dynamics of stem cell-derived astrocytes and identifying astrocyte subtypes is important for their biomedical applications.

      Regional identity change/Discussion: It seems that the reviewer misunderstood the context in which the ‘identity change’ was discussed. The literature referred to (in the Discussion) concerns shifts in regional gene expression in bulk-cultured cells. Before single-cell analysis and lineage tracking were available, one could not distinguish whether such shifts were due to a change in the transcriptomic landscape in progenies of the same lineage or to alterations in lineage heterogeneity; they were instead interpreted at face value as a failure to maintain regional identity. In the revised paper, we have made an effort to indicate that ‘regional identity’ is used broadly to refer to lineage relationships and/or traits rather than static gene expression.

      validation of the markers/additional work

      The scRNA-seq analysis performed in this study compared the profiles of astrocytes derived from LMX1A+ and LMX1A- ventral midbrain-patterned neural progenitors. Since it is not possible to perform genetic lineage tracking in humans and an analogous mouse lineage tracer line is not available, in vivo validation of these markers with respect to their lineage relationship is not currently feasible. However, we took advantage of abundant single-cell human astrocyte transcriptomic datasets and validated our genes in silico. We also validated the differential expression of selected markers in late BFP+ and BFP- astrocytes using immunocytochemistry, where reliable antibodies are available. The results of the additional analyses are presented in Figure S8 and Supplemental Data 5.

      Knowledge gaps concerning astrocyte development

      Reviewer 1 pointed out a number of knowledge gaps concerning astrocyte development, such as the transcriptomic landscape trajectories of midbrain floor plate cells as they progress towards astrocytes. Indeed, the limited knowledge of regional astrocyte molecular heterogeneity restricts the objective validation of in vitro-derived astrocyte subtypes and the development of novel approaches for their generation in vitro. We agree with the need for in-depth in vivo studies using model organisms, although these are beyond the scope of the current work.

      Reviewer 2:

      (1) The authors argue that the depletion of BFP seen in the unsorted population immediately after the onset of astrogenic induction is due to the growth advantage of the derivatives of the residual LMX1A- population. However, no objective data supporting this idea is provided, and one could also hypothesize that the residual LMX1A- cells could affect the overall LMX1A expression in the culture through negative paracrine regulation.

      We acknowledge the lack of evidence-based explanation for the depletion of BFP+ cells in mixed cultures. We were unable to perform additional experiments because of resource limitations. The design of the LMX1A-Cre/AAVS1-BFP lineage tracer line determines that BFP is expressed irreversibly in LMX1A-expressing cells or their derivatives regardless of their LMX1A expression status. Therefore, the potential negative paracrine regulation of LMX1A by residual LMX1A- cells should not affect cells that have already turned on BFP. We have highlighted the working principles of the LMX1A tracer line in the revised manuscript.

      (2) Furthermore, on line 124 it is stated that: "Interestingly, the sorted BFP+ cells exhibited similar population growth rate to that of unsorted cultures...". In the face of the suggested growth disadvantage of those cells, this statement needs clarification.

      To avoid confusion, we have removed the statement.

      (3) Regarding the fidelity of the model system, it is not clear to me how the TagBFP expression was detected in the BFP+ population supposedly in d87 and d136 pooled astrocytes (Fig S6C) while no LMX1A expression was observed in the same cells (Fig S6F).

      The TagBFP tracer is expressed in the progenies of LMX1A+ cells, regardless of their LMX1A expression status. We have gone through the MS text to ensure that this information has been provided.

      (4) The generated single-cell RNASeq dataset is extremely valuable. However, given the number of conditions included in this study (i.e. early vs late astrocytes, BFP+ vs BFP-, sorted vs unsorted, plus non-patterned and neuronal samples) the resulting analysis lacks detail. For instance, from a developmental perspective and to better grasp the functional significance of astrocytic heterogeneity, it would be interesting to map the identified clusters to early vs late populations and to the BFP status.

      We performed additional bioinformatics analysis, which provided independent support for the relative developmental maturity suggested by functional assays. The additional data are now provided in the revised Figure 3B, C, E.

      Moreover, although comprehensive, Figure S7 is complex to understand given that citations rather than the reference populations are depicted.

      The information is provided in the revised Figure S7.

      (5) Do the authors have any consideration regarding the morphology of the astrocytes obtained in this study? None of the late astrocyte images depict a prototypical stellate morphology, which is reported in many other studies involving the generation of iPSC-derived astrocytes and which is associated with the maturity status of the cell.

      The morphology of our astrocytes was not unique to the present study. Many factors may influence the morphology of astrocytes, such as the culture media and supplements used, and maturity status. Based on the functional assays and limited GFAP expression, our astrocytes were relatively immature.

    1. eLife Assessment

      This manuscript reports valuable results on the role of MDC1 and Treacle in DSB repair in rDNA repeats. It has been previously established that MDC1 is replaced by Treacle as the main adaptor in the nucleolar DNA damage response. This work provides convincing evidence that MDC1 is required for the recruitment of RAD51 and BRCA1 to DSBs in rDNA. The work involves multiple MDC1 knockout models and establishes that RNF8-RNF168 act downstream of MDC1 in the recruitment of the HR machinery to nucleolar DSBs.

    2. Reviewer #1 (Public review):

      This study elucidates the molecular linkage between the mobilization of damaged rDNA from the nucleolus to its periphery and the subsequent repair process by HDR. The authors demonstrate that the nucleolar adaptor protein Treacle mediates rDNA mobilization, and the MDC1-RNF8-RNF168 pathway coordinates the recruitment of the BRCA1-PALB2-BRCA2 complex and RAD51 loading. This stepwise regulation appears to prevent aberrant recombination events between rDNA repeats. This work provides compelling evidence for the recruitment of the Treacle-TOPBP1-NBS1 complex to rDNA DSBs and demonstrates the critical role of MDC1 in the rDNA damage response. There are some issues with the over-interpretation of results as described subsequently. Some aspects could be strengthened, for example, a potential role of the RAP80-Abraxas axis, the origin of the repair synthesis (HDR vs. NHEJ)

    3. Reviewer #2 (Public review):

      Summary:

      DNA double-strand breaks (DSB) in repeated DNA pose a challenge for repair by homologous recombination (HR) due to the potential of generating chromosomal aberrations, especially involving repeats on different chromosomes. This conceptual caveat led to a long-held notion that HR is not active in repeated DNA, which was disproven in groundbreaking work by Chiolo showing in Drosophila that DSBs in pericentromeric repeats are mobilized to the nuclear periphery for repair by HR. A similar mechanism operates in mouse cells, as shown by the Gautier laboratory, but the mobilization goes to the nucleolar periphery, called nucleolar caps. In this manuscript, the authors reexamine the role of MDC1 in the mobilization of DSBs in rDNA in human cells. Previous work has shown that MDC1 is replaced by Treacle, the gene associated with Treacher Collins syndrome 1, in its role as the main adaptor of the DNA damage response, and these results are confirmed here. The novelty of this contribution lies in the discovery that MDC1 is required downstream in the recruitment of BRCA1 and RAD51 to nucleolar DSBs that were mobilized to the nucleolar cap. Using multiple MDC1 knockout models and DSBs induced by the nuclease PpoI, which cleaves at nuclear sites as well as in the 28S rDNA, the study convincingly documents this role of MDC1 and shows that it acts upstream of the RNF8-RNF168 ubiquitylation axis. Using a proxy assay of co-localization of EdU incorporation at DSBs (gammaH2AX), evidence is provided that MDC1 is required for HR in rDNA. MDC1 was not required for RAD51 recruitment to IR-induced foci, but it is unclear whether this is related to the different DSB chemistry (enzymatic versus IR) or to the localization of the DSB (rDNA versus unique sequence genome).

      Strengths:

      (1) The manuscript is well-written, and the experimental evidence is nicely presented.

      (2) Multiple MDC1 knockout models are used to validate the results.

      (3) Convincing back-complementation data clarify the relationship between MDC1 and RNF8.

      Weaknesses:

      (1) The recruitment of BRCA2 was not directly demonstrated. This caveat could be recognized, as IF for BRCA2 is challenging.

      (2) PpoI also induces DSBs in the non-rDNA genome. These DSBs would be an ideal control to establish nucleolar specificity of the events described and clarify whether the difference between IR and PpoI is the chemical structure of the DSB or the location of the DSB.

    1. eLife Assessment

      In this important work, the authors present a new transformer-based neural network designed to isolate and quantify higher-order epistasis in protein sequences. They provide solid evidence that higher-order epistasis can play key roles in protein function. This work will be of interest to the communities interested in modeling biological sequence data and understanding mutational effects.

    2. Reviewer #1 (Public review):

      The authors present an approach that uses the transformer architecture to model epistasis in deep mutational scanning datasets. This is an original and very interesting idea. Applying the approach to 10 datasets, they quantify the contribution of higher-order epistasis, showing that it varies quite extensively.

      Suggestions:

      (1) The approach taken is very interesting, but it is not particularly well placed in the context of recent related work. MAVE-NN, LANTERN, and MoCHI are all approaches that different labs have developed for inferring and fitting global epistasis functions to DMS datasets. MoCHI can also be used to infer multi-dimensional global epistasis (for example, folding and binding energies) and also pairwise (and higher order) specific interaction terms (see 10.1186/s13059-024-03444-y and 10.1371/journal.pcbi.1012132). It doesn't detract from the current work to better introduce these recent approaches in the introduction. A comparison of the different capabilities of the methods may also be helpful. It may also be interesting to compare the contributions to variance of 1st, 2nd, and higher-order interaction terms estimated by the Epistatic transformer and MoCHI.

      (2) https://doi.org/10.1371/journal.pcbi.1004771 is another useful reference that relates different metrics of epistasis, including the useful distinction between biochemical/background-relative and background-averaged epistasis.

      (3) Which higher-order interactions are more important? Are there any mechanistic/structural insights?
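
      To make the distinction raised in suggestion (2) concrete, the following minimal sketch contrasts pairwise epistasis measured relative to a single reference background with the same quantity averaged over backgrounds. The phenotype values are invented for illustration, and the function name `pairwise_epistasis` is hypothetical; normalization conventions for background-averaged epistasis differ between papers, so a plain mean is used here as an assumption.

```python
# Toy phenotypes for three biallelic sites (0 = wild type, 1 = mutant).
# These numbers are invented purely for illustration.
phenotype = {
    (0, 0, 0): 0.0, (1, 0, 0): 1.0, (0, 1, 0): 1.0, (1, 1, 0): 2.0,
    (0, 0, 1): 0.5, (1, 0, 1): 1.5, (0, 1, 1): 1.5, (1, 1, 1): 4.5,
}


def pairwise_epistasis(f, background):
    """Epistasis between sites 0 and 1 with site 2 fixed to `background`."""
    return (f[(1, 1, background)] - f[(1, 0, background)]
            - f[(0, 1, background)] + f[(0, 0, background)])


# Background-relative: measured only in the wild-type background at site 2.
relative_to_wild_type = pairwise_epistasis(phenotype, 0)

# Background-averaged: mean of the same quantity over both states of site 2.
averaged = sum(pairwise_epistasis(phenotype, b) for b in (0, 1)) / 2

print(relative_to_wild_type, averaged)
```

      In this toy landscape the two sites look purely additive in the wild-type background but interact once site 2 is mutated, so the two measures disagree; that disagreement is itself a signature of a third-order interaction.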

    3. Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions". There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models".

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.
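
      The mapping requested in point (4) could be as simple as the sketch below. It assumes only the 2^M relationship between the number of MHA layers (M) and the maximum interaction order that is mentioned in point (2); the function name `max_interaction_order` is hypothetical and does not come from the authors' code.

```python
def max_interaction_order(num_mha_layers: int) -> int:
    """Maximum interaction order under the assumed 2**M scaling."""
    if num_mha_layers < 1:
        raise ValueError("at least one MHA layer is required")
    return 2 ** num_mha_layers


# Print a small lookup table from layer count to maximum modeled order.
for layers in range(1, 4):
    order = max_interaction_order(layers)
    print(f"{layers} MHA layer(s) -> interactions up to order {order}")
```

      Read this way, one layer would correspond to "pairwise", two layers to "3-4-way", and three or more layers to ">4-way" interactions, although the manuscript itself should confirm this correspondence.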

    4. Reviewer #3 (Public review):

      Summary:

      Sethi and Zou present a new neural network to study the importance of epistatic interactions in pairs and groups of amino acids to the function of proteins. Their new model is validated on a small simulated data set and then applied to 10 empirical data sets. Results show that epistatic interactions in groups of amino acids can be important to predict the function of a protein, especially for sequences that are not very similar to the training data.

      Strengths:

      The manuscript relies on a novel neural network architecture that makes it easy to study specifically the contribution of interactions between 2, 3, 4, or more amino acids. The study of 10 different protein families shows that there is variation among protein families.

      Weaknesses:

      The manuscript is good overall, but could have gone a bit deeper by comparing the new architecture to standard transformers, and by investigating whether differences between protein families explain some of the differences in the importance of interactions between amino acids. Finally, the GitHub repository needs some more information to be usable.

    1. eLife Assessment

      This manuscript uses simulations to describe the dynamics of the Pseudomonas-derived cephalosporinase PDC-3 β-lactamase and its mutants to better understand antibiotic resistance. The finding that clinically observed mutations alter the flexibility of the Ω- and R2-loops, reshaping the cavity of the active site, is useful to the field. However, the evidence is considered incomplete; there is a lack of description of methods, and there is a need for additional analysis to demonstrate statistical significance, visualisation of the Markov states, analysis to explain changes due to the different mutations, and possible simulations in the presence of substrates to shed direct light on modulation mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses adaptive sampling simulations to understand the impact of mutations on the specificity of the enzyme PDC-3 β-lactamase. The authors argue that mutations in the Ω-loop can expand the active site to accommodate larger substrates.

      Strengths:

      The authors simulate an array of variants and perform numerous analyses to support their conclusions.

      The use of constant pH simulations to connect structural differences with likely functional outcomes is a strength.

      Weaknesses:

      I would like to have seen more error bars on quantities reported (e.g., % populations reported in the text and Table 1).

    1. eLife Assessment

      This manuscript employs cryo-EM, mutational analysis, and biochemical assays to explore the molecular basis by which glutamine promotes filamentation and regulates the activity of human glutamine synthetase (hGS) by stabilizing interactions between hGS decamers. The studies supporting this mechanism are solid, but could be improved by providing more clarity and by addressing methodological issues in the cryo-EM data processing workflow. This work will be of particular interest and use to groups interested in understanding the molecular basis of nutrient sensing, cellular metabolism, and structural regulation of enzyme activity.

    2. Reviewer #1 (Public review):

      Summary:

      The study is methodologically solid and introduces a compelling regulatory model. However, several mechanistic aspects and interpretations require clarification or additional experimental support to strengthen the conclusions.

      Strengths:

      (1) The manuscript presents a compelling structural and biochemical analysis of human glutamine synthetase, offering novel insights into product-induced filamentation.

      (2) The combination of cryo-EM, mutational analysis, and molecular dynamics provides a multifaceted view of filament assembly and enzyme regulation.

      (3) The contrast between human and E. coli GS filamentation mechanisms highlights a potentially unique mode of metabolic feedback in higher organisms.

      Weaknesses:

      (1) The mechanism underlying spontaneous di-decamer formation in the absence of glutamine is insufficiently explored and lacks quantitative biophysical validation.

      (2) Claims of decamer-only behavior in mutants rely solely on negative-stain EM and are not supported by orthogonal solution-based methods.

    3. Reviewer #2 (Public review):

      The authors set out to resolve the high-resolution structure of a glutamine synthetase (GS) decamer using cryo-EM, investigate glutamine binding at the decamer interface, and validate structural observations through biochemical assays of ATP hydrolysis linked to enzyme activity. Their work sits at the intersection of structural and functional biology, aiming to bridge atomic-level details with biological mechanisms - a goal with clear relevance to researchers studying enzyme catalysis and metabolic regulation.

      Strengths and weaknesses of methods and results:

      A key strength of the study lies in its use of cryo-EM, a technique well-suited for resolving large, dynamic macromolecular complexes like the GS decamer. The reported resolutions (down to 2.15 Å) initially suggest the potential for detailed structural insights, such as side-chain interactions and ligand density. However, several methodological limitations significantly undermine the reliability of the results:

      (1) Cryo-EM data processing: The absence of critical details about B-factor sharpening - a standard step to enhance map interpretability - is a major concern. For high-resolution maps (<3 Å), sharpening is typically applied to resolve side-chain features, yet the submitted maps (e.g., those in Figures 1D, 2D, and supplementary figures) appear unprocessed, with density quality inconsistent with the claimed resolutions. This makes it difficult to evaluate whether observed features (e.g., glutamine binding) are genuine or artifacts of unsharpened data.

      (2) Modeling and density consistency: The structural models, particularly for glutamine binding at the decamer interface, do not align with the reported resolution. The maps shown in Figure 2D and Supplementary Figure S7 lack sufficient density to confidently place glutamine or even surrounding residues, conflicting with claims of 2.15 Å resolution. Additionally, fitting a non-symmetric ligand (glutamine) into a symmetry-refined map requires justification, as symmetry constraints may distort ligand placement.

      (3) Biochemical assay controls: While the enzyme activity assays aim to link structure to function, they lack essential controls (e.g., blank reactions without GS or substrates, substrate omission tests) to confirm that ATP hydrolysis is GS-dependent. The use of TCEP, a reducing agent, is also not paired with experiments to rule out unintended effects on the PK/LDH system, further limiting confidence in activity measurements.
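
      For readers less familiar with the sharpening step raised in point (1), the sketch below shows the commonly used form in which Fourier amplitudes at resolution d are scaled by exp(-B / (4 d^2)), with a negative B-factor boosting high-resolution terms. The B value of -50 Å^2 is an arbitrary illustration, not a value from the manuscript, and the function name `sharpening_gain` is hypothetical; individual software packages may apply additional weighting.

```python
import math


def sharpening_gain(b_factor: float, resolution: float) -> float:
    """Amplitude scale factor exp(-B / (4 d^2)) at resolution d (in angstroms)."""
    return math.exp(-b_factor / (4.0 * resolution ** 2))


# Illustrative B-factor only; a negative value sharpens the map.
b_factor = -50.0
for resolution in (10.0, 4.0, 2.15):
    gain = sharpening_gain(b_factor, resolution)
    print(f"d = {resolution:5.2f} A -> amplitude gain {gain:.2f}x")
```

      With these assumed numbers, amplitudes near 2.15 Å are boosted roughly 15-fold while those at 10 Å change by only about 13%, which is why an unsharpened map can look much blurrier than its nominal resolution suggests.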

      Achievement of aims and support for conclusions:

      The study falls short of convincingly achieving its goals. The claimed high-resolution structural details (e.g., side-chain densities, ligand binding) are not supported by the provided maps, which lack sharpening and show inconsistencies in density quality. Similarly, the biochemical data do not robustly validate the structural claims due to missing controls. As a result, the evidence is insufficient to confirm glutamine binding at the decamer interface or the functional relevance of the observed structural features.

      Likely impact and utility:

      If these methodological gaps are addressed, the work could make a meaningful contribution to the field. A well-resolved GS decamer structure would advance understanding of enzyme assembly and ligand recognition, while validated biochemical assays would strengthen the link between structure and function. Improved data processing and clearer reporting of validation steps would also make the structural data more reliable for the community, providing a resource for future studies on GS or related enzymes.

      Additional context:

      Cryo-EM has transformed structural biology by enabling high-resolution analysis of large complexes, but its success hinges on rigorous data processing and validation steps that are critical to ensuring reproducibility. The challenges highlighted here are not unique to this study; they reflect broader issues in the field where incomplete reporting of methods can obscure the reliability of results. By addressing these points, the authors would not only strengthen their current work but also set a positive example for transparent and rigorous structural biology research.

    4. Reviewer #3 (Public review):

      In this manuscript, the authors propose a product-dependent negative-feedback mechanism of human glutamine synthetase, whereby the product glutamine facilitates filament formation, leading to reduced catalytic specificity for ammonia. Using time-resolved cryo-EM, the authors demonstrate filament formation under product-rich conditions. Multiple high-quality structures, including decameric and di-decameric assemblies, were resolved under different biochemical states and combined with MD simulations, revealing that the conformational space of the active site loop is critical for the GS catalysis. The study also includes extensive steady-state kinetic assays, supporting the view that glutamine regulates GS assembly and its catalytic activity. Overall, this is a detailed and comprehensive study. However, I would advise that a few points be addressed and clarified.

      (1) In Figure 2D and Supplementary Figure 7, the extra density observed between the two decamers does not appear to have the defining features of a glutamine. A less defined density may be expected given the nature of the complex, but even though mutagenesis assays were performed to support this assignment, none of these results constitutes direct and conclusive evidence for glutamine binding at this site. I would thus suggest showing the density maps at multiple contour thresholds to allow readers to also better evaluate the various small molecules under turnover conditions that cannot be well fitted based on this density map, helping to provide a more balanced interpretation of the results.

      (2) On the same point regarding the density for the enzyme under turnover conditions, more details should be provided about the symmetry expansion and classification performed, and also show the approximate ratio of reconstructions that include this density. Did you try symmetry expansion followed by focused classification, especially on the interface region?

      (3) The interface between the two decamers of the model needs to be double-checked and reassigned, especially for the residues surrounding the fitted glutamine. For example, the side chain of the Lys residue shown in the attached figure is most likely modeled incorrectly.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study is methodologically solid and introduces a compelling regulatory model. However, several mechanistic aspects and interpretations require clarification or additional experimental support to strengthen the conclusions.

      Strengths:

      (1) The manuscript presents a compelling structural and biochemical analysis of human glutamine synthetase, offering novel insights into product-induced filamentation.

      (2) The combination of cryo-EM, mutational analysis, and molecular dynamics provides a multifaceted view of filament assembly and enzyme regulation.

      (3) The contrast between human and E. coli GS filamentation mechanisms highlights a potentially unique mode of metabolic feedback in higher organisms.

      Weaknesses:

      (1) The mechanism underlying spontaneous di-decamer formation in the absence of glutamine is insufficiently explored and lacks quantitative biophysical validation.

      (2) Claims of decamer-only behavior in mutants rely solely on negative-stain EM and are not supported by orthogonal solution-based methods.

      We thank the reviewer for the summary and for noting the strengths. We agree that the evolutionary divergence of metabolic feedback in GS homologs is a fruitful avenue for future studies. With regard to the weaknesses, the di-decamer in the absence of glutamine forms only at high (higher than physiological) enzyme concentrations. Our primary evidence for the mutant behavior was the lack of crosslinking (Figure 1E), with supplementary support from the negative stain. In the revised version we will soften the language to say “reduced” rather than “did not support” filament formation.

      Reviewer #2 (Public review):

      The authors set out to resolve the high-resolution structure of a glutamine synthetase (GS) decamer using cryo-EM, investigate glutamine binding at the decamer interface, and validate structural observations through biochemical assays of ATP hydrolysis linked to enzyme activity. Their work sits at the intersection of structural and functional biology, aiming to bridge atomic-level details with biological mechanisms - a goal with clear relevance to researchers studying enzyme catalysis and metabolic regulation.

      Strengths and weaknesses of methods and results:

      A key strength of the study lies in its use of cryo-EM, a technique well-suited for resolving large, dynamic macromolecular complexes like the GS decamer. The reported resolutions (down to 2.15 Å) initially suggest the potential for detailed structural insights, such as side-chain interactions and ligand density. However, several methodological limitations significantly undermine the reliability of the results:

      (1) Cryo-EM data processing: The absence of critical details about B-factor sharpening - a standard step to enhance map interpretability - is a major concern. For high-resolution maps (<3 Å), sharpening is typically applied to resolve side-chain features, yet the submitted maps (e.g., those in Figures 1D, 2D, and supplementary figures) appear unprocessed, with density quality inconsistent with the claimed resolutions. This makes it difficult to evaluate whether observed features (e.g., glutamine binding) are genuine or artifacts of unsharpened data.

      (2) Modeling and density consistency: The structural models, particularly for glutamine binding at the decamer interface, do not align with the reported resolution. The maps shown in Figure 2D and Supplementary Figure S7 lack sufficient density to confidently place glutamine or even surrounding residues, conflicting with claims of 2.15 Å resolution. Additionally, fitting a non-symmetric ligand (glutamine) into a symmetry-refined map requires justification, as symmetry constraints may distort ligand placement.

      (3) Biochemical assay controls: While the enzyme activity assays aim to link structure to function, they lack essential controls (e.g., blank reactions without GS or substrates, substrate omission tests) to confirm that ATP hydrolysis is GS-dependent. The use of TCEP, a reducing agent, is also not paired with experiments to rule out unintended effects on the PK/LDH system, further limiting confidence in activity measurements.

      Achievement of aims and support for conclusions:

      The study falls short of convincingly achieving its goals. The claimed high-resolution structural details (e.g., side-chain densities, ligand binding) are not supported by the provided maps, which lack sharpening and show inconsistencies in density quality. Similarly, the biochemical data do not robustly validate the structural claims due to missing controls. As a result, the evidence is insufficient to confirm glutamine binding at the decamer interface or the functional relevance of the observed structural features.

      Likely impact and utility:

      If these methodological gaps are addressed, the work could make a meaningful contribution to the field. A well-resolved GS decamer structure would advance understanding of enzyme assembly and ligand recognition, while validated biochemical assays would strengthen the link between structure and function. Improved data processing and clearer reporting of validation steps would also make the structural data more reliable for the community, providing a resource for future studies on GS or related enzymes.

      We disagree with the reviewer’s overall assessment.

      With regard to sharpening and resolution: we examined sharpened maps and, in a revised version, will present additional supplementary figures showing these maps side by side. We note that the resolutions reported are global and that the most interesting features are, of course, in the periphery and subject to conformational and compositional heterogeneity. In the revision, we will include supplementary figures of core side-chain densities, which are closer to what the reviewer expects.

      With regard to modeling: the apo filament and turnover filament datasets were handled nearly identically. The additional density is therefore unlikely to be an artefact of the symmetry operator; however, the lower resolution in this region noted by the reviewer is worthy of further exploration. The maps are public, and we think this is the most plausible interpretation of the density, which we based primarily on the biochemical data. We will include more speculation on this point in the revised version.

      With regard to the biochemical controls: we point the reviewer to Figure S1, which shows that omission of ammonia or glutamate in the wild-type (tagless) system removes any coupling of the reactions. We will perform the additional controls to publication quality in the revised version, along with the TCEP control. We note that the reducing agent is present across all experiments, ruling out an effect on any specific result. The inclusion of TCEP is also very standard in other published uses of the coupled ATPase assay (e.g. PMID: 31778111 and PMID: 32483380 by our first author).

      Additional context:

      Cryo-EM has transformed structural biology by enabling high-resolution analysis of large complexes, but its success hinges on rigorous data processing and validation steps that are critical to ensuring reproducibility. The challenges highlighted here are not unique to this study; they reflect broader issues in the field where incomplete reporting of methods can obscure the reliability of results. By addressing these points, the authors would not only strengthen their current work but also set a positive example for transparent and rigorous structural biology research.

      All the data is public and the reviewer or anyone is free to reinterpret the maps and models - and we encourage that rather than just an interpretation of our static figures. In addition, we will upload the raw micrograph data for the apo filament and turnover filament datasets to EMPIAR prior to submitting the revision.

      Reviewer #3 (Public review):

      In this manuscript, the authors propose a product-dependent negative-feedback mechanism of human glutamine synthetase, whereby the product glutamine facilitates filament formation, leading to reduced catalytic specificity for ammonia. Using time-resolved cryo-EM, the authors demonstrate filament formation under product-rich conditions. Multiple high-quality structures, including decameric and di-decameric assemblies, were resolved under different biochemical states and combined with MD simulations, revealing that the conformational space of the active site loop is critical for the GS catalysis. The study also includes extensive steady-state kinetic assays, supporting the view that glutamine regulates GS assembly and its catalytic activity. Overall, this is a detailed and comprehensive study. However, I would advise that a few points be addressed and clarified.

      (1) In Figure 2D and Supplementary Figure 7, the extra density observed between the two decamers does not appear to have the defining features of a glutamine. A less defined density may be expected given the nature of the complex, but even though mutagenesis assays were performed to support this assignment, none of these results constitutes direct and conclusive evidence for glutamine binding at this site. I would thus suggest showing the density maps at multiple contour thresholds to allow readers to also better evaluate the various small molecules under turnover conditions that cannot be well fitted based on this density map, helping to provide a more balanced interpretation of the results.

      (2) On the same point regarding the density for the enzyme under turnover conditions, more details should be provided about the symmetry expansion and classification performed, and also show the approximate ratio of reconstructions that include this density. Did you try symmetry expansion followed by focused classification, especially on the interface region?

      (3) The interface between the two decamers of the model needs to be double-checked and reassigned, especially for the residues surrounding the fitted glutamine. For example, the side chain of the Lys residue shown in the attached figure is most likely modeled incorrectly.

      We thank the reviewer for the feedback. As noted above, we will include supplemental figures that show maps at multiple thresholds and sharpening schemes. We noted in the manuscript and above that our interpretation here is based on integrating biochemical evidence alongside the density and will make that even more clear in the revised manuscript. The filaments +/- the putative glutamine density were processed nearly identically, but we will attempt various schemes of focused classification/symmetry expansion in the revision as well. However, we point out that there is extensive averaging there that makes modeling a bit trickier than expected given the global resolution.

    1. eLife Assessment

      This valuable work explores the timely idea that aperiodic activity in human electrophysiology recordings is dynamically modulated in response to task events in a manner that may be relevant for behavioral performance. Moreover, the authors present solid evidence that, in some circumstances, these aperiodic changes might be misinterpreted as oscillatory changes.

    2. Reviewer #1 (Public review):

      Summary:

      Frelih et al. investigated both periodic and aperiodic activity in EEG during working memory tasks. In terms of periodic activity, they found post-stimulus decreases in alpha and beta activity, while in terms of aperiodic activity, they found a bi-phasic post-stimulus steepening of the power spectrum, which was weakly predictive of performance. They conclude that it is crucial to properly distinguish between aperiodic and periodic activity in event-related designs as the former could confound the latter. They also add to the growing body of research highlighting the functional relevance of aperiodic activity in the brain.

      Strengths:

      This is a well-written, timely paper that could be of interest to the field of cognitive neuroscience, especially to researchers investigating the functional role of aperiodic activity. The authors describe a well-designed study that looked at both the oscillatory and non-oscillatory aspects of brain activity during a working memory task. The analytic approach is appropriate, as a state-of-the-art toolbox is used to separate these two types of activity. The results support the basic claim of the paper that it is crucial to properly distinguish between aperiodic and periodic activity in event-related designs as the former could confound the latter. They also add to the growing body of research highlighting the functional relevance of aperiodic activity in the brain. Commendably, the authors include replications of their key findings on multiple independent data sets.

      Comments on the previous version:

      The authors have addressed several of the weaknesses I noted in my original review, specifically, they softened their claims regarding the theta findings, while simultaneously strengthening these findings with additional analyses (using simulations as well as a new measure of rhythmicity, the phase autocorrelation function, pACF). Most of the other suggested control analyses were also implemented. While I believe the fact that the participants in the main sample were not young adults could be made even more explicit, and the potential interaction between age and aperiodic changes could be unpacked a little in the discussion, the age of the sample is definitely addressed upfront.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Frelih et al. investigate the relationship between aperiodic neural activity, as measured by EEG, and working memory performance, and compare this to the more commonly analyzed periodic, and in particular theta, measures that are often associated with such tasks. To do so, they analyze a primary dataset of 57 participants engaging in an n-back task, as well as a replication dataset, and use spectral parameterization to measure periodic and aperiodic features of the data across time. In the revision, the authors have clarified some key points and added a series of additional analyses and controls, including the use of an additional method, that help to complement the original analyses and further corroborate their claims. In doing so, they find both periodic and aperiodic features that relate to the task dynamics, but importantly, the aperiodic component appears to explain away what otherwise looks like theta activity in a more traditional analysis. This study therefore helps to establish that aperiodic activity is a task-relevant dynamic feature in working memory tasks and may be the underlying change in many other studies that reported 'theta' changes, but did not use methods that could differentiate periodic and aperiodic features.

      Strengths:

      Key strengths of this paper include that it addresses an important question - that of properly adjudicating which features of EEG recordings relate to working memory tasks - and in doing so provides a compelling answer, with important implications for considering prior work and contributing to understanding the neural underpinnings of working memory. The revision is improved by showing this using an additional analysis method. I do not find any significant faults or errors with the design, analysis, and main interpretations as presented by this paper, and as such, find the approach taken to be valid and well-enacted. The use of multiple variants of the working memory task, as well as a replication dataset, significantly strengthens this manuscript by demonstrating a degree of replicability and generalizability. This manuscript is also an important contribution to motivating best practices for analyzing neuro-electrophysiological data, including in relation to using baselining procedures. I think the updates in the revision have helped to clarify the findings and impact of this study.

      Weaknesses:

      Overall, I do not find any obvious weaknesses with this manuscript and its analyses that challenge the key results and conclusions. Updates through the revision have addressed my previous points about adding some additional notes on the methods and conclusions.

    4. Reviewer #3 (Public review):

      Summary:

      Using a specparam (1/f) analysis of task-evoked activity, the authors propose that "substantial changes traditionally attributed to theta oscillations in working memory tasks are, in fact, due to shifts in the spectral slope of aperiodic activity." This is a very bold and ambitious statement, and the field of event-related EEG would benefit from more critical assessments of the role of aperiodic changes during task events. Unfortunately, the data shown here does not support the main conclusion advanced by the authors.

      Strengths:

      The field of event-related EEG would benefit from more critical assessments of the role of aperiodic changes during task events. The authors perform a number of additional control analyses, including different types of baseline correction, ERP subtraction, as well as replication of the experiment with two additional datasets.

      Comments on previous revisions:

      The authors have completed a substantial revision based on the comments from all of the reviewers. Overall, the major claims of the initial report have been profoundly tempered.

      [Editors' note: We determined that this revised version appropriately tempers some of the prior claims and addresses the concerns raised by the reviewers through two rounds of review.]

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer 1:

      We thank Reviewer 1 for the discussion on the possible causes of ERPs and their relevance for the interpretation of changes in aperiodic activity. We have changed the relevant paragraph to read as follows: For example, ERPs may reflect changes in periodic activity, such as phase resets (Makeig et al., 2002), or baseline shifts (Nikulin et al., 2007). ERPs may also capture aperiodic activity, either in the form of evoked transients triggered by an event (Shah et al., 2004) or induced changes in the ongoing background signal. This has important implications: evoked transients can alter the broadband spectrum without implying shifts in ongoing background activity, whereas induced aperiodic changes may signal different neural mechanisms, such as shifts in the excitation-inhibition balance (Gao et al., 2017).

      Reviewer 1 argued that a time point-by-time point comparison between ERPs and aperiodic parameters may not be the most appropriate approach, since aperiodic time series have lower temporal resolution than ERPs. The reviewer suggested comparing their topographies instead. We had already done this in the first version of the paper (see Fig. S7: https://elifesciences.org/reviewedpreprints/101071v1#s10). However, in the second version, we opted to use linear mixed models for each channel-time point in order to maintain consistency with the other analyses in the paper (e.g. the comparison between FOOOF parameters and baseline-corrected power).

      Nevertheless, we repeated the topographic correlations as in the first version, and the results are shown below. Correlations were computed for each time point, subject and condition, and then averaged across these dimensions for visualisation. The pattern differs from that of the linear mixed-model results (see Fig. S14), with notable correlations appearing after ~0.5 s for the exponent and after ~1.0 s for the offset. Still, the correlations remain low, suggesting that aperiodic parameters and ERPs encode different information (at least in this dataset).

      Author response image 1.
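      The topographic correlation described above can be illustrated with a minimal sketch. It assumes a single subject and condition, with hypothetical toy values laid out as `[time][channel]`; the function names and data are illustrative and are not taken from the authors' pipeline.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def topographic_correlations(erp, aperiodic):
    """Correlate two topographies across channels, one r per time point.

    Both inputs are nested lists indexed as [time][channel].
    """
    return [pearson(e, a) for e, a in zip(erp, aperiodic)]


# Hypothetical toy data: 2 time points x 4 channels.
erp = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
exponent = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]

r_per_time = topographic_correlations(erp, exponent)
# Averaging here stands in for averaging across subjects and conditions.
mean_r = sum(r_per_time) / len(r_per_time)
```

      In a full analysis, the same per-time-point correlation would be repeated for every subject and condition before averaging.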

      Additionally, to control for the effect of smearing, we performed the same linear mixed model analysis as in Fig. S14 on low-pass filtered ERPs (with a 10 Hz cut-off), and the results were largely similar to those in Fig. S14.

      Reviewer 1 discussed two possible explanations for the observed correlations between baseline-corrected power and FOOOF parameters (Figure 4): “The correlation between the exponent and low-frequency activity could be of either direction: low frequency power changes could reflect 1/f shifts, or exponent estimates might be biased by undetected delta/theta activity. I think that one other piece of evidence /…/ to intuitively highlight why the latter is more likely is the /…/ decrease at high ("trans-beta") frequencies, which suggests a rotational shift /../.” We agree with the interpretation that low-frequency power changes in our data primarily reflect 1/f shifts. However, we are uncertain about the reviewer’s statement that the “latter” explanation (i.e., bias in exponent estimates due to delta/theta activity) is more likely. Given the context, we believe the reviewer may have intended to say the “former” explanation is more likely.
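      The rotational-shift argument can be made concrete with a small worked sketch. It assumes a pure power-law spectrum, log10 P(f) = offset - exponent * log10 f, and a hypothetical rotation frequency of 15 Hz; all numerical values are illustrative and are not estimates from the study.

```python
import math


def log_power(freq, offset, exponent):
    """log10 power of a pure 1/f spectrum at a given frequency."""
    return offset - exponent * math.log10(freq)


def rotate(offset, exponent, new_exponent, f_rot):
    """Change the exponent while keeping power at f_rot unchanged."""
    new_offset = offset + (new_exponent - exponent) * math.log10(f_rot)
    return new_offset, new_exponent


baseline = (1.0, 1.5)  # hypothetical (offset, exponent) before the stimulus
task = rotate(*baseline, new_exponent=2.0, f_rot=15.0)  # steeper spectrum


def change_db(freq):
    """Baseline-corrected power in dB at one frequency."""
    return 10.0 * (log_power(freq, *task) - log_power(freq, *baseline))


theta_change = change_db(4.0)   # below the rotation point
pivot_change = change_db(15.0)  # at the rotation point
gamma_change = change_db(30.0)  # above the rotation point
```

      Under these assumptions, a steepening with no oscillatory component at all produces a power increase at 4 Hz and a decrease at 30 Hz, which is the pattern a baseline-corrected time-frequency plot would show as a low-frequency increase.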

      We agree with the reviewers' observation that rhythmicity, as estimated using the pACF, can be independent of power (Myrov et al., 2024, Fig. 1). However, it seems that in real (non-simulated) datasets, the pACF and power spectral density (PSD) are often moderately correlated (e.g. Myrov et al., 2024, Fig. 5).

      Reviewer 1 asked whether we had examined aperiodic changes in the data before and after subtracting the response-locked ERPs. We did not carry out this extra analysis because, as the reviewer suggests, it would have been excessive: the current version of the paper already contains more than 60 figures. As mentioned in the manuscript, we acknowledge the possibility that response-locked ERPs contribute to the second aperiodic component. However, due to the weak correlation between reaction times and aperiodic activity, the presence of both components throughout the entire epoch (in at least the first and third datasets) and the distinct differences between the ERPs and the aperiodic activity in the different conditions (see Fig. 8 vs. Fig. S13), we cannot conclusively determine whether the second aperiodic component is directly related to motor responses. Finally, we agree with the reviewer that the distribution of the response-locked ERP more closely resembles the frontocentral (earlier) aperiodic component than the later post-response component. We have amended the relevant paragraph in the Discussion to include these observations. “While it is possible that response-related ERPs contributed to the second aperiodic component, several observations suggest otherwise: both aperiodic components were present throughout the entire epoch, differences between conditions diverged between ERPs and aperiodic activity (compare Figure 8 and Figure S16), and the associations with reaction times were weak. Moreover, the distribution of the response-locked ERP qualitatively resembled the earlier frontocentral aperiodic component more than the later post-response component. Taken together, these findings suggest that ERPs and aperiodic activity capture distinct aspects of neural processing, rather than reflecting the same underlying phenomenon.”

      We agree with Reviewer 1 that our introduction of aperiodic activity was abrupt, and that the term 'aperiodic exponent' required definition. We have now defined it as the spectral steepness in log–log space (i.e. the slope), and have added a brief explanatory sentence to the introduction.
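      As an illustration of this definition, the sketch below estimates the exponent as the negative slope of a straight line fitted to the spectrum in log-log space. It is a simplification under stated assumptions: it uses ordinary least squares on a synthetic, peak-free spectrum, whereas spectral parameterisation toolboxes fit the aperiodic component jointly with peaks. The function name and values are hypothetical.

```python
import math


def fit_aperiodic(freqs, powers):
    """Least-squares line through (log10 f, log10 P).

    Returns (offset, exponent) with log10 P ~ offset - exponent * log10 f.
    """
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The exponent is the negated slope, so steeper spectra give larger values.
    return intercept, -slope


# Synthetic spectrum with offset 1.0 and exponent 2.0 over 2-40 Hz.
freqs = [float(f) for f in range(2, 41)]
powers = [10 ** 1.0 / f ** 2.0 for f in freqs]
offset, exponent = fit_aperiodic(freqs, powers)
```

      A larger exponent corresponds to a steeper spectrum, which is the sense in which the reviews speak of a post-stimulus "steepening".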

      Reviewer 1 noted that the phrase 'task-related changes in overall power' could be misinterpreted as referring to total (broadband) power, and recommended that we specify a frequency range. We agree, so we have replaced 'overall power' with 'spectral power within a defined frequency range'.

      We agree with Reviewer 1 that the way we worded things in the Discussion section regarding alpha activity and inhibitory processes was awkward and could easily be misread. We have rephrased the sentences and added a brief explanation to avoid implying a direct link between alpha attenuation and neural inhibition.

      Furthermore, based on the reviewer’s suggestion, we added a brief comment in the Discussion section (Theoretical and methodological implications) on theoretical perspectives regarding the interaction between age and aperiodic activity.

      Reviewer 1 suggested including condition as a fixed effect in order to examine whether the relationship between FOOOF parameters and baseline-corrected power is modulated by condition. Specifically, the reviewer proposed changing our model from

      baseline_corrected_power ~ 1 + fooof_parameter + (1|modality) + (1|nback) + (1|stimulus) + (1|subject)

      to

      baseline_corrected_power ~ 1 + fooof_parameter + modality*nback *stimulus + (1|subject)

      While we appreciate this suggestion, we believe that including design variables as fixed effects would confound the interpretation of (marginal) R² as a measure of the association between FOOOF parameters and baseline-corrected power. Our primary question in this analysis was about the fundamental relationship between these measures, not how experimental conditions moderate this relationship.

      To address the reviewer's concern regarding condition-specific effects, we conducted separate analyses for each condition using a simpler model:

      baseline_corrected_power ~ 1 + fooof_parameter + (1|subject)

      The results (now included in the Supplement, Fig. S4–S6) show generally smaller effect sizes compared to our original random-effects model, with notable differences between conditions. The 2-back conditions, particularly the non-target trials, exhibited the weakest associations. Despite these differences, the overall patterns remained consistent with our original findings: exponent and offset exhibited positive associations at low frequencies (delta, theta) and negative associations at higher frequencies (beta, low gamma), while periodic activity correlated substantially with baseline-corrected power in the alpha, beta, and gamma ranges.

      However, this condition-specific approach has important limitations. With only 47 subjects per condition, the statistical power is insufficient for stable correlation estimates (Schönbrodt & Perugini, 2013; https://doi.org/10.1016/j.jrp.2013.05.009). This likely explains why the effects are smaller and less stable than in our original model, which uses the full dataset's power while appropriately accounting for condition-related variance through random effects. Since these additional analyses do not alter our primary conclusions, we have included them in the Supplement for completeness and made a minor change in the Discussion section.

      Reviewer 1 asked which channels the lines in Figure 9 are based on. As stated in the Methods section, “We fitted models in a mass univariate manner, that is for each channel, frequency (where applicable), and time point separately. /…/ For the purposes of visualisation, p-values were averaged across channels (for heatmaps or lines) or across time (for topographies).” Therefore, the lines and heatmaps apply to all channels.

      Reviewer 2:

      We would like to thank reviewer 2 for their detailed explanation of the expected behaviour of the specparam algorithm. We have added the following explanation to the Methods section:

      Importantly, as noted by the reviewer, this behaviour reflects an explicit design choice of the algorithm: to avoid overfitting ambiguous peaks at the edges of the spectrum, FOOOF excludes peaks that are too close to the boundaries. This exclusion is controlled by the _bw_std_edge parameter, which defines the distance that a peak must be from the edge in order to be retained (in units of standard deviation; set to 1.0 by default). Therefore, although the algorithm is functioning as intended, users should be careful when interpreting aperiodic parameters in datasets where low-frequency oscillatory activity might be expected.
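      The edge-exclusion rule described above can be sketched as follows. This is a simplified illustration of the rule as stated in the text, not the library's own code: the function name `keep_peak`, the fitting range of 2-40 Hz, and the peak values are assumptions made for the example.

```python
def keep_peak(center_freq, gaussian_std, freq_range, bw_std_edge=1.0):
    """Return True if a candidate peak is far enough from both spectrum edges.

    A peak is retained only when its centre frequency lies more than
    bw_std_edge standard deviations away from each edge of the fitting range.
    """
    low, high = freq_range
    margin = gaussian_std * bw_std_edge
    return abs(center_freq - low) > margin and abs(center_freq - high) > margin


freq_range = (2.0, 40.0)  # hypothetical fitting range in Hz

# A 3 Hz peak with a 1.5 Hz standard deviation sits 1 Hz from the lower edge.
low_freq_peak_kept = keep_peak(3.0, 1.5, freq_range)

# A 10 Hz peak of the same width is well inside the range.
alpha_peak_kept = keep_peak(10.0, 1.5, freq_range)
```

      Under these assumed values, the low-frequency peak is discarded while the alpha peak is retained, so any power from the discarded peak is left to be absorbed by the aperiodic fit.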

      In line with the reviewer’s suggestion we have added a version of specparam to the paper.

      We thank reviewer 2 for pointing out two studies that used a time-resolved approach to spectral parameterisation. We have updated the text accordingly:

      Although a similar approach has been used to track temporal dynamics in sleep and resting state (e.g., Wilson et al., 2022; Ameen et al., 2024), as well as in task-based contexts (e.g., Barrie et al., 1996; Preston et al., 2025), its specific application to working memory paradigms remains underexplored.

      Reviewer 3:

      Reviewer 3 notes that the revised manuscript feels less intriguing than the original version. While we understand this concern, we believe this difference arises from a misalignment in expectations regarding the scope and purpose of our study. We think the reviewer is interpreting our work as focusing on whether theta activity is elicited in a paradigm that reliably produces theta oscillations. In contrast, our study is framed around a working memory task in which, based on prior literature, we expected to observe theta activity but instead found an absence of theta spectral peaks in almost all participants. Note that the absence of theta is already noteworthy in itself, given that theta oscillations are believed to play a crucial role in working memory.

      Importantly, Van Engen et al. (2024) have recently reported similar findings:

      “While we did not observe load-dependent aperiodic changes over the frontal midline, we did reveal the possibility that previous frontal midline theta results that do not correct for aperiodic activity likely do not reflect theta oscillations. /…/ While our results do not invalidate previous research into extracranial theta oscillations in relation to WM, they challenge popular and widely held beliefs regarding the mechanistic role for theta oscillations to group or segregate channels of information”.

      From this perspective, we maintain that the following statements are still justified:

      “substantial portion of the changes often attributed to theta oscillations in working memory tasks may be influenced by shifts in the spectral slope of aperiodic activity”

      "Note that although no prominent oscillatory peak in the theta range was observed at the group level, and some of this activity could potentially fall within the delta range, similar low-frequency patterns have often been referred to as 'theta' in previous work, even in the absence of a clear spectral peak"

      These formulations are intended to emphasize existing interpretations of changes in low-frequency power as theta oscillations in related research.

      Next, Reviewer 3 pointed out that “spectral reflection (peak?) in spectral power plot does not imply that an event is repeating (i.e. oscillatory).” We agree with the reviewer that not every spectral peak implies a true oscillation. To address this, we complemented the power analyses with a measure of rhythmicity (phase autocorrelation function, pACF) after the first round of reviews, and the pACF results were largely similar to those for periodic activity. These results suggest that, in our case, periodic activity is indeed largely oscillatory.

      However, we do agree with the reviewer that the term “oscillatory” is not interchangeable with “periodic”. To address this, we reviewed the paper for all appearances of “oscillations”, “oscillatory” and related terms, and replaced them with “power”, “spectral” or “periodic activity” where appropriate (all changes are marked in red in the latest version of the manuscript).

      Examples of corrections:

      “Changes in aperiodic activity appear as low-frequency oscillations in baseline-corrected time-frequency plots” → low-frequency power

      “The periodic component includes only the parameterised oscillatory peak” → spectral peak

      “FOOOF decomposition may miss low-frequency oscillations near the edges of the spectrum” → low-frequency peaks

      We disagree with the reviewer’s assertion that the subtitle “Aperiodic parameters are largely independent of oscillatory activity” is misleading for a methods oriented paper. Namely, the full subtitle is “Rhythmicity analysis reveals aperiodic parameters are largely independent of oscillatory activity”. Since rhythmicity is a phase-based measure that requires repeating dynamics and is therefore indicative of oscillations, we believe this phrasing is technically accurate.

      Finally, we would like to emphasise our contribution once again. Our analyses of rhythmicity, spectrally parameterised power, and baseline-corrected power offer different perspectives on the data. Each of these analyses may lead to different interpretations, but performing all of them on the same data provides a more comprehensive insight into what is actually going on in the data.

      Our findings demonstrate that conclusions drawn from a single analytical approach may be incomplete or misleading. For example, as we discuss in the paper, many studies examine theta-gamma coupling in scalp EEG during n-back tasks without first establishing whether theta activity genuinely oscillates (e.g. Rajji et al., 2016). The absence of true theta oscillations would undermine the validity of such analyses. Our multifaceted approach provides researchers with a systematic framework for validating oscillatory assumptions before proceeding with more complex analyses.

    1. eLife Assessment

      To evaluate phenotypic correlations between complex traits, this study aimed to measure the genetic overlap of traits by evaluating GWAS signals assisted by eQTL signals. They suggested an improved version of the previous Sherlock to integrate SNP-level signals into gene-level signals. Then they compared 59 human traits to identify known and novel genetic distance relationships. This work is valuable to the field, but still needs substantial improvement because many parts of the paper are incomplete.

    2. Reviewer #1 (Public review):

      The authors tried to quantify the difference between human complex traits by calculating genetic overlap scores between a pair of traits. Sherlock-II was devised to integrate GWAS with eQTL signals. The authors claim that Sherlock-II is superior to the previous version (robustness, accuracy, etc). It appears that their framework provides a reasonable solution to this important question, although the study needs further clarification and improvements.

      (1) Sherlock-II incorporates GWAS and eQTL signals to better quantify genetic signals for a given complex trait. However, this approach is based on the hypothesis that "all GWAS signals confer association to complex trait via eQTL", which is not true (PMID: 37857933). This should be acknowledged (through mentioning in the text) and incorporated into the current setup (through differential analysis - for example, with or without eQTL signals, or with strong colocalization only).

      (2) When incorporating eQTL, why did the authors use the top p-value tissues for eQTL? This approach seems simpler and probably more robust. But many eQTLs are tissue-specific. Therefore, it would also be important to know if eQTLs from appropriate tissues were incorporated instead.

      (3) One of the main examples is the novel association between Alzheimer's disease and breast cancer. Although the authors provided a molecular clue underlying the association, it is still hard to comprehend the association easily, as the two diseases are generally known to be exclusive to each other. This is probably because breast cancer GWAS is performed for germline variants and does not consider the contribution of somatic variants.

      (4) It would help readers understand the story better if a summary figure of the entire process were provided. The current Figure 1 does not fulfil that role.

      (5) Figure 2 is not very informative. The readers would want to know more quantitative information rather than a heatmap-style display. Is there directionality to the relationship, or is it always unidirectional?

      (6) In Figure 3, readers may want to know more specific information. For example, what gene signals are really driving the hypoxia signal in Alzheimer's disease vs breast cancer? And what SNP signals are driving these gene-level signals?

    3. Reviewer #2 (Public review):

      Summary:

      The authors introduce a gene-level framework to detect shared genetic architecture between complex traits by integrating GWAS summary statistics with eQTL data via a new algorithm, Sherlock-II, which aggregates signals from multiple (cis/trans) eSNPs to produce gene-phenotype p-values. Shared pathways are identified with Partial-Pearson-Correlation Analysis (PPCA).

      Strengths:

      The authors show the gene-based approach is complementary and often more sensitive than SNP-level methods, and discuss limitations (in terms of no directionality, dependence on eQTL coverage).

      Weaknesses:

      (1) How do the authors explain data where missing tissues or sparse eQTL mapping are available? Would that bias as to which genes/traits can be linked and may produce false negatives or tissue-specific false positives?

      (2) Aggregating SNP-level signals into gene scores can be confounded by LD; for example, a nearby causal variant for a different gene or non-expression mechanism may drive a gene's score, producing spurious gene-trait links. How do the authors prevent this?

      (3) How the SNPs are assigned to genes would affect results, this is because different choices can change which genes appear shared between traits. The authors can expand on these.

      (4) Many reported novel trait links remain speculative without functional or orthogonal validation (e.g., colocalization, perturbation data). Thus, the manuscript's claims are inconclusive and speculative.

      (5) It would be best to run LD-aware colocalization and power-matched simulations to check for robustness.

    4. Author response:

      Reviewer #1 (Public review):

      The authors tried to quantify the difference between human complex traits by calculating genetic overlap scores between a pair of traits. Sherlock-II was devised to integrate GWAS with eQTL signals. The authors claim that Sherlock-II is superior to the previous version (robustness, accuracy, etc). It appears that their framework provides a reasonable solution to this important question, although the study needs further clarification and improvements.

      (1) Sherlock-II incorporates GWAS and eQTL signals to better quantify genetic signals for a given complex trait. However, this approach is based on the hypothesis that "all GWAS signals confer association to complex trait via eQTL", which is not true (PMID: 37857933). This should be acknowledged (through mentioning in the text) and incorporated into the current setup (through differential analysis - for example, with or without eQTL signals, or with strong colocalization only). 

      The reviewer is correct that, in this version of the tool, we focused on SNPs that affect gene expression, as the majority of SNPs identified by GWASs are non-coding. In a future improvement, we should also include coding SNPs that change the amino acid sequence of the encoded proteins. We will discuss this point further in the revised manuscript.

      (2) When incorporating eQTL, why did the authors use the top p-value tissues for eQTL? This approach seems simpler and probably more robust. But many eQTLs are tissue-specific. Therefore, it would also be important to know if eQTLs from appropriate tissues were incorporated instead.

      This is a simple scheme for incorporating eQTL data from multiple tissues. It assumes that the tissue giving the strongest association is the most relevant one, or the one that mainly mediates the effect of the SNP on the phenotype. This is a reasonable approach given that the tissues of origin for most of the phenotypes are unknown. In a future improvement, we should incorporate eQTL data from the appropriate tissue(s) where these are known.
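As a rough illustration of the scheme described in this response, the sketch below keeps, for each SNP-gene pair, the tissue with the smallest eQTL p-value. It is not the Sherlock-II implementation: the record layout, the function name, and the identifiers and p-values are hypothetical, and whether the selection is made per SNP-gene pair or per SNP is an assumption here.

```python
def strongest_tissue_per_pair(eqtl_records):
    """Pick the tissue with the smallest eQTL p-value for every SNP-gene pair.

    eqtl_records: iterable of (snp, gene, tissue, p_value) tuples (assumed layout).
    Returns a dict mapping (snp, gene) -> (tissue, p_value).
    """
    best = {}
    for snp, gene, tissue, p_value in eqtl_records:
        key = (snp, gene)
        # A smaller p-value is treated as the stronger association.
        if key not in best or p_value < best[key][1]:
            best[key] = (tissue, p_value)
    return best


# Hypothetical records; the identifiers and p-values are invented.
records = [
    ("rs_a", "GENE1", "liver", 1e-4),
    ("rs_a", "GENE1", "brain", 1e-9),
    ("rs_b", "GENE2", "blood", 1e-6),
]
for pair, (tissue, p_value) in strongest_tissue_per_pair(records).items():
    print(pair, tissue, p_value)
```

A tissue-aware variant, as the reviewer suggests, would filter the records to the tissues relevant for a trait before this selection step.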

      (3) One of the main examples is the novel association between Alzheimer's disease and breast cancer. Although the authors provided a molecular clue underlying the association, it is still hard to comprehend the association easily, as the two diseases are generally known to be exclusive to each other. This is probably because breast cancer GWAS is performed for germline variants and does not consider the contribution of somatic variants. 

      This is due to one of the limitations of the current algorithm: no direction of association is predicted explicitly. It could be that increasing the expression of a gene reduces the risk of one disease but increases the risk of another. Currently, we have to analyze the details of the SNPs to infer direction once overlapping genes are found. This needs improvement in the future.

      (4) It would help readers understand the story better if a summary figure of the entire process were provided. The current Figure 1 does not fulfil that role. 

      We plan to incorporate the reviewer's suggestion in the revised manuscript.

      (5) Figure 2 is not very informative. The readers would want to know more quantitative information rather than a heatmap-style display. Is there directionality to the relationship, or is it always unidirectional? 

      We will consider a different presentation in the revised manuscript.

      (6) In Figure 3, readers may want to know more specific information. For example, what gene signals are really driving the hypoxia signal in Alzheimer's disease vs breast cancer? And what SNP signals are driving these gene-level signals? 

      We will add this information in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors introduce a gene-level framework to detect shared genetic architecture between complex traits by integrating GWAS summary statistics with eQTL data via a new algorithm, Sherlock-II, which aggregates signals from multiple (cis/trans) eSNPs to produce gene-phenotype p-values. Shared pathways are identified with Partial-Pearson-Correlation Analysis (PPCA).

      Strengths:

      The authors show the gene-based approach is complementary and often more sensitive than SNP-level methods, and discuss limitations (in terms of no directionality, dependence on eQTL coverage).

      Weaknesses:

      (1) How do the authors explain data where missing tissues or sparse eQTL mapping are available? Would that bias as to which genes/traits can be linked and may produce false negatives or tissue-specific false positives?

      Missing tissues or sparse eQTL data can certainly produce false negatives, as the signals linking the two phenotypes are simply not captured in the data. They are less likely to produce false positives as long as the statistical test is well controlled.

      (2) Aggregating SNP-level signals into gene scores can be confounded by LD; for example, a nearby causal variant for a different gene or non-expression mechanism may drive a gene's score, producing spurious gene-trait links. How do the authors prevent this? 

      When multiple SNPs are in LD with multiple nearby genes, it is generally difficult to map the causal SNP and the causal gene it affects, and thus there will be spurious gene-trait links. When we calculate the global similarity based on the gene-trait association profiles, we try to control for this by simulating random GWASs that have the same power as the real GWAS and preserve the LD structure. The spurious links will also be present in the simulated data (though they may appear at different loci), which are used to calibrate the statistical significance.
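The calibration step mentioned in this response can be sketched as an empirical p-value against a simulated null distribution. This is only an illustration under stated assumptions: the null scores are assumed to be global similarity values obtained from power-matched, LD-preserving random GWASs (the simulation itself is not shown), larger scores are assumed to mean greater similarity, and the add-one correction is a common convention rather than something stated in the source.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    observed: similarity score for the real pair of traits.
    null_scores: scores from simulated random GWASs (assumed precomputed).
    The +1 terms keep the estimate strictly above zero.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Invented numbers: one of four simulated scores reaches the observed value.
print(empirical_p_value(0.9, [0.1, 0.2, 0.95, 0.3]))
# prints 0.4
```

Because the spurious LD-driven links are expected to occur in the simulated GWASs as well, they inflate the null scores and the observed similarity is judged against that inflated baseline.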

      (3) How the SNPs are assigned to genes would affect results, this is because different choices can change which genes appear shared between traits. The authors can expand on these. 

      We assign SNPs to genes based on their strongest eQTL association in the available data. Improvements can be made if the relevant tissues for a trait are known (see response to Reviewer 1 above).

      (4) Many reported novel trait links remain speculative without functional or orthogonal validation (e.g., colocalization, perturbation data). Thus, the manuscript's claims are inconclusive and speculative. 

      We agree with the reviewer that the reported trait links are speculative, and they should be treated as hypotheses generated from the computational analyses. To truly validate some of these proposed relationships, deeper functional analyses and experimental tests are needed.

      (5) It would be best to run LD-aware colocalization and power-matched simulations to check for robustness. 

      We agree that more control for LD and power-matched simulations will be important for testing the robustness of the predictions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding.

      (5) Figure 2 appears very complex and broad.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

      This review focuses on the whole extravasation journey of leukocytes and highlights the involvement of the extracellular matrix (ECM) in multiple phases of the process. The ECM may exert its roles either as a collective structure or through individual components. In the revision, for functions involving specific matrix components, we will emphasize those components and incorporate this information into the subheadings as suggested. The macrophage phenotype sections (Sections 10-11) are included because of the pivotal role of macrophages in deciding tissue fate following inflammation (i.e., to resolve inflammation and regenerate the damage incurred, or to sustain inflammation), which is an important aspect of this review. The ECM could modify macrophage phenotypes either directly (Section 10) or indirectly via modulation of tissue stiffness or of other cell types such as fibroblasts (Section 9). However, as also pointed out by other reviewers, we acknowledge that Section 11 is not integrated well enough with the rest of the review. We plan to reorganize this part and to emphasize its link to the ECM during the revision for better integration. We will reformat Table 1 for easier comprehension. We will consider restructuring Figure 2, which outlines various events influencing the tissue decision between resolution and inflammation, perhaps by splitting it into two separate figures, to better focus the message. We will also check the language to improve readability.

      Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      (4) Limited discussion of translational implications and therapeutic strategies.

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

      We will add a transition paragraph between Section 6 and Section 7 to explain that the extravasation process affects downstream leukocyte functions. While lymphocytes follow a similar extravasation principle, their in-tissue activities differ from those of innate leukocytes. We will thus add a discussion of lymphocyte-ECM crosstalk to Section 8 and/or 9 in the revision. We will restructure Section 11 and Figure 3 to integrate them better with the rest of the review. In the current manuscript, we merely describe the capability of the MIKA framework to describe the identity of any tissue macrophage, such that the framework could serve as a roadmap to facilitate identity normalization of pathological macrophages. In the revision, we plan to use the MIKA framework to discuss and demonstrate the linkage between macrophage identities and the expression/production of modulators of the functional ECM effectors described in Sections 8-9. Regarding the limited discussion of translational implications and therapeutic strategies, we will try to enrich this aspect throughout the manuscript where appropriate, in addition to the existing passages (e.g., lines 293-297, 388-391, 460-463, and 512-517). We will also revise the figure structure in general to avoid overly dense information and to improve clarity. We will consider providing a glossary explaining specialized terms to make the review accessible to a broader readership.

      Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development. Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

      We agree with and appreciate the reviewer's specific and helpful suggestions. During the revision, we will provide the requested background description of MIKA to make the review more accessible to a general readership. As pointed out by other reviewers, since this part (Section 11) is less well integrated with the rest of the review, we will restructure it by linking tissue macrophage identities under the MIKA framework to the modulation of functional ECM effectors described in previous sections (Sections 8-9). We acknowledge that the current figure organization might be overly information-dense and will consider dividing the contents among multiple figures. The size and color-coding issues will also be addressed.

    2. eLife Assessment

      This Review Article takes an original angle and covers several aspects of the leukocyte extravasation process with a focus on the role of ECM proteins. It is a timely piece with an original viewpoint. The current manuscript would benefit from improvement in writing and organization.

    3. Reviewer #1 (Public review):

      Summary:

      In this review, the author covered several aspects of the inflammation response, mainly focusing on the mechanisms controlling leukocyte extravasation and inflammation resolution.

      Strengths:

      This review is based on an impressive number of sources, trying to comprehensively present a very broad and complex topic.

      Weaknesses:

      (1) This reviewer feels that, despite the title, this review is quite broad and not centred on the role of the extracellular matrix.

      (2) The review will benefit from a stronger focus on the specific roles of matrix components and dynamics, with more informative subheadings.

      (3) The macrophage phenotype section doesn't seem well integrated with the rest of the review (and is not linked to the ECM).

      (4) Table 1 is difficult to follow. It could be reformatted to facilitate reading and understanding.

      (5) Figure 2 appears very complex and broad.

      (6) Spelling and grammar should be thoroughly checked to improve the readability.

    4. Reviewer #2 (Public review):

      Summary:

      The manuscript is a timely and comprehensive review of how the extracellular matrix (ECM), particularly the vascular basement membrane, regulates leukocyte extravasation, migration, and downstream immune function. It integrates molecular, mechanical, and spatial aspects of ECM biology in the context of inflammation, drawing from recent advances. The framing of ECM as an active instructor of immune cell fate is a conceptual strength.

      Strengths:

      (1) Comprehensive synthesis of ECM functions across leukocyte extravasation and post-transmigration activity.

      (2) Incorporation of recent high-impact findings alongside classical literature.

      (3) Conceptually novel framing of ECM as an active regulator of immune function.

      (4) Effective integration of molecular, mechanical, and spatial perspectives.

      Weaknesses:

      (1) Insufficient narrative linkage between the vascular phase (Sections 2-6) and the in-tissue phase (Sections 7-10).

      (2) Underrepresentation of lymphocyte biology despite mention in early sections.

      (3) The MIKA macrophage identity framework is only loosely tied to ECM mechanisms.

      (4) Limited discussion of translational implications and therapeutic strategies.

      (5) Overly dense figure insets and underdeveloped links between ECM carryover and downstream immune phenotypes.

      (6) Acronyms and some mechanistic details may limit accessibility for a broader readership.

    5. Reviewer #3 (Public review):

      Summary & Strengths:

      This review by Yu-Tung Li sheds new light on the processes involved in leukocyte extravasation, with a focus on the interaction between leukocytes and the extracellular matrix. In doing so, it presents a fresh perspective on the topic of leukocyte extravasation, which has been extensively covered in numerous excellent reviews. Notably, the role of the extracellular matrix in leukocyte extravasation has received relatively little attention until recently, with a few exceptions, such as a study focusing on the central nervous system (J Inflamm 21, 53 (2024) doi.org/10.1186/s12950-024-00426-6) and another on transmigration hotspots (J Cell Sci (2025) 138 (11): jcs263862 doi.org/10.1242/jcs.263862). This review synthesizes the substantial knowledge accumulated over the past two decades in a novel and compelling manner.

      The author dedicates two sections to discussing the relevant barriers, namely, endothelial cell-cell junctions and the basement membrane. The following three paragraphs address how leukocytes interact with and transmigrate through endothelial junctions, the mechanisms supporting extravasation, and how minimal plasma leakage is achieved during this process. The subsequent question of whether the extravasation process affects leukocyte differentiation and properties is original and thought-provoking, having received limited consideration thus far. The consequences of the interaction between leukocytes and the extracellular matrix, particularly regarding efferocytosis, macrophage polarization, and the outcome of inflammation, are explored in the subsequent three chapters. The review concludes by examining tissue-specific states of macrophage identity.

      Weaknesses:

      Firstly, the first ten sections provide a comprehensive overview of the topic, presenting logical and well-formulated arguments that are easily accessible to a general audience. In stark contrast, the final section (Chapter 11) fails to connect coherently with the preceding review and is nearly incomprehensible without prior knowledge of the author's recent publication in Cell. Mol. Life Sci. CMLS 772 82, 14 (2024). This chapter requires significantly more background information for the general reader, including an introduction to the Macrophage Identity Kinetics Archive (MIKA), which is not even introduced in this review, its basis (meta-analysis of published scRNA-seq data), its significance (identification of major populations), and the reasons behind the revision of the proposed macrophage states and their further development. Secondly, while the attempt to integrate a vast amount of information into fewer figures is commendable, it results in figures that resemble a complex puzzle. The author may consider increasing the number of figures and providing additional, larger "zoom-in" panels, particularly for the topics of clot formation at transmigration hotspots and the interaction between ECM/ECM fragments and integrins. Specifically, the color coding (purple for leukocyte α6-integrins, blue for interacting laminins, also blue for EC α6 integrins, and red for interacting 5-1-1 laminins) is confusing, and the structures are small and difficult to recognize.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, such that readers will not know until which fold change/stats cut-off data are reliable.

      Reviewer #2 (Public review):

      Summary

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia, cortical forebrain assembloids, and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and rescue using exogenous ADM.

      Strengths:

      The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and as well, establishing exciting models for future studies. The authors use sufficient iPSC lines including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript.

      Weaknesses:

      I have a few comments and suggestions for the authors. See below.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors should examine if all cortical interneurons are affected by ADM or only subtypes (Parvalbumin/Somatostatin).

      We thank the reviewer for raising this important question. In our study, we utilized the Dlx1/2b::eGFP reporter to broadly label cortical interneurons; however, this system does not distinguish specific interneuron subtypes. To address this, in the revised version of the manuscript we will use the single-cell RNA sequencing data and immunostainings to provide this information. Based on previous analyses from Birey et al. (Cell Stem Cell, 2022), we expect interneurons within assembloids to express mostly calbindin (CALB2) and somatostatin (SST) at this in vitro stage of development; the parvalbumin subtype appears later, based on data from Birey et al. (Nature, 2017) and more recently from Varela et al. (bioRxiv, 2025).

      In parallel, we will analyze available scRNA-seq data from developing human primary brain tissue of a similar age to that used in the manuscript, and check whether these subtypes of interneurons are similar to the ones within assembloids.

      (2) The authors should test more candidates from their bulk RNA-seq data with different fold changes for regulation after hypoxia, to allow the reader to judge at which cut-off the DEGs may be reproducible. This would make this database much more valuable for the field of hypoxia research.

      We appreciate the reviewer’s thoughtful suggestion. In addition to the bulk RNA-seq analysis, we did validate several upregulated hypoxia-responsive genes with varying fold changes by qPCR; these include PDK1, PFKP, and VEGFA (Figure S1).

      We do agree that an in-depth investigation of specific cut-offs would be interesting; however, this could be the focus of a different manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      We thank the reviewer for this important comment about inflammation. Indeed, hypoxia has been shown to activate inflammatory response pathways. In various studies, it was found that HIF-1α can interact with NF-κB signaling, leading to the upregulation of pro-inflammatory cytokines such as IL-1β, IL-6, and TNF-α (Rius et al., Cell, 2008; Hagberg et al., Nat Rev Neurol, 2015).

      In our transcriptomics data (Figure 2D), and to the reviewer’s point, we identified enrichment of inflammatory signaling response following the hypoxic exposure. Since hSO at the time of analyses do contain astrocytes, we think these glia contribute to the observed pro-inflammatory changes. Based on these results and because ADM is known to have strong anti-inflammatory properties, the effects of ADM on hypoxic astrocytes should be investigated in future studies focused on hypoxia-induced inflammation. In the revision, we will address this comment in the discussion section and cite the appropriate papers.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      Based on our scRNA-seq data in hSOs showing significant upregulation of ADM expression in astrocytes and progenitors, we speculate that the primary mechanism is likely to involve paracrine interactions. However, we cannot exclude autocrine mechanisms with the included experiments. Dissecting these interactions in a cell-type specific manner could be an important focus for future ADM-related studies.

      To address the question about the possible different mechanisms in ventral versus dorsal areas, in the revision we will plot and include in the figures the data about the cell-type expression of ADM and its receptors in hCOs.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      We thank the reviewer for this comment and the observation. Although we did not include a traditional positive control in these ELISA assays, several lines of evidence indicate that the measurements are reliable. First, the standard curves behaved as expected, and all sample values fell within the assay’s dynamic range. Second, technical replicates showed low variability, and the observed changes across experimental conditions (e.g., hypoxia vs. control) were consistent with the expected biological responses based on previous literature. We agree that including western blot validation would strengthen the findings, and we will note this for our future studies focused on CREB and ADM.

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      We appreciate the reviewer’s insightful question. Currently, not much is known about the molecular pathways and downstream cellular events triggered by ADM binding to RAMP2 in inhibitory neurons, or in brain cells in general. The data from our study provide the first information about the cell-type specific expression of ADM in baseline and hypoxic conditions, which is one of the key novelties of our study.

      While the signaling landscape of ADM in interneurons is largely unexplored, several studies in other (non-brain) cell types have demonstrated that ADM binding to RAMP2 can activate downstream cascades such as the cAMP/PKA/CREB pathway, PI3K/AKT, and ERK/MAPK, all of which are also known to be critical regulators of neuronal development and survival. These previously published data, along with our CREB-targeted findings in hypoxic interneurons, suggest ADM–RAMP2 signaling could influence multiple aspects of interneuron biology, but these remain to be evaluated in future studies.

      We agree with the reviewer that CREB has a wide range of transcriptional targets. We decided to focus on GABA as a target of CREB for two main reasons: (i) GABA signaling has been previously shown to play an important role in the migration of cortical interneurons, and (ii) a previous study by Birey et al. (Cell Stem Cell, 2022) demonstrated that CREB pathway activity is essential for regulating interneuron migration in assembloid models of Timothy syndrome, thus providing further evidence that dysregulation of CREB activity disrupts migration dynamics.

      While our study provides a first step toward uncovering the mechanisms of interneuron migration protection by ADM, we fully acknowledge that future work will be needed to delineate the full spectrum of ADM–RAMP2 downstream signaling events in inhibitory neurons and other brain cells.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known, but was not discussed.

      We appreciate this question from the reviewer; however, this was not something that we focused on in this manuscript due to the already large amount of data included. A separate study focusing on neurogenesis defects and the molecular mechanisms of injury for that specific developmental process would be an important next step.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

      We thank the Reviewer for the suggestion of detailing the functional impact of reduced inhibitory neuron migration. We will revise the manuscript by incorporating a paragraph about this in the Discussion section.

      Reviewer #3 (Recommendations for the authors):

      Most of the evidence presented is convincing in supporting the conclusions, and I have only minor suggestions for improvement:

      (1) The bulk RNA-seq was performed in hSOs only, which may not fully capture the phenotypes of migrating or migrated interneurons. It would be valuable, if feasible, to sort migrated cells from hSO-hCO assembloids and specifically examine their molecular mediators.

      We thank the reviewer for this suggestion. While it is likely that the cellular environment will have some influence on a subset of the molecular changes, based on all the data from the manuscript and our specific target, the RNA-sequencing on hCOs was sufficient to capture essential changes like ADM upregulation. An in-depth exploration of the differential responses of migrated versus non-migrated interneurons to hypoxia could be the focus of a different project.

      (2) In Figure 3, it is striking that cell-type heterogeneity dominates over hypoxia vs. control conditions. A joint embedding of hSO and hCO cells could provide further insight into molecular differences between migrated and non-migrated interneurons.

      We thank the reviewer for this observation and opportunity to clarify. Since we manually separated the assembloids before the analyses, we processed these samples separately. That is why they separate like this. In the revision, we will add data about ADM expression and its receptors’ expression in the hCOs.

      (3) It would be helpful to expand the discussion on how closely the migration observed in hSO-hCO assembloids reflects in vivo conditions, and what environmental aspects are absent from this model. This would better frame the interpretation and translational relevance of the findings.

      We thank the Reviewer for bringing up this important point. Although the assembloid model offers the unique advantage of allowing the direct investigation of migration patterns of hypoxic interneurons, we fully agree it does not fully recapitulate the in vivo environment. While there are multiple aspects that cannot be recapitulated in vitro at this time (e.g. cellular complexity, vasculature, immune response, etc), we are encouraged by the validation of our main findings in ex vivo developing human brain tissue, which strongly supports the validity of our findings for in vivo conditions.

      We will expand our discussion to include more details and the need to validate these findings using in vivo models, while also acknowledging that different species (e.g. rodents versus non-human primates versus humans) might have different responses to hypoxia.

      (4) The authors suggest that hypoxia is also associated with delayed interneuron maturation, yet the bulk RNA-seq data primarily reveal stress and hypoxia-related genes. A more detailed discussion of why genes linked to interneuron maturation and function were not strongly affected would clarify this point.

      We thank the Reviewer for the opportunity to clarify.

      The RNA-seq was performed during the acute stages of hypoxia/reoxygenation, and we think a maturation phenotype might be difficult to capture at this point; it would require analysis at later in vitro assembloid maturation stages.

      Our speculation about a possible maturation defect is based on previous developmental biology studies showing that failure of interneurons to reach their final cortical location within a specified developmental window impairs their integration within the neuronal network, and thus leads to maturation defects and possible elimination by apoptosis.

      Since preterm infants suffer from countless hypoxic events over multiple months, we suggest these repetitive events are likely to induce cumulative delays in migration, inability of interneurons to reach their target in time, followed by abnormal integration within the excitatory network, and eventual elimination of some of these interneurons through apoptosis. However, the direct demonstration of this effect following a hypoxic insult would require prolonged in vivo experiments in rodents to follow the migration, network integration and apoptosis of interneurons; to our knowledge this experimental design is not technically feasible at this time.

      (5) Relatedly, while the focus on interneuron migration is well justified, acknowledging how hypoxia might also impact other aspects of cortical development (e.g., progenitor proliferation, neuronal maturation, or circuit integration) would place the findings in a broader developmental framework and strengthen their relevance.

      We appreciate the Reviewer’s suggestion to discuss the role of hypoxia on other processes during cortical development. In the revised manuscript, we will include citations about the effects of hypoxia on interneuron proliferation, maturation and circuit integration as available, and also expand to other cell types known to be affected.

      (6) Very minor: in Figure S3C and D, it was not stated what the colors mean (grey: control, yellow: hypoxia)

      Thank you for pointing out this error; we will correct it in our revision.

    2. eLife Assessment

      In this manuscript, the authors investigate the migration of human cortical interneurons under hypoxic conditions using forebrain assembloids and developing human brain tissue, and probe the underlying mechanisms. The study provides the first direct evidence that hypoxia delays interneuron migration and identifies adrenomedullin (ADM) as a potential therapeutic intervention. The findings are important, and the conclusions are convincingly supported by experimental evidence.

    3. Reviewer #1 (Public review):

      Summary:

      This work aims to elucidate the molecular mechanisms affected in hypoxic conditions, causing reduced cortical interneuron migration. They use human assembloids as a migratory assay of subpallial interneurons into cortical organoids and show substantially reduced migration upon 24 hours of hypoxia. Bulk and scRNA-seq show adrenomedullin (ADM) up-regulation, as well as its receptor RAMP2, confirmed at the protein level. Adding ADM to the culture medium after hypoxic conditions rescues the migration deficits, even though the subtype of interneurons affected is not examined. However, the authors demonstrate very clearly that ineffective ADM does not rescue the phenotype, and blocking RAMP2 also interferes with the rescue. The authors are also applauded for using 4 different cell lines and using human fetal cortex slices as an independent method to explore the DLXi1/2GFP-labelled iPSC-derived interneuron migration in this substrate with and without ADM addition (after confirming that also in this system ADM is up-regulated). Finally, the authors demonstrate PKA-CREB signalling mediating the effect of ADM addition, which also leads to up-regulation of GABA receptors. Taken together, this is a very carefully done study on an important subject - how hypoxia affects cortical interneuron migration. In my view, the study is of great interest.

      Strengths:

      The strengths of the study are the novelty and the thorough work using several culture methods and 4 independent lines.

      Weaknesses:

      The main weakness is that other genes regulated upon hypoxia are not confirmed, such that readers will not know until which fold change/stats cut-off data are reliable.

    4. Reviewer #2 (Public review):

      Summary

      The manuscript by Puno and colleagues investigates the impact of hypoxia on cortical interneuron migration and downstream signaling pathways. They establish two models to test hypoxia, cortical forebrain assembloids, and primary human fetal brain tissue. Both of these models provide a robust assay for interneuron migration. In addition, they find that ADM signaling mediates the migration deficits and rescue using exogenous ADM. The findings are novel and very interesting to the neurodevelopmental field, revealing new insights into how cortical interneurons migrate and as well, establishing exciting models for future studies. The authors use sufficient iPSC lines, including both XX and XY, so the analysis is robust. In addition, the RNAseq data with re-oxygenation is a nice control to see what genes are changed specifically due to hypoxia. Further, the overall level of validation of the sequencing data and involvement of ADM signaling is convincing, including the validation of ADM at the protein level. Overall, this is a very nice manuscript. I have a few comments and suggestions for the authors.

      Strengths and Weaknesses:

      (1) Can the authors comment on the possibility of inflammatory response pathways being activated by hypoxia? Has this been shown before? While not the focus of the manuscript, it could be discussed in the Discussion as an interesting finding and potential involvement of other cells in the Hypoxic response.

      (2) Could the authors comment on the mechanism at play here with respect to ADM and binding to RAMP2 receptors - is this a potential autocrine loop, or is the source of ADM from other cell types besides inhibitory neurons? Given the scRNA-seq data, what cell-to-cell mechanisms can be at play? Since different cells express ADM, there could be different mechanisms in place in ventral vs dorsal areas.

      (3) For data from Figure 6 - while the ELISA assays are informative to determine which pathways (PKA, AKT, ERK) are active, there is no positive control to indicate these assays are "working" - therefore, if possible, western blot analysis from assembloid tissue could be used (perhaps using the same lysates from Figure 3) as an alternative to validate changes at the protein level (however, this might prove difficult); further to this, is P-CREB activated at the protein level using WB?

      (4) Could the authors comment further on the mechanism and what biological pathways and potential events are downstream of ADM binding to RAMP2 in inhibitory neurons? What functional impact would this have linked to the CREB pathway proposed? While the link to GABA receptors is proposed, CREB has many targets beyond this.

      (5) Does hypoxia cause any changes to inhibitory neurogenesis (earlier stages than migration?) - this might already be known, but was not discussed.

      (6) In the Discussion section, it might be worth detailing to the readers what the functional impact of delayed/reduced migration of inhibitory neurons into the cortex might result in, in terms of functional consequences for neural circuit development.

    5. Reviewer #3 (Public review):

      Summary:

      The authors aimed to test whether hypoxia disrupts the migration of human cortical interneurons, a process long suspected to underlie brain injury in preterm infants but previously inaccessible for direct study. Using human forebrain assembloids and ex vivo developing brain tissue, they visualized and quantified interneuron migration under hypoxic conditions, identified molecular components of the response, and explored the effect of pharmacological intervention (specifically ADM) on restoring the migration deficits.

      Strengths:

      The major strength of this study lies in its use of human forebrain assembloids and ex vivo prenatal brain tissue, which provide a direct system to study interneuron migration under hypoxic conditions. The authors combine multiple approaches: long-term live imaging to directly visualize interneuron migration, bulk and single-cell transcriptomics to identify hypoxia-induced molecular responses, pharmacological rescue experiments with ADM to establish therapeutic potential, and mechanistic assays implicating the cAMP/PKA/pCREB pathway and GABA receptor expression in mediating the effect. Together, this rigorous and multifaceted strategy convincingly demonstrates that hypoxia disrupts interneuron migration and that ADM can restore this defect through defined molecular mechanisms.

      Overall, the authors achieve their stated aims, and the results strongly support their conclusions. The work has a significant impact by providing the first direct evidence of hypoxia-induced interneuron migration deficits in the human context, while also nominating a candidate therapeutic avenue. Beyond the specific findings, the methodological platform - particularly the combination of assembloids and live imaging - will be broadly useful to the community for probing neurodevelopmental processes in health and disease.

      Weaknesses:

      The main weakness of the study lies in the extent to which forebrain assembloids recapitulate in vivo conditions, as the migration of interneurons from hSO to hCO does not fully reflect the native environment or migratory context of these cells. Nevertheless, this limitation is tempered by the fact that the work provides the first direct observation of human interneuron migration under hypoxia, representing a major advance for the field. In addition, while the transcriptomic analyses are valuable and highlight promising candidates, more in-depth exploration will be needed to fully elucidate the molecular mechanisms governing neuronal migration and maturation under hypoxic conditions.

    1. eLife Assessment

      This study provides novel and fundamental insights into the long-term use of DREADDs to modulate neuronal activity in nonhuman primates. The exceptional evidence demonstrates the peak dynamics and the subsequent stability of chemogenetic effects for 1.5 years, informing the experimental designs and the interpretation of highly impactful chemogenetic studies in macaques. The protocols, data, and outcomes can serve as guidelines for future experiments. Therefore, the findings will be of significant interest to the field of chemogenetics and may also be of broader interest to researchers and clinicians who seek to utilize viral vectors and/or related genetic technologies.

    2. Reviewer #1 (Public review):

      Summary:

      Inhibitory hM4Di and excitatory hM3Dq DREADDs are currently the most commonly utilized chemogenetic tools in the field of nonhuman primate research, but there is a lack of available information regarding the temporal aspects of virally-mediated DREADD expression and function. Nagai et al. investigated the longitudinal expression and efficacy of DREADDs to modulate neuronal activity in the macaque model. The authors demonstrate that both hM4Di and hM3Dq DREADDs reach peak expression levels after approximately 60 days and that stable expression was maintained for up to two years for hM4Di and at least one year for hM3Dq DREADDs. During this period, DREADDs effectively modulated neuronal activity, as evidenced by a variety of measures, including behavioural testing, functional imaging, and/or electrophysiological recording. Notably, some of the data suggest that DREADD expression may decline after two-three years. This is a novel finding and has important implications for the utilization of this technology for long-term studies, as well as its potential therapeutic applications. Lastly, the authors highlight that peak DREADD expression may be significantly influenced by the presence of fused or co-expressed protein tags, emphasizing the importance of careful design and selection of viral constructs for neuroscientific research. This study represents a critical step in the field of chemogenetics, setting the scene for future development and optimization of this technology.

      Strengths:

      The longitudinal approach of this study provides important preliminary insights into the long-term utility of chemogenetics, which has not yet been thoroughly explored.

      The data presented are novel and inclusive, relying on well-established in vivo imaging methods as well as behavioral and immunohistochemical techniques. The conclusions made by the authors are generally supported by a combination of these techniques. In particular, the utilization of in vivo imaging as a non-invasive method is translationally relevant and likely to make an impact in the field of chemogenetics, such that other researchers may adopt this method of longitudinal assessment in their own experiments. Rigorous standards have been applied to the datasets, and the appropriate controls have been included where possible.

      The number of macaque subjects (20) from which data was available is also notable. Behavioral testing was performed in 11 subjects, FDG-PET in 5, electrophysiology in 1, and [11C]DCZ-PET in 15. This is an impressive accumulation of work that will surely be appreciated by the growing community of researchers using chemogenetics in nonhuman primates.

      The implication that chemogenetic effects can be maintained for up to 1.5-2 years, followed by a gradual decline beyond this period, is an important development in knowledge. The limited duration of DREADD expression may present an obstacle in the translation of chemogenetic technology as a potential therapeutic tool, and it will be of interest for researchers to explore whether this limitation can be overcome. This study therefore represents a key starting point upon which future research can build.

      Weaknesses:

      None.

    3. Reviewer #2 (Public review):

      Summary:

      This paper reports histological, PET imaging, functional and behavioural data evaluating the longevity of AAV2 infection in multiple brain areas of macaques in the context of DREADD experiments. The central aim is to provide unprecedented information about how long hM4Di or hM3Dq receptors are expressed and remain effective in modulating brain functions after vector injections. The data show peak expression 40 to 60 days after vector injection, stable expression for up to 1.5 years for hM4Di, and that hM3Dq remained mostly at 75% of peak after a year, declining to 50% after 2 years. DREADDs effectively modulated neuronal activity and behaviour for approximately two years, as evaluated with behavioural testing, neural recordings or FDG-PET. A statistical evaluation revealed that vector titers, DREADD type and tags contribute to the measured peak level of DREADD expression.

      The article presents a thorough discussion of the limitations and specificities of chemogenetic approaches in monkeys.

      Strength:

      These are unique data in non-human primates (NHPs), an animal model that not only features physiological and immunological characteristics similar to humans, but also contributes to neurobiological functional studies over long timescales with experiments spanning months or years. This evaluation of the long-term efficacy of DREADDs will be very important for all laboratories using chemogenetics in NHPs, but also for future use of such approaches in experimental therapies. The longevity estimates are based on multiple approaches, including behavioural and neurophysiological ones, thus providing information on the functional efficacy of DREADD expression.

      Performing such an evaluation requires specific tools like PET imaging that very few monkey labs have access to. This study was done by the laboratory that developed the radiotracer [11C]DCZ, used here, which binds selectively to DREADDs and provides, using PET, quantitative in vivo measures of DREADD expression. This study and its data should thus be a reference in the field, providing estimates to plan future chemogenetic experiments.

      Publishing databases of experimental outcomes in NHP DREADD experiments is crucial for the community because such experiments are rare, expensive and long. It contributes to refining experiments and reducing the number of animals overall used in the domain.

      Weaknesses:

      This study is a meta-analysis of several experiments performed in one lab. The good side is that it combined a large amount of data that might not have been published individually; the downside is that not everything was planned and equated, creating a lot of unexplained variance in the data. However, this was judiciously used by the authors to provide very relevant information. One might think that organized multi-centric experiments, planned using the knowledge acquired here, would help test more parameters, including some related to inter-individual variability and particular genetic constructs.

    4. Reviewer #3 (Public review):

      Summary

      This manuscript, from the developers of the novel DREADD-selective agonist DCZ (Nagai et al., 2020), utilizes a unique dataset in which multiple PET scans in a large number of monkeys, including baseline scans before AAV injection, 30-120 days post-injection, and then periodically over the course of the prolonged experiments, were performed to assess short- and long-term dynamics of DREADD expression in vivo, and to associate DREADD expression with the efficacy of manipulating neuronal activity or behavior. The goal was to provide critical insights into the practicality and design of multi-year studies using chemogenetics, and to elucidate factors affecting expression stability.

      Strengths are systematic quantitative assessment of the effects of both excitatory and inhibitory DREADDs, quantification of both the short-term and longer-term dynamics, a wide range of functional assessment approaches (behavior, electrophysiology, imaging), and assessment of factors affecting DREADD expression levels, such as serotype, promoter, titer (concentration), tag, and DREADD type.

      These findings will undoubtedly have a very significant impact on the rapidly growing, but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Overall, the conclusions of the paper are mostly supported by the data but may be overstated in some cases, and some details are also missing or not easily recognizable within the figures. The provision of additional information and analyses would be valuable to the reader and may even benefit the authors' interpretation of the data. 

      We thank the reviewer for the thoughtful and constructive feedback. We are pleased that the reviewer found the overall conclusions of our paper to be well supported by the data, and we appreciate the suggestions for improving figure clarity and interpretive accuracy. Below, we address each point with corresponding revisions.

      The conclusion that DREADD expression gradually decreases after 1.5-2 years is only based on a select few of the subjects assessed; in Figure 2, it appears that only 3 hM4Di cases and 2 hM3Dq cases are assessed after the 2-year timepoint. The observed decline appears consistent within the hM4Di cases, but not for the hM3Dq cases (see Figure 2C: the AAV2.1-hSyn-hM3Dq-IRES-AcGFP line is increasing after 2 years.) 

      We agree that our interpretation should be stated more cautiously, given the limited number of cases assessed beyond the two-year timepoint. In the revised manuscript, we have clarified in the Results that the observed decline is based on a subset of animals. We have also included a text stating that while a consistent decline was observed in hM4Di-expressing monkeys, the trajectory for hM3Dq expression was more variable with at least one case showing an increased signal beyond two years.

      Revised Results section:

      Lines 140, “hM4Di expression levels remained stable at peak levels for approximately 1.5 years, followed by a gradual decline observed in one case after 2.5 years, and after approximately 3 years in the other two cases (Figure 2B, a and e/d, respectively). Compared with hM4Di expression, hM3Dq expression exhibited greater post-peak fluctuations. Nevertheless, it remained at ~70% of peak levels after about 1 year. This post-peak fluctuation was not significantly associated with the cumulative number of DREADD agonist injections (repeated-measures two-way ANOVA, main effect of activation times, F<sub>(1,6)</sub> = 5.745, P = 0.054). Beyond 2 years post-injection, expression declined to ~50% in one case, whereas another case showed an apparent increase (Figure 2C, c and m, respectively).”

      Given that individual differences may affect expression levels, it would be helpful to see additional labels on the graphs (or in the legends) indicating which subject and which region are being represented for each line and/or data point in Figure 1C, 2B, 2C, 5A, and 5B. Alternatively, for Figures 5A and B, an accompanying table listing this information would be sufficient. 

      We thank the reviewer for these helpful suggestions. In response, we have revised the relevant figures (Fig. 1C, 2B, 2C, and 5) as noted in the “Recommendations for the authors”, including simplifying visual encodings and improving labeling. We have also updated Table 2 to explicitly indicate the animal ID and brain regions associated with each data point shown in the figures.

      While the authors comment on several factors that may influence peak expression levels, including serotype, promoter, titer, tag, and DREADD type, they do not comment on the volume of injection. The range in volume used per region in this study is between 2 and 54 microliters, with larger volumes typically (but not always) being used for cortical regions like the OFC and dlPFC, and smaller volumes for subcortical regions like the amygdala and putamen. This may weaken the claim that there is no significant relationship between peak expression level and brain region, as volume may be considered a confounding variable. Additionally, because of the possibility that larger volumes of viral vectors may be more likely to induce an immune response, which the authors suggest as a potential influence on transgene expression, not including volume as a factor of interest seems to be an oversight. 

      We thank the reviewer for raising this important issue. We agree that injection volume could act as a confounding variable, particularly since larger volumes were used only in handheld cortical injections. This overlap makes it difficult to disentangle the effect of volume from those of brain region or injection method. Moreover, data points associated with these larger volumes also deviated when volume was included in the model.

      To address this, we performed a separate analysis restricted to injections delivered via microinjector, where a comparable volume range was used across cases. In this subset, we included injection volume as an additional factor in the model and found that volume did not significantly impact peak expression levels. Instead, the presence of co-expressed protein tags remained a significant predictor, while viral titer no longer showed a significant effect. These updated results have replaced the originals in the revised Results section and in the new Figure 5. We have also revised the Discussion to reflect these updated findings.

      The authors conclude that vectors encoding co-expressed protein tags (such as HA) led to reduced peak expression levels, relative to vectors with an IRES-GFP sequence or with no such element at all. While interesting, this finding does not necessarily seem relevant for the efficacy of long-term expression and function, given that the authors show in Figures 1 and 2 that peak expression (as indicated by a change in binding potential relative to non-displaced radioligand, or ΔBPND) appears to taper off in all or most of the constructs assessed. The authors should take care to point out that the decline in peak expression should not be confused with the decline in longitudinal expression, as this is not clear in the discussion; i.e. the subheading, "Factors influencing DREADD expression," might be better written as, "Factors influencing peak DREADD expression," and subsequent wording in this section should specify that these particular data concern peak expression only. 

      We appreciate this important clarification. In response, we have revised the title to "Protein tags reduce peak DREADD expression levels" in the Results section and “Factors influencing peak DREADD expression levels” in the Discussion section. Additionally, we specified that our analysis focused on peak ΔBP<sub>ND</sub> values around 60 days post-injection. We have also explicitly distinguished these findings from the later-stage changes in expression seen in the longitudinal PET data in both the Results and Discussion sections.

      Reviewer #1 (Recommendations for the authors):

      (1) Will any of these datasets be made available to other researchers upon request?

      All data used to generate the figures have been made publicly available via our GitHub repository (https://github.com/minamimoto-lab/2024-Nagai-LongitudinalPET.git). This has been stated in the "Data availability" section in the revised manuscript.

      (2) Suggested modifications to figures:

      a) In Figures 2B and C, the inclusion of "serotype" as a separate legend with individual shapes seems superfluous, as the serotype is also listed as part of the colour-coded vector

      We agree that the serotype legend was redundant since this information is already included in the color-coded vector labels. In response, we have removed the serotype shape indicators and now represent the data using only vector-construct-based color coding for clarity in Figure 2B and C.

      b) In Figures 3A and B, it would be nice to see tics (representing agonist administration) for all subjects, not just the two that are exemplified in panels C-D and F-H. Perhaps grey tics for the non-exemplified subjects could be used.

      In response, we have included black and white ticks to indicate agonist administrations for all subjects in Figure 3A and B, with the type of agonist clearly specified.

      c) In Figure 4C, a Nissl- stained section is said to demonstrate the absence of neuronal loss at the vector injection sites. However, if the neuronal loss is subtle or widespread, this might not be easily visualized by Nissl. I would suggest including an additional image from the same section, in a non-injected cortical area, to show there is no significant difference between the injected and non-injected region.

      To better demonstrate the absence of neuronal loss at the injection site, we have included an image from the contralateral, non-injected region of the same section for comparison (Fig. 4C).

      d) In Figure 5A: is it possible that the hM3Dq construct with a titer of 5×10^13 gc/ml is an outlier, relative to the other hM3Dq constructs used?

      We thank the reviewer for raising this important observation. To evaluate whether the high-titer construct represented a statistical outlier that might artifactually influence the observed trends, we performed a permutation-based outlier analysis. This assessment identified the point in question, as well as one additional case (titer 4.6×10^13 gc/ml, #255, L_Put), as significant outliers relative to the distribution of the dataset.

      Accordingly, we excluded these two data points from the analysis. Importantly, this exclusion did not meaningfully alter the overall trend or the statistical conclusions: specifically, the significant effect of co-expressed protein tags on peak expression levels remained robust. We have updated the Methods section to describe this outlier handling and added a corresponding note in the figure legend.

      Reviewer #2 (Public review): 

      Weaknesses 

      This study is a meta-analysis of several experiments performed in one lab. The good side is that it combined a large amount of data that might not have been published individually; the downside is that all things were not planned and equated, creating a lot of unexplained variances in the data. This was yet judiciously used by the authors, but one might think that planned and organized multicentric experiments would provide more information and help test more parameters, including some related to inter-individual variability, and particular genetic constructs. 

      We thank the reviewer for bringing this important point to our attention. We fully acknowledge that the retrospective nature of our dataset, compiled from multiple studies conducted within a single laboratory, introduces variability related to differences in injection parameters and scanning timelines. While this reflects the practical realities and constraints of long-term NHP research, we agree that more standardized and prospectively designed studies would better control such sources of variance. To address this, we have added the following statement to the "Technical consideration" section in Discussion:

      Lines 297, "This study included a retrospective analysis of datasets pooled from multiple studies conducted within a single laboratory, which inherently introduced variability across injection parameters and scan intervals. While such an approach reflects real-world practices in long-term NHP research, future studies, including multicenter efforts using harmonized protocols, will be valuable for systematically assessing inter-individual differences and optimizing key experimental parameters."

      Reviewer #2 (Recommendations for the authors):

      I just have a few minor points that might help improve the paper:

      (1) Figure 1C y-axis label: should add deltaBPnd in parentheses for clarity.

      We have added “ΔBP<sub>ND</sub>” to the y-axis label for clarity.

      The choice of a sigmoid curve is the simplest clear fit, but it doesn't really consider the presence of the peak described in the paper. Would there be a way to fit the dynamic including fitting the peak?

      We agree that using a simple sigmoid curve for modeling expression dynamics is a limitation. In response to this and a similar comment from Reviewer #3, we tested a double logistic function (as suggested) to see if it better represented the rise and decline pattern. However, as described below, the original simple sigmoid curve was a better fit for the data. We have included a discussion regarding this limitation of this analysis. See Reviewer #3 recommendations (2) for details.

      The colour scheme in Figure 1C should be changed to make things clearer, and maybe use another dimension (like dotted lines) to separate hM4Di from hM3Dq.

      We have improved the visual clarity of Figure 1C by modifying the color scheme to represent vector construct and using distinct line types (dashed for hM4Di and solid for hM3Dq data) to separate DREADD type.

      (2) Figure 2

      I don't understand how the referencing to 100 was made: was it by selecting the overall peak value or the peak value observed between 40 and 80 days? If the former then I can't see how some values are higher than the peak. If the second then it means some peak values occurred after 80 days and data are not completely re-aligned.

      We thank the reviewer for the opportunity to clarify this point. The normalization was based on the peak value observed between 40–80 days post-injection, as this window typically captured the peak expression phase in our dataset (see Figure 1). However, in some long-term cases where PET scans were limited during this period (e.g., with only one scan performed at day 40), it is possible that the actual peak occurred later. Therefore, instances where ΔBP<sub>ND</sub> values slightly exceeded the reference peak at later time points likely reflect this sampling limitation. We have clarified this methodological detail in the revised Results section to improve transparency.
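
      The normalization described above can be illustrated with a minimal sketch. The function name, the example scan days, and the ΔBP<sub>ND</sub> values are hypothetical and are not taken from the authors' dataset or code; only the 40–80 day reference window comes from the response.

```python
def normalize_to_window_peak(days, values, window=(40, 80)):
    """Express each value as a percentage of the largest value scanned inside `window`.

    `days` and `values` are parallel lists (post-injection day, expression measure).
    The reference is the maximum among scans whose day falls inside the window,
    not the overall maximum, so later scans can exceed 100%.
    """
    lo, hi = window
    in_window = [v for d, v in zip(days, values) if lo <= d <= hi]
    if not in_window:
        raise ValueError("no scan falls inside the reference window")
    reference_peak = max(in_window)
    if reference_peak <= 0:
        raise ValueError("reference peak must be positive")
    return [100.0 * v / reference_peak for v in values]


# Hypothetical animal with a single scan inside the window (day 40);
# the true peak is only captured by a later scan (day 120).
days = [40, 120, 400]
delta_bp_nd = [2.0, 2.5, 1.5]
normalized = normalize_to_window_peak(days, delta_bp_nd)
print(normalized)  # [100.0, 125.0, 75.0]
```

      In this made-up case the day-120 value sits above 100% because the only scan inside the reference window preceded the true peak, which is the sampling limitation the authors describe.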

      The methods section mentions the use of CNO but this is not in the main paper which seems to state that only DCZ was used: the authors should clarify this

      Although DCZ was the primary agonist used, CNO and C21 were also used in a few animals (e.g., monkeys #153, #221, and #207) for behavioral assessments. We have clarified this in the Results section and revised Figure 3 to indicate the specific agonist used for each subject. Additionally, we have updated the Methods section to clearly specify the use and dosage of DCZ, CNO, and C21, to avoid any confusion regarding the experimental design.

      Reviewer #3 (Public review): 

      Minor weaknesses are related to a few instances of suboptimal phrasing, and some room for improvement in time course visualization and quantification. These would be easily addressed in a revision. These findings will undoubtedly have a very significant impact on the rapidly growing but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

      We thank the reviewer for the positive assessment of our manuscript and for the constructive suggestions. We address each comment in the following point-by-point responses and have revised the manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify the reasoning behind restricting the analysis in Figure 1 to only 7 monkeys with subcortical AAV injection.

      We focused the analysis shown in Figure 1 on 7 monkeys with subcortical AAV injections that received comparable injection volumes. These data were primarily part of vector test studies, allowing for repeated PET scans within 150 days post-injection. In contrast, monkeys with cortical injections, including those with larger volumes, were allocated to behavioral studies and therefore were not scanned as frequently during the early phase. We will clarify this rationale in the Results section.

      (2) Figure 1: Not sure if a simple sigmoid is the best model for these, mostly peaking and then descending somewhat, curves. I suggest testing a more complex model, for instance, double logistic function of a type f(t) = a + b/(1+exp(-c*(t-d))) - e/(1+exp(-g*(t-h))), with the first logistic term modeling the rise to peak, and the second term for partial decline and stabilization

      We appreciate the reviewer’s thoughtful suggestion to use a double logistic function to better model both the rising and declining phases of the expression curve. In response to this and a similar comment from Reviewer #2, we tested the proposed model and found that, while it could capture the peak and subsequent decline, the resulting fit appeared less biologically plausible (see below). Moreover, model comparison using BIC favored the original simple sigmoid model (BIC = 61.1 vs. 62.9 for the simple and double logistic model, respectively). This information has been included in the revised figure legend for clarity.

      Given these results, we retained the original simple sigmoid function in the revised manuscript, as it provides a sufficient and interpretable approximation of the early expression trajectory—particularly the peak expression-time estimation, which was the main purpose of this analysis. We have updated the Methods section to clarify our modeling and rationale as follows:

      Lines 530, "To model the time course of DREADD expression, we used a single sigmoid function, referencing past in vivo fluorescent measurements (Diester et al., 2011). Curve fitting was performed using least squares minimization. For comparison, a double logistic function was also tested and evaluated using the Bayesian Information Criterion (BIC) to assess model fit."
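
      The two candidate models and the comparison criterion can be sketched as follows. The double logistic form is the one proposed by Reviewer #3; the four-parameter form of the single sigmoid and the Gaussian least-squares form of the BIC are assumptions, since the response does not state the exact expressions used, and all function names are illustrative.

```python
import math


def sigmoid(t, a, b, c, d):
    """Single sigmoid: baseline a, amplitude b, slope c, midpoint d (assumed form)."""
    return a + b / (1.0 + math.exp(-c * (t - d)))


def double_logistic(t, a, b, c, d, e, g, h):
    """Reviewer #3's form: a rise term minus a second logistic for the partial decline."""
    rise = b / (1.0 + math.exp(-c * (t - d)))
    decline = e / (1.0 + math.exp(-g * (t - h)))
    return a + rise - decline


def residual_sum_of_squares(model, params, times, observed):
    """Sum of squared differences between observations and model predictions."""
    return sum((y - model(t, *params)) ** 2 for t, y in zip(times, observed))


def bic_least_squares(rss, n_points, n_params):
    """BIC for a least-squares fit with Gaussian errors (assumed form):
    n * ln(RSS / n) + k * ln(n). Lower values indicate the preferred model."""
    return n_points * math.log(rss / n_points) + n_params * math.log(n_points)
```

      Under this form of the criterion, the double logistic model carries seven parameters against four for the single sigmoid, so it has to reduce the residual error enough to offset the larger penalty term. That trade-off is consistent with the reported BIC values favoring the simpler model.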

      We also acknowledge that a more detailed understanding of post-peak expression changes will require additional PET measurements, particularly between 60- and 120-days post-injection, across a larger number of animals. We have included this point in the revised Discussion to highlight the need for future work focused on finer-grained modeling of expression decline:

      Lines 317, “Although we modeled the time course of DREADD expression using a single sigmoid function, PET data from several monkeys showed a modest decline following the peak. While the sigmoid model captured the early-phase dynamics and offered a reliable estimate of peak timing, additional PET scans—particularly between 60- and 120-days post-injection—will be essential to fully characterize the biological basis of the post-peak expression trajectories.”

      Author response image 1.

      (3) Figure 2: It seems that the individual curves are for different monkeys, I counted 7 in B and 8 in C, why "across 11 monkeys"? Were there several monkeys with both hM4Di and hM3Dq? It does not look like that from Table 1. Generally, I would suggest associating specific animals from Tables 1 and 2 with the panels in Figures 1 and 2.

      Some animals received multiple vector types, leading to more curves than individual subjects. We have revised the figure legends and updated Table 2 to explicitly relate each curve with the specific animal and brain region.

      (4) I also propose plotting the average of (interpolated) curves across animals, to convey the main message of the figure more effectively.

      We agree that plotting the mean of the interpolated expression curves would help convey the group trend. We added averaged curves to Figure 2B and C.

      (5) Similarly, in line 155 "We assessed data from 17 monkeys to evaluate ... Monkeys expressing hM4Di were assessed through behavioral testing (N = 11) and alterations in neuronal activity using electrophysiology (N = 2)..." - please explain how 17 is derived from 11, 2, 5 and 1. It is possible to glean from Table 1 that the calculation is 11 (including 2 with ephys) + 5 + 1 = 17, but it might appear as a mistake if one does not go deep into Table 1.

      We have clarified in both the text and Table 1 that some monkeys (e.g., #201 and #207) underwent both behavioral and electrophysiological assessments, resulting in the overlapping counts. Specifically, the dataset includes 11 monkeys for hM4Di-related behavior testing (two of which underwent electrophysiology testing), 5 monkeys assessed for hM3Dq with FDG-PET, and 1 monkey assessed for hM3Dq with electrophysiology, totaling 19 assessments across 17 monkeys. We have revised the Results section to make this distinction more explicit to avoid confusion, as follows:

      Lines 164, "Monkeys expressing hM4Di (N = 11) were assessed through behavioral testing, two of which also underwent electrophysiological assessment. Monkeys expressing hM3Dq (N = 6) were assessed for changes in glucose metabolism via [<sup>18</sup>F]FDG-PET (N = 5) or alterations in neuronal activity using electrophysiology (N = 1).”
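
      The counting in this response can be checked with a short sketch. The variable names are descriptive labels chosen for illustration; the numbers are those stated in the response.

```python
# Counts stated in the author response; variable names are illustrative labels.
hm4di_behavior = 11        # hM4Di monkeys assessed by behavioral testing
hm4di_ephys_overlap = 2    # subset of those 11 that also had electrophysiology
hm3dq_fdg_pet = 5          # hM3Dq monkeys assessed by FDG-PET
hm3dq_ephys = 1            # hM3Dq monkey assessed by electrophysiology

# Each monkey is counted once: the 2 electrophysiology cases are already in the 11.
unique_monkeys = hm4di_behavior + hm3dq_fdg_pet + hm3dq_ephys

# Each assessment is counted once: the overlapping monkeys contribute two each.
total_assessments = unique_monkeys + hm4di_ephys_overlap

print(unique_monkeys, total_assessments)  # 17 19
```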

      (6) Line 473: "These stock solutions were then diluted in saline to a final volume of 0.1 ml (2.5% DMSO in saline), achieving a dose of 0.1 ml/kg and 3 mg/kg for DCZ and CNO, respectively." Please clarify: the injection volume was always 0.1 ml? then it is not clear how the dose can be 0.1 ml/kg (for a several kg monkey), and why DCZ and CNO doses are described in ml/kg vs mg/kg?

      We thank the reviewer for pointing out this ambiguity. We apologize for the oversight and also acknowledge that we omitted mention of C21, which was used in a small number of cases. To address this, we have revised the “Administration of DREADD agonist” section of the Methods to clearly describe the preparation, the volume, and dosage for each agonist (DCZ, CNO, and C21) as follows:

      Lines 493, “Deschloroclozapine (DCZ; HY-42110, MedChemExpress) was the primary agonist used. DCZ was first dissolved in dimethyl sulfoxide (DMSO; FUJIFILM Wako Pure Chemical Corp.) and then diluted in saline to a final volume of 1 mL, with the final DMSO concentration adjusted to 2.5% or less. DCZ was administered intramuscularly at a dose of 0.1 mg/kg for hM4Di activation, and at 1–3 µg/kg for hM3Dq activation. For behavioral testing, DCZ was injected approximately 15 min before the start of the experiment unless otherwise noted. Fresh DCZ solutions were prepared daily.

      In a limited number of cases, clozapine-N-oxide (CNO; Toronto Research Chemicals) or Compound 21 (C21; Tocris) was used as an alternative DREADD agonist for some hM4Di experiments. Both compounds were dissolved in DMSO and then diluted in saline to a final volume of 2–3 mL, also maintaining DMSO concentrations below 2.5%. CNO and C21 were administered intravenously at doses of 3 mg/kg and 0.3 mg/kg, respectively.”
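
      The distinction between a per-kilogram dose and a fixed injection volume, which prompted the reviewer's question, can be made concrete with a small sketch. The body weight is a hypothetical value chosen for illustration; the doses, final volume, and DMSO limit are those quoted in the revised Methods text, and the function names are not from the authors' materials.

```python
def total_dose_mg(body_weight_kg, dose_mg_per_kg):
    """Total agonist mass to administer for a given body weight."""
    return body_weight_kg * dose_mg_per_kg


def max_dmso_volume_ml(final_volume_ml, max_dmso_fraction=0.025):
    """Largest DMSO volume that keeps the final solution at or below the stated fraction."""
    return final_volume_ml * max_dmso_fraction


weight_kg = 6.0  # hypothetical macaque body weight, for illustration only

dcz_hm4di_mg = total_dose_mg(weight_kg, 0.1)   # DCZ at 0.1 mg/kg (hM4Di activation)
cno_mg = total_dose_mg(weight_kg, 3.0)         # CNO at 3 mg/kg
c21_mg = total_dose_mg(weight_kg, 0.3)         # C21 at 0.3 mg/kg
dmso_limit_ml = max_dmso_volume_ml(1.0)        # DCZ final volume of 1 mL, DMSO at most 2.5%
```

      The dose scales with body weight while the final injection volume stays fixed, so the agonist concentration in the syringe differs between animals. This is why the dose is reported in mg/kg rather than ml/kg.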

      (7) Figure 5A: What do regression lines represent? Do they show a simple linear regression (then please report statistics such as R-squared and p-values), or is it related to the linear model described in Table 3 (but then I am not sure how separate DREADDs can be plotted if they are one of the factors)?

      We thank the reviewer for the insightful question. In the original version of Figure 5A, the regression lines represented simple linear fits used to illustrate the relationship between viral titer and peak expression levels, based on our initial analysis in which titer appeared to have a significant effect without any notable interaction with other factors (such as DREADD type).

      However, after conducting a more detailed analysis that incorporated injection volume as an additional factor and excluded cortical injections and statistical outliers (as suggested by Reviewer #1), viral titer was no longer found to significantly predict peak expression levels. Consequently, we revised the figure to focus on the effect of reporter tag, which remained the most consistent and robust predictor in our model.

      In the updated Figure 5, we have removed the relationship between viral titer and expression level with regression lines.

    1. eLife Assessment

      This study presents an important finding on the role of GATA4 in aging- and OA-associated cartilage pathology. The conclusions are well supported by compelling in vitro and in vivo evidence. This work will be of broad interest to both cell biologists and orthopedic/skeletal health clinicians.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the differences between young and aged chondrocytes. Through transcriptomic analysis and further assessments in chondrocytes, GATA4 was found to be increased in aged chondrocyte donors compared to young. Subsequent mechanistic analysis with lentiviral vectors, siRNAs, and a small molecule were used to study the role of GATA4 in young and old chondrocytes. Lastly, an in vivo study was used to assess the effect of GATA4 expression on osteoarthritis progression in a DMM mouse model.

      Strengths:

      This work linked the overexpression of GATA4 to NF-κB signaling pathway activation and alterations to the TGF-β signaling pathway, and found that GATA4 increased the progression of OA compared to the DMM control group, indicating that GATA4 contributes to the onset and progression of OA in aged individuals.

      Comments on revised version:

      Great work! All my concerns have been well addressed.

    3. Reviewer #2 (Public review):

      Summary:

      This study elucidated the impact of GATA4 on aging- and injury-induced cartilage degradation and osteoarthritis (OA) progression, based on the team's finding that GATA4 expression is positively correlated with aging in human chondrocytes. By integrating cell culture of human chondrocytes, gene manipulation tools (siRNA, lentivirus), biological/biochemical analyses and murine models of post-traumatic OA, the team found that increasing GATA4 levels reduced anabolism and increased catabolism of chondrocytes from young donors, likely through upregulation of the BMP pathway, and that this impact is not correlated with TGF-β stimulation. Conversely, silencing GATA4 by siRNA attenuated catabolism and elevated aggrecan/collagen II biosynthesis of chondrocytes from old donors. The physiological relevance of GATA4 was further validated by the accelerated OA progression observed in lentivirus-infected mice in the DMM model.

      Strengths:

      This is a highly significant and innovative study that provides new molecular insights into cartilage homeostasis and pathology in the context of aging and disease. The experiments were performed in a comprehensive and rigorous manner. The data were interpreted thoroughly in the context of the current literature.

      Weaknesses:

      The only aspect that would benefit from further clarification is a more detailed discussion of aging-associated ECM changes in the context of prior literature.

    4. Reviewer #3 (Public review):

      Summary:

      This is an exciting, comprehensive paper that demonstrates the role of GATA4 on OA-like changes in chondrocytes. The authors present elegant reverse translational experiments that justify this mechanism and demonstrate the sufficiency of GATA4 in a mouse model of osteoarthritis (DMM), where GATA4 drove cartilage degeneration and pain in a manner that was significantly worse than DMM alone. This could pave the way for new therapies for OA that account for both structural changes and pain.

      Strengths:

      (1) GATA4 was identified from human chondrocytes.

      (2) IHC and sequencing confirmed GATA4 presence.

      (3) Activation of SMADs is clearly shown in vitro with GATA4 overexpression.

      (4) The role of GATA4 was functionally assessed in vivo using the mouse DMM model, where the authors uncovered that GATA4 worsens OA structure and hyperalgesia in male mice.

      (5) It is interesting that GATA4 is largely known to be found in cardiac cells and to have a role in cardiac repair, metabolism, and inflammation, among other things listed by the authors in the discussion (in liver, lung, pancreas). What could this new knowledge of GATA4 mean for OA as a potentially systemically mediated disease, where cardiac disease and metabolic syndrome are often co-morbid?

      Weaknesses:

      I do not have further comments. Thank you for addressing the previously mentioned concerns.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #2 (Public review):

      The only aspect that would benefit from further clarification is a more detailed discussion of aging-associated ECM changes in the context of prior literature. 

      Thank you. Please refer to the new section (Lines 604-617)

      Reviewer #3 (Public review):

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed. 

      Thank you. Please refer to Lines 530-537.  

      “Of note, Hypoxia-Inducible Factor 1α (HIF1α) was the most differentially expressed gene predicted to regulate chondrocyte aging. The connection between HIF1α and aging has been previously reported.[32] Furthermore, additional studies have investigated HIF1 in association with OA and assessed its use as a therapeutic target.[33,34] Therefore, we decided to focus on GATA4, which was less studied in chondrocytes but highly associated with cellular senescence, an aging hallmark. However, our selection did not dampen the importance of HIF1α and other molecules listed in Figure 1D in chondrocyte aging. They can be further studied in the future using the same strategy employed in the current work.”

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes. 

      In the current study, we focused on the DMM control and DMM Gata4 virus groups, so we did not include a sham control group. We recognize that this is a limitation of the study.

      (3) While there appear to be GATA4 small-molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study.  

      We agree with this comment that the results are still preliminary, which is why we placed them in the supplementary materials. However, we feel that the results are informative: they support the potential of GATA4 as a therapeutic target and may inspire the development of more specific inhibitors. Therefore, we would prefer to keep the results in the current study.

    1. eLife Assessment

      This is a useful tool for code-less analysis of patterns in cell migratory behaviours in vivo using intravital microscopy data and allows correlation with spatial features of the tumour microenvironment. There is a clear need for these tools to make quantitative analysis, comparison and interpretation of complex cell tracking data more accessible and solid evidence is provided of its applicability to tracks generated by both proprietary and open tracking software.

    2. Reviewer #1 (Public review):

      In this work, Rios-Jimenez and Zomer et al have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of Intravital imaging (IVM) data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module' ) and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). A key strength is that it is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. In addition, demo datasets are available on the authors GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge) which was used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation.

      While the analysis provides only preliminary evidence in support of the authors conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment, conclusions are appropriately tempered in the absence of additional experiments and controls.

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches.

      While the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking) which will help ensure its accessibility to a broader range of researchers. Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment.

      When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'.

    3. Reviewer #2 (Public review):

      Summary:

      The authors produce a new tool, BEHAV3D to analyse tracking data and to integrate these analyses with large and small scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data, however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.

      Strengths:

      The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers.

      The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used.

    4. Reviewer #3 (Public review):

      The manuscript by Rios-Jimenez developed a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users.

      Strengths:

      An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment.

      Weaknesses:

      Motility is the main tumor cell feature analyzed in the study together with some other tumor-intrinsic features, such as morphology. However, these features are insufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important non-tumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on analysis of tumor-alone features, and cannot be applied to analyze important cell-cell interaction dynamics in 3D.

    5. Author response:

      The following is the authors’ response to the current reviews

      Reviewer #1 (Public review):

      In this work, Rios-Jimenez and Zomer et al have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of Intravital imaging (IVM) data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module' ) and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). A key strength is that it is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. In addition, demo datasets are available on the authors GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge) which was used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation.

      While the analysis provides only preliminary evidence in support of the authors conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment, conclusions are appropriately tempered in the absence of additional experiments and controls.

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches.

      While the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking) which will help ensure its accessibility to a broader range of researchers. Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment.

      When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'.

      We thank the reviewer for carefully considering our manuscript and providing constructive comments. We appreciate the recognition of BEHAV3D-TP’s user-friendliness, modular design, and ability to link cell behavior with the tumor microenvironment. In the future, we plan to extend the tool to incorporate segmentation and tracking modules, once we have approaches that are broadly applicable or allow for personalized model training, further enhancing its utility for the community.

      Reviewer #2 (Public review):

      Summary:

      The authors produce a new tool, BEHAV3D to analyse tracking data and to integrate these analyses with large and small scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data, however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.

      Strengths:

      - The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers.

      - The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used.

      We thank the reviewer for their careful reading and thoughtful comments. Feedback from all revision rounds has helped us clarify key points and improve the manuscript, and we are grateful for the positive remarks regarding our application to diffuse midline glioma and the potential of the tool to enable new biological insights.

      Reviewer #3 (Public review):

      The manuscript by Rios-Jimenez developed a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users.

      Strengths:

      An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment.

      Weaknesses:

      Motility is the main tumor cell feature analyzed in the study together with some other tumor-intrinsic features, such as morphology. However, these features are insufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important non-tumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on analysis of tumor-alone features, and cannot be applied to analyze important cell-cell interaction dynamics in 3D.

      We thank the reviewer for their careful assessment and encouraging remarks regarding BEHAV3D-TP.

      Regarding the concern about the tool’s current focus on motility features, we would like to clarify again that BEHAV3D-TP is designed to be highly flexible and extensible. Users can incorporate a wide range of features, including dynamic, morphological, and spatial parameters, into their analyses. In the latest revision, we have made this even more explicit by explaining that the feature selection interface allows users to either (i) select features directly for clustering or (ii) select features for correlation with clusters (see the Small-scale phenotyping module section in Methods).

      Importantly, while our current analysis emphasizes clustering based on dynamic behaviors, Figure 4 demonstrates that these behavioral clusters are associated at the single-cell level with distinct proximities to key TME components, such as TAMMs and blood vessels. These spatial interaction features could also have been included in the clustering itself, creating dynamic-spatial clusters, but we deliberately chose not to do so. This decision was guided by established principles of feature selection: including features with unknown or potentially irrelevant variability can introduce noise and obscure biologically meaningful patterns, ultimately reducing the clarity and interpretability of the resulting clusters. Instead, we adopted a two-step approach: first identifying clusters based on core dynamic features, then examining their relationships with spatial and interaction metrics. This allowed us to reveal meaningful associations for particular cell behaviors, such as the proximity of the invading cluster to TAMMs, without overfitting or complicating the clustering model.

      To address the reviewer’s point in the latest revision round, we have updated the Small-scale phenotyping module section to highlight the possibility of including spatial interaction features with various TME cell types. We also revised the manuscript text and Figure 1 to clarify that these environmental features can be used both upstream as clustering input (Option 1) and for downstream analysis (Option 2), depending on the user’s experimental goals. Attached to this rebuttal letter, we also provide an additional figure illustrating these options in the feature selection panels of the Colab notebook.

      In summary, while the clustering presented in this study is based on dynamic parameters, BEHAV3D-TP fully supports the integration of interaction features and other non-motility descriptors. This modularity enables users to customize their analysis pipelines according to specific biological questions, including those involving cell–cell interactions and spatial dynamics within the TME.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Intravital microscopy (IVM) is a powerful tool that facilitates live imaging of individual cells over time in vivo in their native 3D tissue environment. Extracting and analysing multi-parametric data from IVM images however is challenging, particularly for researchers with limited programming and image analysis skills. In this work, Rios-Jimenez and Zomer et al have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of IVM data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module') and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). It is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. Demo datasets are also available on the authors' GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge) which was used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation. 

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches. 

      Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment. When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'. 

      Strengths: 

      •  Figures are clearly presented, and the manuscript is easy to follow. 

      •  The pipeline appears to be intuitive and user-friendly for researchers with limited computational expertise. A detailed step-by-step video and demo datasets are also included to support its uptake. 

      •  The different computational modules have been tested using relevant datasets, including imaging data of normal and tumour cells in vivo. 

      •  All code is open source, and the pipeline can be implemented with Google Colab. 

      •  The tool combines multiple dynamic parameters extracted from timelapse IVM images to identify single-cell behavioural patterns and to cluster cells into distinct groups sharing similar behaviours, and provides avenues to map these onto in vivo or ex vivo imaging data of the tumour microenvironment 

      Weaknesses: 

      •  The tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images. To use the tool researchers must first extract dynamic cellular parameters from their IVM datasets using other software including Imaris, which is expensive and therefore not available to all. Nonetheless, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking) which will help ensure its accessibility to a broader range of researchers. 

      •  The analysis provides only preliminary evidence in support of the authors conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. The authors acknowledge this however, and conclusions are appropriately tempered in the absence of additional experiments and controls. 

      We thank the reviewer for their thorough and constructive assessment of our work and are pleased that the accessibility, functionality, and potential impact of BEHAV3D-Tumour Profiler were well received. We particularly appreciate the acknowledgment of the tool’s ease of use for researchers with limited computational expertise, the clarity of the manuscript, and the relevance of our approach for identifying multi-parametric migratory behaviours and their correlation with the tumour microenvironment.

      Regarding the weaknesses raised:

      (1) Lack of built-in tracking and kinetic parameter extraction: As noted in our initial revision, while we agree that integrating open-source tracking and segmentation functionality could be valuable, it is beyond the scope of the current work. Our tool is designed to focus specifically on downstream analysis of already extracted kinetic data, addressing a gap in post-processing tools for exploring complex migratory behaviour and spatial correlations. Since different experimental systems often require tailored imaging and segmentation pipelines, we believe that decoupling tracking from the downstream analysis can actually be a strength, offering greater versatility. Researchers can use their preferred or most appropriate tracking software, whether proprietary or open-source, and then analyze the resulting data with BEHAV3D-TP. To support this, we ensured compatibility with widely used tools including open-source Fiji plugins (e.g., TrackMate, MTrackJ, ManualTracking), and we also cited several relevant studies that address the upstream processing steps. Importantly, the main aim of our tool is to fill the gap in post-tracking analysis, enabling quantitative interpretation and pattern recognition that has until now required substantial coding effort or custom solutions.

      (2) Preliminary nature of the biological conclusions – We fully agree with this assessment and have explicitly acknowledged this limitation in the manuscript. Our aim was to demonstrate the utility of BEHAV3D-TP in uncovering heterogeneity and spatial associations in vivo, while encouraging further hypothesis-driven studies using complementary biological approaches. We are grateful that the reviewer recognizes the cautious interpretation of our results and their added value beyond single-parameter analysis.

      Reviewer #2 (Public review): 

      Summary: 

      The authors produce a new tool, BEHAV3D, to analyse tracking data and to integrate these analyses with large and small scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data, however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.

      Strengths: 

      - The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers. 

      - The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used. 

      Weaknesses: 

      The revision has dealt with many concerns, however, the statistics generated by the process are still flawed. While the statistics have been clarified within the legends, and this is a great improvement in terms of clarity, the underlying assumptions of the tests used are violated. The problem is that individual imaging positions or tracks are treated as independent and then analysed by ANOVA. As separate imaging positions within the same mouse are not independent, nor are individual cells within a single mouse, this makes the statistical analyses inappropriate. For a deeper analysis of this than is feasible within a review please see Lord, Samuel J., et al. "SuperPlots: Communicating reproducibility and variability in cell biology." The Journal of cell biology 219.6 (2020): e202001064. Ultimately, while this is a neat piece of software facilitating the analysis of complex data, the fact that it will produce flawed statistical analysis is a major problem. This problem is compounded by the fact that much imaging analysis has been analysed in this inappropriate manner in the past, leading to issues of interpretation and ultimately reproducibility.

      We thank the reviewer for their careful reading and thoughtful feedback. We are encouraged by the recognition of BEHAV3D-TP’s ease of use, open-source accessibility, and the value of integrating cell behaviour with spatial features of the tissue. We appreciate the positive remarks regarding our application to diffuse midline glioma (DMG) and the potential for the tool to enable new biological insights.

      We also appreciate the reviewer’s continued concern regarding the statistical treatment of the data. While we agree with the broader principle that care must be taken to avoid violating assumptions of independence, we respectfully disagree that all instances where individual tracks or imaging positions are used constitute flawed analysis. Importantly, our work is centered on characterizing heterogeneity at the single-cell level in distinct TME regions. Therefore, in certain cases—especially when comparing distinct behavioral subtypes across varying TME environments and multiple mice—it is appropriate to treat individual imaging positions as independent units. This approach is particularly relevant given our findings that large-scale TME regions differ across positions. When analyzing features such as the percentage of DMG cells in proximity to TAMMs, averaging per mouse would obscure these regional differences and reduce the resolution of biologically meaningful variation.

      To address this concern further, we have revised the figure legends, main text, and documentation, carefully considering the appropriate statistical unit for each analysis. As detailed below, we used mouse-level aggregation where the experimental question required inter-mouse reproducibility, and a position-based approach where the aim was to explore intra-tumoral heterogeneity.

      Figure 3d and Supplementary Figure 5d: In this analysis, we treated imaging positions as independent units because our data specifically demonstrate that, within individual mice, different positions correspond to distinct large-scale tumor microenvironment phenotypes. Therefore, averaging across the whole mouse would obscure these important spatial differences and not accurately reflect the heterogeneity we aim to characterize.

      Figure 4c-e; Supplementary Figure 6d: While our initial aim was to highlight single-cell variability, we acknowledge that the original presentation may have been misleading. In the revised manuscript, we have updated the graphs for greater clarity. To quantify how often tumor cells of each behavioral type are located near TAMMs (Fig. 4c) or blood vessels (Fig. 4e), we now calculate the percentage of tumor cells "close" to the environmental feature per behavioral cluster within each imaging position. This classification is based on the distance to the TME feature of interest and is detailed in the “Large-scale phenotyping” section of the Methods. For the number of SR101 objects within a 30 µm radius, we averaged per position.
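      The per-position quantification described above can be illustrated with a minimal sketch. It is not the authors' code: the record fields (`position`, `cluster`, `distance_um`), the helper name `percent_close_per_position`, and the 10 µm cut-off are all assumptions made for illustration, since the actual distance threshold is defined in the Methods of the manuscript.

```python
from collections import defaultdict


def percent_close_per_position(cells, threshold_um):
    """Return {(position, cluster): % of cells within threshold_um of the TME feature}.

    `cells` is a list of dicts with hypothetical keys:
    'position' (imaging position id), 'cluster' (behavioural cluster),
    'distance_um' (distance from the cell to the TME feature of interest).
    """
    totals = defaultdict(int)
    close = defaultdict(int)
    for cell in cells:
        key = (cell["position"], cell["cluster"])
        totals[key] += 1
        if cell["distance_um"] <= threshold_um:
            close[key] += 1
    return {key: 100.0 * close[key] / totals[key] for key in totals}


# Toy data: two imaging positions from one mouse (values are invented).
cells = [
    {"position": "m1_p1", "cluster": "invading", "distance_um": 5.0},
    {"position": "m1_p1", "cluster": "invading", "distance_um": 25.0},
    {"position": "m1_p1", "cluster": "static", "distance_um": 40.0},
    {"position": "m1_p2", "cluster": "invading", "distance_um": 8.0},
]

# The 10 µm threshold is an assumption for this example only.
result = percent_close_per_position(cells, threshold_um=10.0)
for key in sorted(result):
    print(key, result[key])
```

      Each (position, cluster) pair yields one percentage, so the unit entering any downstream test is the imaging position rather than the individual cell.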

      We treated individual imaging positions as the units of analysis rather than averaging per mouse, as our data (see Figure 2) show that positions vary in their TME phenotypes (such as Void, TAMM/Oligo, and TAMM/Vascularized) as well as in the number of TAMMs, SR101 cells or blood vessels per position. These differences are biologically meaningful and relevant to the quantification that we performed: the percentage of tumor cells in close proximity to distinct TME features.

      To account for inter-mouse and TME region variability, we applied a linear mixed-effects model with both mouse and TME class included as random effects.
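      One way to write such a model is given below, purely as an illustration. The response does not state the exact specification, so the choice of behavioural cluster as the only fixed effect, the use of random intercepts, and all symbols are assumptions. Here $y_{ijkc}$ is the percentage of cluster-$c$ tumor cells close to the TME feature at imaging position $k$ of mouse $i$ in TME class $j$, $\beta_c$ is the fixed effect of cluster $c$, and $u_i$ and $v_j$ are random intercepts for mouse and TME class.

```latex
y_{ijkc} = \beta_0 + \beta_c + u_i + v_j + \varepsilon_{ijkc},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad v_j \sim \mathcal{N}(0, \sigma_v^2),
\quad \varepsilon_{ijkc} \sim \mathcal{N}(0, \sigma^2)
```

      Under this formulation, positions from the same mouse share $u_i$ and positions from the same TME class share $v_j$, which is how the model absorbs the non-independence raised by the reviewer.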

      Supplementary Figure 3d: Following the reviewer’s suggestion, we have averaged the distance to the 3 closest GBM neighbours per mouse, treating each mouse as an independent unit for comparison across distinct GBM morphodynamic clusters. To account for inter-mouse variability when assessing statistical significance, we employed a linear mixed model with mouse included as a random effect. 

      Distance to 3 neighbours is a feature not used in the clustering, thus variability between mice can be more pronounced—for example, due to differences in tumor compactness or microenvironment structure across individual mice. To appropriately account for this, mouse was included as a random effect in the model.

      Supplementary Figure 4c: Following the reviewer’s suggestion, we averaged cell speed per mouse, treating each mouse as an independent unit for comparison across distinct DMG behavioral clusters. Statistical significance was assessed using ANOVA followed by Tukey’s post hoc test. When comparing cell speed, which is a feature used in the clustering process, inter-mouse variability was already addressed during clustering itself. Therefore, in the downstream analysis of this cluster-derived feature, it is appropriate to treat each mouse as an independent unit without including mouse as a random effect.

      Supplementary Figure 5e-g: Following the reviewer’s suggestion, we averaged cell speed per mouse, treating each mouse as an independent unit for comparison across distinct DMG behavioral clusters. Statistical significance was assessed using ANOVA followed by Tukey’s post hoc test.
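As a rough sketch of the per-mouse averaging step described above (in the spirit of the SuperPlots recommendation), single-cell values can be collapsed to one mean per mouse and cluster before any statistical test. The tuple layout, the function name `mean_per_mouse`, and the numbers are assumptions for illustration only; the subsequent ANOVA and Tukey test are not shown.

```python
from collections import defaultdict
from statistics import mean


def mean_per_mouse(records):
    """Collapse single-cell speeds to one mean per (mouse, cluster)."""
    grouped = defaultdict(list)
    for mouse, cluster, speed in records:
        grouped[(mouse, cluster)].append(speed)
    return {key: mean(values) for key, values in grouped.items()}


# Invented toy data: (mouse id, behavioral cluster, cell speed).
records = [
    ("m1", "A", 2.0),
    ("m1", "A", 4.0),
    ("m2", "A", 6.0),
    ("m1", "B", 1.0),
]
per_mouse = mean_per_mouse(records)
```

The resulting per-mouse means, rather than the individual cells, would then serve as the independent observations in the comparison across clusters.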

      Supplementary Figure 6c: Following the reviewer’s suggestion, we averaged cell distance to the 10 closest DMG neighbours per mouse, treating each mouse as an independent unit for comparison across distinct DMG behavioral clusters. To account for inter-mouse variability, we used a linear mixed model with mouse included as a random effect.

      Reviewer #3 (Public review): 

      The manuscript by Rios-Jimenez developed a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users. 

      Strengths: 

      An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment. 

      Weaknesses: 

      Motility is only one tumor cell feature and is probably not sufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important nontumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on only motility feature analysis, and cannot be applied to analyze other tumor cell dynamic features or cell-cell interaction dynamics. 

      Regarding the concern about the tool’s current focus on motility features, we would like to clarify that BEHAV3D-TP is designed to be highly flexible and extensible. As described in our first revision, users can incorporate a wide range of features, including dynamic, morphological, and spatial parameters, into their analyses. In the current revision, we have made this even more explicit by explaining that the feature selection interface allows users to either (i) directly select features for clustering or (ii) select features for correlation with clusters (see the Small-scale phenotyping module section in Methods and the Rebuttal Figure).

      Importantly, while our current analysis emphasizes clustering based on dynamic behaviors, Figure 4 demonstrates that these behavioral clusters are associated at the single-cell level with distinct proximities to key TME components, such as TAMMs and blood vessels. These spatial interaction features could also have been included in the clustering itself—creating dynamic-spatial clusters—but we deliberately chose not to do so. This decision was guided by established principles of feature selection: including features with unknown or potentially irrelevant variability can introduce noise and obscure biologically meaningful patterns, ultimately reducing the clarity and interpretability of the resulting clusters. Instead, we adopted a two-step approach—first identifying clusters based on core dynamic features, then examining their relationships with spatial and interaction metrics. This allowed us to reveal meaningful associations of particular cell behavior such as the invading cluster in proximity of TAMMs without overfitting or complicating the clustering model.

      To further address the reviewer’s point, we have updated the Small-scale phenotyping module to highlight the possibility of including spatial interaction features with various TME cell types. We also revised the manuscript text and Figure 1 to clarify that these environmental features can be used both upstream as clustering input (Option 1) and for downstream analysis (Option 2), depending on the user’s experimental goals. Author response image 1 illustrates these options in the feature selection panels of the Colab notebook.

      Author response image 1.

      (a) In the small-scale phenotyping module, microenvironmental factors (MEFs) detected in the segmented IVM movies are identified and their coordinates imported. From here, there are two options: (b) include the relationship to these MEFs as a feature for clustering, or (c) exclude this relationship and instead correlate MEFs with cell behavior to assess potential spatial associations.

      In summary, while the clustering presented in this study is based on dynamic parameters, BEHAV3D-TP fully supports the integration of interaction features and other non-motility descriptors. This modularity enables users to customize their analysis pipelines according to specific biological questions, including those involving cell–cell interactions and spatial dynamics within the TME.

      Reviewer #2 (Recommendations for the authors): 

      If the software were adjusted to produce analyses following best practices in the field as outlined in Lord, Samuel J., et al. "SuperPlots: Communicating reproducibility and variability in cell biology." The Journal of cell biology 219.6 (2020): e202001064. this could be a helpful piece of software. The major current issue would be that it democratises the ability to analyse complex imaging data, allowing non-experts to carry out these analyses but misleads them and encourages poor statistical practice. 

      We appreciate the reviewer’s suggestion and the reference to best practices outlined in Lord et al., 2020. As discussed in detail in our point-by-point response to Reviewer #2, we have revised several figures to enhance clarity and statistical rigor, including Figure 4c,e; Supplementary Figures 3d, 4c, 5e–g, and 6c–d. Specifically, we adjusted how data are summarized and displayed, averaging per mouse where appropriate and clarifying the statistical methods used. Where imaging positions were retained as the unit of analysis, this decision was grounded in the biological relevance of intra-mouse spatial heterogeneity (as demonstrated in Figure 2). Additionally, we applied linear mixed-effects models in cases where variability between mice or between large-scale TME regions needed to be accounted for. We believe these changes address the core concern about reproducibility and statistical interpretation while preserving the biological insights captured by our approach.

    1. eLife Assessment

      This study uses steered molecular dynamics simulations to interrogate force transmission in the mechanosensitive NOMPC channel, which plays roles including soft-touch perception, auditory function, and locomotion. The valuable finding that the ankyrin spring transmits force through torsional rather than compression forces may help understand the entire TRP channel family. The evidence is considered to be solid, although full opening of the channel is not seen, and it has been noted that experimental validation of reduced mechanosensitivity through mutagenesis of proposed ankyrin/TRP domain coupling interactions would help substantiate the findings.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript uses molecular dynamics simulations to understand how forces felt by the intracellular domain are coupled to opening of the mechanosensitive ion channel NOMPC. The concept is interesting - as the only clearly defined example of an ion channel that opens due to forces on a tethered domain, the mechanism by which this occurs is yet to be fully elucidated. The main finding is that twisting of the transmembrane portion of the protein - specifically via the TRP domain that is conserved within the broad family of channels - is required to open the pore. That this could be a common mechanism utilised by a wide range of channels in the family, not just mechanically gated ones, makes the result significant. It is intriguing to consider how different activating stimuli can produce a similar activating motion within this family. While the authors do not see full opening of the channel, only an initial dilation, this motion is consistent with partial opening of structurally characterized members of this family.

      Strengths:

      Demonstrating that rotation of the TRP domain is the essential requirement for channel opening would have significant implications for other members of this channel family.

      Weaknesses:

      The manuscript centres around 3 main computational experiments. In the first, a compression force is applied on a truncated intracellular domain and it is shown that this creates both a membrane normal (compression) and membrane parallel (twisting) force on the TRP domain. This is a point that was demonstrated in the authors' prior eLife paper - so the point here is to quantify these forces for the second experiment.

      The second experiment is the most important in the manuscript. In this, forces are applied directly to two residues on the TRP domain with either a membrane normal (compression) or membrane parallel (twisting) direction, with the magnitude and directions chosen to match that found in the first experiment. Only the twisting force is seen to widen the pore in the triplicate simulations, suggesting that twisting, but not compression can open the pore. This result is intriguing and there appears to be a significant difference between the dilation of pore with the two force directions. When the forces are made of similar magnitude, twisting still has a larger effect than forces along the membrane normal.

      The second important consideration is that the study never sees full pore opening, rather a widening that is less than that seen in open state structures of other TRP channels and insufficient for rapid ion currents. This is something the authors acknowledge in their prior manuscript. Twist may be the key to get this dilation, but we don't know if it is the key to full pore opening. Structural comparison to open state TRP channels supports that this represents partial opening along the expected pathway of channel gating.

      Experiment three considers the intracellular domain and determines the link between compression and twisting of the intracellular AR domain. In this case, the end of the domain is twisted and it is shown that the domain compresses, the converse to the similar study previously done by the authors in which compression of the domain was shown to generate torque.

    3. Reviewer #2 (Public review):

      This study uses all-atom MD simulation to explore the mechanics of channel opening for the NOMPC mechanosensitive channel. Previously the authors used MD to show that external forces directed along the long axis of the protein (normal to the membrane) result in AR domain compression and channel opening. This force causes two changes to the key TRP domains adjacent to the channel gate: 1) a compressive force pushes the TRP domain along the membrane normal, while 2) a twisting torque induces a clock-wise rotation on the TRP domain helix when viewing the bottom of the channel from the cytoplasm. Here, the authors wanted to understand which of those two changes is responsible for increasing the inner pore radius, and they show that it is the torque. The simulations in Figure 2 probe this question with different forces, and we can see the pore open with parallel forces in the membrane, but not with the membrane-normal forces. I believe this result as it is reproducible, the timescales are reaching 1 microsecond, and the gate is clearly increasing diameter to about 4 Å. This seems to be the most important finding in the paper, but the impact is limited since the authors already show how forces lead to channel opening, and this is further teasing apart the forces and motions that are actually the ones that cause the opening.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Duan and Song interrogates the gating mechanisms and specifically force transmission in mechanosensitive NOMPC channels using steered molecular dynamics simulations. They propose that the ankyrin spring can transmit force to the gate through torsional forces adding molecular detail to the force transduction pathways in this channel.

      Strengths:

      Detailed, rigorous simulations coupled with a novel model for force transduction.

      Weaknesses:

      Experimental validation of reduced mechanosensitivity through mutagenesis of proposed ankyrin/TRP domain coupling interactions would greatly enhance the manuscript.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses molecular dynamics simulations to understand how forces felt by the intracellular domain are coupled to the opening of the mechanosensitive ion channel NOMPC. The concept is interesting - as the only clearly defined example of an ion channel that opens due to forces on a tethered domain, the mechanism by which this occurs is yet to be fully elucidated. The main finding is that twisting of the transmembrane portion of the protein - specifically via the TRP domain that is conserved within the broad family of channels- is required to open the pore. That this could be a common mechanism utilised by a wide range of channels in the family, not just mechanically gated ones, makes the result significant. It is intriguing to consider how different activating stimuli can produce a similar activating motion within this family. However, the support for the finding can be strengthened as the authors cannot yet exclude that other forces could open the channel if given longer or at different magnitudes. In addition, they do not see the full opening of the channel, only an initial dilation. Even if we accept that twist is essential for this, it may be that it is not sufficient for full opening, and other stimuli are required.

      Strengths:

      Demonstrating that rotation of the TRP domain is the essential requirement for channel opening would have significant implications for other members of this channel family.

      Thank you for your positive summary and comments.

      Weaknesses:

      The manuscript centres around 3 main computational experiments. In the first, a compression force is applied on a truncated intracellular domain and it is shown that this creates both a membrane normal (compression) and membrane parallel (twisting) force on the TRP domain. This is a point that was demonstrated in the authors’ prior eLife paper - so the point here is to quantify these forces for the second experiment.

      The second experiment is the most important in the manuscript. In this, forces are applied directly to two residues on the TRP domain with either a membrane normal (compression) or membrane parallel (twisting) direction, with the magnitude and directions chosen to match that found in the first experiment. Only the twisting force is seen to widen the pore in the triplicate simulations, suggesting that twisting, but not compression can open the pore. This result is intriguing and there appears to be a significant difference between the dilation of pore with the two force directions.

      However, there are two caveats to this conclusion. Firstly, is the magnitude of the forces - the twist force is larger than the applied normal force to match the result of experiment 1. However, it is possible that compression could also open the pore at the same magnitude or if given longer. It may be that twist acts faster or more easily, but I feel it is not yet possible to say it is the key and exclude the possibility that compression could do something similar.

      Thank you for your insightful comment. As you pointed out, the membrane-normal pushing forces exerted at residues E1571 and R1581 are approximately one-third and two-thirds, respectively, of the membrane-parallel twisting forces. These magnitudes were derived from a previous simulation (Wang et al., 2021), in which we decomposed the resultant force into its membrane-parallel and membrane-normal components upon applying a compressive force to the intracellular AR end. Our results indicated that, upon reaching the TRP helix, the induced twisting force is indeed greater, which partially reflects actual physiological conditions. Therefore, considering the magnitudes of the resultant forces alone, the twisting force is predominantly greater than the pushing force when the AR domain is subjected to compression.
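The decomposition mentioned above can be illustrated with a generic vector projection. This is only a sketch of the geometry, not the analysis code used in Wang et al. (2021): the function name `decompose_force`, the choice of the z axis as the membrane normal, and the force values are assumptions, and the in-plane component computed here is not further split into tangential (twisting) and radial parts.

```python
import math


def decompose_force(force, normal):
    """Split a 3D force into components along and perpendicular to `normal`."""
    length = math.sqrt(sum(c * c for c in normal))
    unit = [c / length for c in normal]
    along = sum(f * u for f, u in zip(force, unit))       # signed projection
    f_normal = [along * u for u in unit]                   # membrane-normal part
    f_parallel = [f - n for f, n in zip(force, f_normal)]  # membrane-parallel part
    return f_normal, f_parallel


def magnitude(vec):
    return math.sqrt(sum(c * c for c in vec))


# Invented force (arbitrary units); membrane normal assumed to be the z axis.
f_normal, f_parallel = decompose_force([3.0, 4.0, 2.0], [0.0, 0.0, 1.0])
ratio = magnitude(f_normal) / magnitude(f_parallel)
```

A ratio below 1, as in this toy case, corresponds to the situation described in the response, where the membrane-normal component is smaller than the membrane-parallel one.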

      Then the question became, if forces of the same magnitude are applied in either the membrane-normal or membrane-parallel directions, what would the outcome be? To address this, we conducted additional simulations. Considering the situations discussed above, we applied a smaller membrane-parallel force instead of a larger membrane-normal force that may disrupt the integrity of protein and membrane structure. As shown in the new Figure S6, we adjusted the applied membrane-parallel force to either half or one-third of the original value. When we applied half of the force used in the original setup, the channel opened in two out of three trajectories. When applying one-third of the force, the channel opened in one out of three trajectories. Together with our previous results, these findings suggest that if forces of equal magnitude are applied in the membrane-normal and membrane-parallel directions, the membrane-parallel force has a higher probability of inducing channel opening.

      Still, one cannot completely exclude the possibility that the pushing force on the TRP helix can open the channel if given a very long time. This becomes unfeasible to examine with MD simulations, so we investigated the likely conformational changes of multiple TRP family proteins upon opening, and found that the TRP rotation is a universal conformational change, while the TRP tilt is much less consistent (Figure 6). These findings give us more confidence that the twist force plays a more crucial role in channel gating than the pushing force. We have added a new table (Table 1) and a new figure (Figure 6) to present this analysis.

      In addition, we did not intend to imply that compression is incapable of contributing to channel opening. In fact, our aim was to highlight that compression can generate both a twisting force and a pushing force, with the twisting force appearing to be the more critical component for facilitating channel opening. We concur that we cannot completely dismiss the possibility that the pushing component may also assist in channel opening. Consequently, we have revised our discussion on pages 4 and 6 to enhance clarity.

      I also note that when force was applied to the AR domain in experiment 1, the pore widened more quickly than with the twisting force alone, suggesting that compression is doing something to assist with opening.

      You are correct that the trajectory corresponding to Experiment 1 (Figure S1(b)) indicates pore opening around 300-400 ns, while the trajectory for Experiment 2 (800 ns) shows pore opening around 600 ns. This observation may suggest that the pore opens more rapidly in Experiment 1, assuming that the simulation conditions were identical for both experiments. However, it is important to note that in Experiment 1, an external force was applied to AR29. In contrast, in Experiment 2, the force was applied exclusively to two selected residues on the TRP domain, while other TRP residues also experienced mechanical forces, albeit to a lesser extent. The differing methods of force application in the two experiments complicate the comparison of pore opening speeds under these conditions.

      We acknowledge that the compression of the AR spring can facilitate pore opening. This compression generates both a twisting component and a pushing component on the TRP domain. Our simulations and structural analyses of multiple TRP channels suggest that the twisting component plays a predominant role in gating. However, we cannot entirely rule out the possibility that the pushing component may also contribute to this process. We have carefully revised our Results (page 6), Discussion (pages 10–12) and Methods (pages 14–17) sections to enhance clarity.

      Given that the forces are likely to be smaller in physiological conditions it could still be critical to have both twist and compression present. As this is the central aspect of the study, I believe that examining how the channel responds to different force magnitudes could strengthen the conclusions and recommend additional simulations be done to examine this.

      Thank you for your valuable comments. We agree that the force applied in Experiment 2 may be larger than the forces experienced under physiological conditions. Therefore, we performed additional simulations to investigate the possibility of opening the pore using smaller torsional forces.

      As shown in the new Figure S6, we applied half and one-third of the original force and performed three replicate simulations for each condition. With half the force, the pore opened in two out of the three simulations. With one-third of the force, the pore opened in one out of the three replicate simulations. The probability of pore opening within the same simulation time decreased as the applied force was reduced, consistent with our expectations. These new results are provided as a supplementary figure (Figure S6) in the revised manuscript.

      We anticipate that further reductions in the forces will result in additional delays in the opening process; however, this would lead to prohibitive computational costs. Consequently, we have decided to conclude our analysis at this stage and have discussed this matter on page 6 of the revised manuscript.

      The second important consideration is that the study never sees a full pore opening, but rather a widening that is less than that seen in open state structures of other TRP channels and insufficient for rapid ion currents. This is something the authors acknowledge in their prior manuscript in eLife 2021. Although this may simply be due to the limited timescale of the simulations, it needs to be clearly stated as a caveat to the conclusions. Twist may be the key to getting this dilation, but we do not know if it is the key to full pore opening. To demonstrate that the observed dilation is a first step in the opening of pores, a structural comparison to open-state TRP channels would be beneficial in providing evidence that this motion is along the expected pathway of channel gating.

      We are grateful for this insightful comment. We acknowledge that our simulations do not capture a fully open state, but rather a dilation that is smaller than the open-state structures of other TRP channels. In our simulations, a pore radius exceeding 2 Å is considered as a partially open state, as this is generally sufficient for the permeation of water molecules or even small cations such as K<sup>+</sup> and Na<sup>+</sup>. However, the passage of larger molecules and ions, such as Ca<sup>2+</sup> and clusters of hydrated ions, remains challenging. As you noted, this partial opening may be attributed to the limited timescale of the simulations.
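The 2 Å criterion above can be expressed as a simple threshold check on a pore-radius time series. The sketch below is an illustration under stated assumptions: the function name `first_open_time`, the sampling interval, and the radius values are invented, and a real analysis would likely require the radius to stay above the threshold for some sustained window rather than at a single frame.

```python
def first_open_time(times_ns, radii_angstrom, threshold=2.0):
    """Return the first time at which the minimum pore radius exceeds
    `threshold` (in Angstrom), or None if it never does."""
    for t, r in zip(times_ns, radii_angstrom):
        if r > threshold:
            return t
    return None


# Invented toy traces: time in ns, minimum pore radius in Angstrom.
times = [0, 200, 400, 600, 800]
opening_trace = [1.0, 1.2, 1.8, 2.3, 2.5]
closed_trace = [1.0, 1.1, 1.0, 1.2, 1.1]

t_open = first_open_time(times, opening_trace)
t_closed = first_open_time(times, closed_trace)
```

Counting the replicates for which the function returns a time rather than `None` gives the kind of "opened in two out of three trajectories" summary reported for the reduced-force simulations.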

      Furthermore, in accordance with your suggestion, we analyzed numerous TRP proteins for which multiple open or intermediate states have been resolved, and we have included a new figure (Figure 6). A clockwise rotation of the TRP domain is observed in the majority of these proteins upon gating. For instance, in the case of RnTRPV1, our analysis revealed that during TRPV1 activation, when different ligands are bound (RTX, DkTX), the pore undergoes gradual dilation, which involves a progressive clockwise rotation of the TRP domain. This analysis provides evidence that the observed motion aligns with expected gating transitions, supporting the notion that twist-induced TRP rotation and pore dilation may represent an initial step in the pore opening process.

      Nonetheless, we concur that further studies, including extended simulations, which are currently unfeasible, or experimental validation, will be necessary to ascertain whether our proposed mechanism is adequate for the complete opening of the pore. We have carefully discussed this on pages 10–12.

      Experiment three considers the intracellular domain and determines the link between compression and twisting of the intracellular AR domain. In this case, the end of the domain is twisted and it is shown that the domain compresses, the converse to the similar study previously done by the authors in which compression of the domain was shown to generate torque. While some additional analysis is provided on the inter-residue links that help generate this, this is less significant than the critical second experiment.

      Although experiment three is less significant in revealing the underlying gating mechanism, it provides quantitative measurements of the mechanical properties of the intriguing AR spring structure, which are currently challenging to obtain experimentally. These provide computational predictions for future experiments to validate.

      Reviewer #2 (Public review):

      This study uses all-atom MD simulation to explore the mechanics of channel opening for the NOMPC mechanosensitive channel. Previously the authors used MD to show that external forces directed along the long axis of the protein (normal to the membrane) result in AR domain compression and channel opening. This force causes two changes to the key TRP domains adjacent to the channel gate: 1) a compressive force pushes the TRP domain along the membrane normal, while 2) a twisting torque induces a clock-wise rotation on the TRP domain helix when viewing the bottom of the channel from the cytoplasm. Here, the authors wanted to understand which of those two changes is responsible for increasing the inner pore radius, and they show that it is the torque. The simulations in Figure 2 probe this question with different forces, and we can see the pore open with parallel forces in the membrane, but not with the membrane-normal forces. I believe this result as it is reproducible, the timescales are reaching 1 microsecond, and the gate is clearly increasing diameter to about 4 Å. This seems to be the most important finding in the paper, but the impact is limited since the authors already show how forces lead to channel opening, and this is further teasing apart the forces and motions that are actually the ones that cause the opening.

      Thank you for your insightful comments. We appreciate your recognition of our key finding that torque is responsible for increasing the inner pore radius. Indeed, our simulations illustrated in Figure 2 systematically explore the effects of different forces on pore opening. These results demonstrate that membrane-parallel forces are effective, while membrane-normal forces are not within the simulation time. We acknowledge that this study builds upon previous findings regarding force-induced channel opening. However, we believe that further decomposition of the specific forces and motions responsible for this process provides valuable mechanistic insights. By distinguishing the role of torque from the membrane-normal forces of the TRP helix, which is highly conserved across the TRP channel family, our work contributes to a more precise understanding of TRP channel gating. Moreover, in the revised manuscript, we conducted a systematic analysis of the structures of TRP family proteins and discovered that the clockwise rotation of the TRP domain is likely a universal gating mechanism among the TRP family, which significantly enhances and strengthens our original findings (Figure 6).

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Duan and Song interrogates the gating mechanisms and specifically force transmission in mechanosensitive NOMPC channels using steered molecular dynamics simulations. They propose that the ankyrin spring can transmit force to the gate through torsional forces adding molecular detail to the force transduction pathways in this channel.

      Strengths:

      Detailed, rigorous simulations coupled with a novel model for force transduction.

      Thank you for your positive comments.

      Weaknesses:

      Experimental validation of reduced mechanosensitivity through mutagenesis of proposed ankyrin/TRP domain coupling interactions would greatly enhance the manuscript. I have some additional questions documented below:

      We attempted to measure the mechanical properties of the AR domain and conduct mutagenesis experiments in collaboration with Prof. Jie Yan’s laboratory at the Mechanobiology Institute, National University of Singapore; however, this proved to be a significant challenge at this time. Given the urgency of the publication, we have decided to first publish the computational results and reserve further experimental studies for future investigations.

      (1) The membrane-parallel torsion force can open NOMPC

      How does the TRP domain interact with the S4-S5 linker? In the original structural studies, the coordination of lipids in this region seems important for gating. In this manner, do the TRP domain and S4-S5 linker combined act like an amphipathic helix as suggested first for MscL (Bavi et al., 2016 Nature Communications) and later identified in many MS channels (Kefauver et al., 2020 Nature)?

      In our analysis of the compression trajectories (trajectory: CI-1, Figure S4), we identified stable interactions between the TRP domain and the S4-S5 linker. These interactions primarily involve the residues S1421 and F1422 of the S4-S5 linker, as indicated by the large pink data points in Figure S4. Therefore, we agree that the TRP helix and the S4–S5 linker can be considered an amphipathic helical unit, analogous to the amphipathic helix observed in MscL and other mechanosensitive channels. Moreover, the pocket adjacent to the S4-S5 linker has been recognized as a binding site for small molecules in other ligand-activated TRP channels, such as the vanilloid-binding TRPV1. We hypothesize that this unit is likely to play a critical role in the polymodal gating of the TRP channel family, including ligand-induced activation. In the revised manuscript, we have included an analysis of the interaction between the TRP domain and the transmembrane (TM) domain on page 4 (Figure S4), and we have briefly discussed its implications on pages 10 and 12.

      (2) Torsional forces on shorter ankyrin repeats of mammalian TRP channels

      Is it possible torsional forces applied to the shorter ankyrin repeats of mammalian TRPs may also convey force in a similar manner?

      This is an intriguing question.

      To answer your question, we studied the full-length squirrel TRPV1 (PDB: 7LQY, Nadezhdin et al. (2021)) using all-atom steered MD simulations. We applied pushing or torsional forces to the intracellular AR1-2 region of TRPV1, separately (Figure S10(a)). Similar to NOMPC, rotation of the TRP domain was observed under both types of mechanical stimulation (Figure S10(b-e)). The conformational change induced by the torsional force on the TRP domain resembles the change observed in NOMPC. This suggests that a torsional force applied to the shorter ankyrin repeats of mammalian TRPs may yield similar effects on channel gating. However, given that these ankyrin repeats do not act like tether elements, the implications of these results in the context of biological functions remain unclear. Additionally, in NOMPC, the AR domain is connected to the TRP domain through a linker helix (LH) domain, composed of multiple stacked helices that form a relatively compact structure (Figure 1(a)). In contrast, TRPV1 does not possess a similarly compact LH domain connecting the AR domain to the TRP domain (Figure S10(a)). These structural differences render our conclusions regarding NOMPC not directly applicable to TRPV1. We have included an additional discussion about this on page 12 (Figure S10).

      (3) Constant velocity or constant force

      For the SMD the authors write "and a constant velocity or constant force". It’s unclear from this reviewer’s perspective which is used to generate the simulation data.

      Thank you for pointing out this ambiguity. In our simulations, we first applied constant-velocity pulling to reach specific force magnitudes, followed by constant-force pulling. This protocol allowed us to initiate the motion of the protein in a controlled manner and observe the response of the system under sustained forces. We have now clarified this in the revised Methods section.
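
      The two-stage protocol described above can be illustrated with a toy model. The sketch below is not the authors' simulation setup: it assumes a one-dimensional overdamped particle held in a harmonic well and pulled through a spring, and every name and value (`k_spring`, `k_well`, `friction`, `v_pull`, arbitrary units) is a hypothetical choice made only for illustration. Stage 1 moves the spring anchor at constant velocity until the spring force reaches the target; stage 2 then holds that force constant.

```python
def two_stage_pull(target_force, k_spring=100.0, k_well=100.0, friction=50.0,
                   v_pull=0.01, dt=0.01, hold_steps=200, max_steps=1_000_000):
    """Toy 1D pulling: constant-velocity ramp, then constant-force hold.

    Returns the applied force at every step and the index at which the
    protocol switches from stage 1 to stage 2. All parameters are
    illustrative assumptions in arbitrary units.
    """
    x = 0.0        # particle position
    anchor = 0.0   # position of the moving spring anchor
    forces = []

    # Stage 1: move the anchor at constant velocity until the spring
    # force reaches the requested magnitude.
    for _ in range(max_steps):
        anchor += v_pull * dt
        force = k_spring * (anchor - x)
        if force >= target_force:
            break
        # Overdamped dynamics: velocity = net force / friction.
        x += (force - k_well * x) / friction * dt
        forces.append(force)
    else:
        raise RuntimeError("target force not reached during the ramp")

    switch_index = len(forces)

    # Stage 2: apply the target force directly and keep it constant.
    for _ in range(hold_steps):
        x += (target_force - k_well * x) / friction * dt
        forces.append(target_force)

    return forces, switch_index


forces, switch_index = two_stage_pull(target_force=1.0)
```

      In this toy setting, the ramp only sets the force level and the constant-force stage is where the sustained response would be observed, which mirrors the order of the two stages in the response.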

      Reviewer #1 (Recommendations for the authors):

      The language in the paper requires some editing - particularly in the introduction. For example, what is meant by ion channels ’coalescing to form mechanical receptors’? Are the authors implying it requires multiple channels to form a receptor? It is stated that mechanically gated ion channels are only found in nerve endings when in fact they are found in almost every cell type. Another example is the statement ’In the meantime’ the TRP domain was observed to rotate when this observation came prior to the others mentioned before. While these sound like minor edits, they significantly change the meaning of the introduction. I recommend careful editing of the manuscript to avoid accidental inaccuracies like this.

      Thank you for your feedback on the clarity and accuracy of the introduction. We have carefully revised the manuscript, particularly the abstract and introduction sections, to address these concerns:

      (1) We have reworded the original sentence ’These mechanosensitive ion channels, coalescing to form mechanical receptors, are strategically positioned within the sensory neuron terminals intricately nestled within the epidermal layer.’ into ’In both vertebrates and invertebrates, mechanosensitive ion channels are widely expressed in peripheral sensory neurons located near or within the surface tissues responsible for detecting mechanical stimuli.’

      (2) We have replaced the phrase "In the meantime" with "Interestingly" to introduce the conformational change of the TRP domain that we believe is crucial.

      (3) We have carefully reviewed the entire manuscript and used a language editing tool, Writefull integrated within Overleaf, to proof-check the language problems.

      Reviewer #2 (Recommendations for the authors):

      How do the energy values in Figure 3b, compare with the continuum energy values reported by Argudo et al. JGP (2019)? I wonder what value the authors would get with a new replicate run slower - say 200 ns total aggregate simulation? This would probe the convergence of this energy value. It seems important to determine whether the loading velocity of the experiments performed here with the steered MD is slow enough to allow the protein to relax and adopt lower energy configurations during the transition. The true loading is likely to occur on the millisecond timescale, not the nanosecond to low microsecond timescale. That said, I don’t mean to detract from the result in Figure 2, as this is likely quite solid in my opinion given the nearly 1 microsecond simulations and the replicates showing the same results.

      Thank you for your valuable suggestions. It is important to note that we calculated different physical quantities from those reported in Argudo’s study. In Figure 3b, we calculated the torque (rather than the energy, although the two share the same dimensional units) of the long AR bundle (AR9-29 of the four filaments combined) and subsequently determined its torsion coefficient. Argudo’s study calculated the torsional spring constant (𝑘<sub>θ</sub>) of three 6-AR-unit stretches of one filament, which were designated as ANK1 (AR 12-17), ANK2 (AR 17-22) and ANK3 (AR 22-27). As the four filaments are coupled within the bundled structure and the torsional axes differ between an individual filament and the four-filament bundle, a direct comparison of the torsional spring constants reported in the two studies is not meaningful.

      We agree that extending the simulation time may provide deeper insights into the convergence of energy values. In accordance with your suggestion, we conducted additional simulations to further investigate convergence and compared the results with our existing data to check robustness and consistency. Specifically, we slowed the twisting from 10 degrees over 100 ns to 10 degrees over 200 ns, and extended the holding time for selected frames (sampled every 2.5 degrees) from 100 ns to 200 ns. We have updated Figure 3 and the relevant main text accordingly (page 7). The results of the new simulations are similar to those of the previous ones, with the fitted torsion coefficient revised from (2.31 ± 0.44) × 10<sup>3</sup> kJ mol<sup>−1</sup> rad<sup>−1</sup> to (2.30 ± 0.31) × 10<sup>3</sup> kJ mol<sup>−1</sup> rad<sup>−1</sup>. This close agreement indicates that our simulations are well converged. Additionally, we updated the compression–twist coupling coefficient from (1.67 ± 0.14) nm rad<sup>−1</sup> to (1.32 ± 0.11) nm rad<sup>−1</sup>.
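
      As an illustration of how a torsion coefficient can be obtained from torque sampled at a few twist angles, the sketch below fits a straight line through the origin, torque = κ · angle. This is only an assumed fitting form; the response does not state the exact procedure used. The torque values are synthetic, generated from the reported coefficient of 2.30 × 10<sup>3</sup> kJ mol<sup>−1</sup> rad<sup>−1</sup> purely to show that the fit recovers the slope, and the function name is hypothetical.

```python
import math


def fit_torsion_coefficient(angles_deg, torques_kj_per_mol):
    """Least-squares slope of torque versus angle for a line through the origin.

    Angles are given in degrees and converted to radians, so the result is
    in kJ mol^-1 rad^-1 when torques are in kJ mol^-1.
    """
    angles_rad = [math.radians(a) for a in angles_deg]
    numerator = sum(a * t for a, t in zip(angles_rad, torques_kj_per_mol))
    denominator = sum(a * a for a in angles_rad)
    return numerator / denominator


# Frames sampled every 2.5 degrees over a 10 degree twist, as in the response.
angles_deg = [0.0, 2.5, 5.0, 7.5, 10.0]

# Synthetic, noise-free torques (assumption: ideal linear torsional spring).
kappa_reference = 2.30e3
torques = [kappa_reference * math.radians(a) for a in angles_deg]

kappa_fit = fit_torsion_coefficient(angles_deg, torques)
```

      With real simulation output, the per-frame torques would carry statistical noise, and the quoted uncertainties (± 0.31 × 10<sup>3</sup>) would come from the fit rather than from this idealized example.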

      As you suggested, we conducted an additional analysis to determine whether the loading velocity/force in the steered MD is sufficiently slow to allow the protein to relax and adopt lower-energy configurations during the transition. For simulations involving the application of membrane-normal or membrane-parallel force on the TRP domain, we used DSSP (Define Secondary Structure of Proteins) analysis to assess the stability of the secondary structure of the TRP domain. The results indicated that, during the application of external forces, the secondary structure of the TRP domain remained stable, as illustrated in Figure S11. For simulations involving the rotation of the AR domain, we also analyzed the DSSP of the AR9 to AR11 units, which are positioned directly above the AR8 domain where the twisting force is applied. The secondary structure of the AR domain also remained stable (Figure S12). These results are briefly discussed in the Methods section of the revised manuscript (page 17).

      It is unclear to me that the force transmission analysis in Figure 4 provides much insight into the mechanics of opening. Perhaps the argument was made, but I did not appreciate it. Related to this the authors state that the transfer velocity is 1.8 nm/ps based on their previous study. Is this value profound or is it simply the velocity of sound in the protein?

      The analysis of force transmission presented in Figure 4 offers detailed insights into the transfer of force along the AR domain. While this may appear straightforward, the information elucidates how a pushing force can induce a twisting force during its transmission through the AR spring structure, as well as the primary contributions that stabilize this transmission pathway. To enhance clarity, we have included an additional discussion on page 9.

      The force transfer velocity is expected to align with the velocity of sound within the protein. The value of 1.8 nm/ps, however, is specific to the unique structure of the AR spring, which is quite interesting to report in our opinion. Additionally, this rapid transfer speed suggests that the simulation timescale is sufficient for enabling the transfer of compression force from the bottom of the AR domain to the TRP domain in our simulations, given that the simulation timescale is considerably longer than the force propagation timescale within the protein.
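
      The timescale argument above amounts to a one-line estimate. The sketch below divides an assumed path length by the reported transfer velocity of 1.8 nm/ps. The length of 15 nm is a hypothetical placeholder, not a value taken from the manuscript, and 100 ns is used as the simulation window because it is the shortest duration mentioned in this response.

```python
def propagation_time_ps(length_nm, velocity_nm_per_ps=1.8):
    """Time for a mechanical signal to traverse a path at constant velocity."""
    return length_nm / velocity_nm_per_ps


ar_length_nm = 15.0      # hypothetical path length along the AR domain
sim_time_ps = 100e3      # 100 ns expressed in picoseconds

t_prop = propagation_time_ps(ar_length_nm)   # about 8.3 ps for these inputs
ratio = sim_time_ps / t_prop                 # how many traversals fit in the window
```

      Under these assumptions the traversal time is a few picoseconds, several orders of magnitude shorter than the simulation window, which is the sense in which the simulation timescale is long compared with force propagation.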

      The methods description is largely complete, but is missing some details on the MD simulations (barostat, thermostat, piston constants, etc.).

      Thank you for pointing out the missing details; we have added the additional information in the revised Methods section.

      References

      Nadezhdin, K. D., A. Neuberger, Y. A. Nikolaev, L. A. Murphy, E. O. Gracheva, S. N. Bagriantsev, and A. I. Sobolevsky (2021). Extracellular cap domain is an essential component of the TRPV1 gating mechanism. Nature Communications 12(1), 2154.

      Wang, Y., Y. Guo, G. Li, C. Liu, L. Wang, A. Zhang, Z. Yan, and C. Song (2021). The push-to-open mechanism of the tethered mechanosensitive ion channel NOMPC. eLife 10, e58388.

    1. eLife Assessment

      This important study examined the complexity of emergent dynamics of large-scale neural network models after perturbation (perturbational complexity index, PCI) and used it as a measurement of consciousness to account for previous recordings of humans at various anesthetized levels. The evidence supporting the conclusion is convincing and constitutes a unified framework for different observations related to consciousness. There are many fields that would be interested in this study, including cognitive neuroscience, psychology, complex systems, neural networks, and neural dynamics.

    2. Reviewer #1 (Public review):

      Summary:

      This paper attempts to measure the complex changes of consciousness in the human brain as a whole. Inspired by the perturbational complexity index (PCI) from classic research, authors introduce simulation PCI (𝑠𝑃𝐶𝐼) of a time series of brain activity as a measure of consciousness. They first use large-scale brain network modeling to explore its relationship with the network coupling and input noise. Then the authors verify the measure with empirical data collected in previous research.

      Strengths:

      The conceptual idea of the work is novel. The authors measure the complexity of brain activity from the perspective of dynamical systems. They provide a comparison of the proposed measure with four other indexes. The text of this paper is very concise, supported by experimental data and theoretical model analysis.

      Comments on revisions:

      The manuscript is in good shape after revision. I would suggest that the authors open-source the code and data used in this study.

    3. Reviewer #2 (Public review):

      Summary:

      Breyton and colleagues analysed the emergent dynamics from a neural mass model, characterised the resultant complexity of the dynamics, and then related these signatures of complexity to datasets in which individuals had been anaesthetised with different pharmacological agents. The results provide a coherent explanation for observations associated with different time series metrics, and further help to reinforce the importance of modelling when integrating across scientific studies.

      Strengths:

      * The modelling approach was clear, well-reasoned and explicit, allowing for direct comparison to other work and potential elaboration in future studies through the augmentation with richer neurobiological detail.

      * The results serve to provide a potential mechanistic basis for the observation that Perturbational Complexity Index changes as a function of consciousness state.

      Weaknesses:

      * Coactivation cascades were visually identified, rather than observed through an algorithmic lens. Given that there are numerous tools for quantifying the presence/absence of cascades from neuroimaging data, the authors may benefit from formalising this notion.

      * It was difficult to tell, graphically, where the model's operating regime lay. Visual clarity here will greatly benefit the reader.

      Comments on revisions:

      The authors have addressed my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer 1 (Public review):

      Summary:

      This paper attempts to measure the complex changes of consciousness in the human brain as a whole. Inspired by the perturbational complexity index (PCI) from classic research, authors introduce simulation PCI (_s_PCI) of a time series of brain activity as a measure of consciousness. They first use large-scale brain network modeling to explore its relationship with the network coupling and input noise. Then the authors verify the measure with empirical data collected in previous research.

      Strengths:

      The conceptual idea of the work is novel. The authors measure the complexity of brain activity from the perspective of dynamical systems. They provide a comparison of the proposed measure with four other indexes. The text of this paper is very concise, supported by experimental data and theoretical model analysis.

      We would like to thank the reviewer for the evaluation of our work and the positive feedback. In what follows, we clarify the ambiguities in our initial submission and describe the respective changes to the manuscript.

      (1) Consciousness is a network phenomenon. The measure defined by the authors is to consider the maximal sPCI across the nodes stimulated. This measure is based on the time series of one node. The measure may be less effective in quantifying the ill relationship between nodes. This may contribute to the less predictive power of anesthesia (Figure 4b).

      Thank you for this comment; consciousness is indeed a network phenomenon. sPCI is in fact measured across the whole network: to compute sPCI, we apply PCI to the simulated activity of the whole network. The perturbation is applied to individual nodes of the network (a different node for each trial), and each time the response to the stimulus is measured through sPCI in the whole network. To make this explicit, the relevant section now reads:

      “In line with the PCI experimental protocol, we sampled from multiple initial conditions and stimulated regions, presenting the maximum sPCI for each regime (i.e., each {G,σ}). For each simulation, we measured the complexity of the activity of the whole network over a 10-second period post-stimulus.”
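
      The selection step in the quoted passage, taking the maximum sPCI over initial conditions and stimulated regions for each regime {G,σ}, can be sketched as a simple reduction. The record layout and the numerical values below are hypothetical and serve only to show the bookkeeping; computing sPCI itself is not covered here.

```python
def max_spci_per_regime(records):
    """Return the largest sPCI observed for each (G, sigma) regime.

    `records` is an iterable of tuples
    (G, sigma, stimulated_node, initial_condition_id, spci),
    an assumed layout chosen for this illustration.
    """
    best = {}
    for G, sigma, _node, _init_id, spci in records:
        regime = (G, sigma)
        if regime not in best or spci > best[regime]:
            best[regime] = spci
    return best


# Made-up values: two regimes, several stimulated nodes and initial conditions.
records = [
    (0.5, 0.01, 3, 0, 0.21),
    (0.5, 0.01, 7, 0, 0.34),
    (0.5, 0.01, 7, 1, 0.29),
    (0.6, 0.02, 3, 0, 0.41),
    (0.6, 0.02, 12, 1, 0.38),
]

best = max_spci_per_regime(records)
```

      Each value in `best` still comes from a whole-network response to a single-node perturbation, so the maximum is taken over trials rather than over nodes of the measured activity.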

      (2) One of the focuses of the work is the use of a dynamic model of brain networks. The explanation of the model needs to be in more detail.

      Thank you for your feedback. We expanded the method section.

      (3) The equations should be checked. For example, there should be no max on the left side of the first equation on page 13.

      We thank the reviewer for spotting this typo, and we removed the max on the left side of this equation, and also checked all the other equations for correctness. The equation now reads:

      (4) The quality of the figures should be improved.

      Thank you for your comment. We have made adjustments to several figures, and we hope that they are clearer now.

      (5) Figure 4 should be discussed and analyzed more in the text.

      Thank you for pointing this out. We added the following paragraph discussing the figure (now number 5) in the results section:

      “Classification results using a linear SVM are reported in Fig. 5. We report the crossplots of PCI and each of the resting-state metrics for all subjects and conditions in Fig. 5a. Each point corresponds to the calculation of the given metric over the whole recording normalized by its duration. We find that for fluidity (Fig. 5a, third panel), there is a perfect linear separation between Propofol and Xenon anesthesia on the left side and Wakefulness and Ketamine anesthesia on the right side. This corresponds to the classification accuracy result of 100% for the consciousness class in Fig. 5b, which is the same for PCI. As expected, PCI and fluidity behave poorly at classifying the presence of an anesthetic agent due to the confusion induced by Ketamine. However, the size of the functional repertoire performs an almost perfect classification for this grouping. Only one subject under Ketamine has a high functional repertoire (Fig. 5a, left panel), but all other subjects in the anesthesia condition have a size of functional repertoire roughly under 100. Classification accuracies for complexity and GAP at the group level are less performant but are shown for completeness.”

      (6) The usage of the terms PCI and sPCI should be distinguished.

      We would like to thank the reviewer for pointing out this ambiguity. The PCI metric had to be adapted for the synthetic data. We have now further emphasized this in the methods sections – “Perturbational Complexity”.

      Reviewer 2 (Public review):

      Summary:

      Breyton and colleagues analysed the emergent dynamics from a neural mass model, characterised the resultant complexity of the dynamics, and then related these signatures of complexity to datasets in which individuals had been anaesthetised with different pharmacological agents. The results provide a coherent explanation for observations associated with different time series metrics, and further help to reinforce the importance of modelling when integrating across scientific studies.

      Strengths:

      (1) The modelling approach was clear, well-reasoned, and explicit, allowing for direct comparison to other work and potential elaboration in future studies through the augmentation with richer neurobiological detail.

      (2) The results serve to provide a potential mechanistic basis for the observation that the Perturbational Complexity Index changes as a function of the consciousness state.

      We would like to thank the reviewer for assessing our work, and the valuable feedback.

      Weaknesses:

      (3) Coactivation cascades were visually identified, rather than observed through an algorithmic lens. Given that there are numerous tools for quantifying the presence/absence of cascades from neuroimaging data, the authors may benefit from formalising this notion.

      Thank you for bringing this to our attention. We added a quantification of the cascades in Figs. 2 and 3. We computed the absolute value of the mean signal across sources (following z-scoring) to obtain a cascade profile and calculated the area under the curve as a quantification of the overall presence of cascades. As can be seen in the two figures, the presence of cascades is highest around the working point. We have also added the precise definition to the methods section, which now reads:

      “Coactivation Cascades

      The profile of cascades over time was computed, first by z-scoring each source activity, and second by averaging the absolute value of the activity across all sources. The quantification of cascades was then obtained by calculating numerically the Area Under the Curve (AUC) of the profile of cascades.”
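
      The quoted definition can be written out as a short sketch. It follows the methods wording (z-score each source, average the absolute values across sources, integrate the resulting profile) and uses the trapezoidal rule for the area under the curve, which is an assumption because the response only says the AUC was calculated numerically. Function names and the example signals are hypothetical.

```python
import statistics


def zscore(series):
    """Z-score one source time series (population standard deviation).

    Assumes the series is not constant, otherwise the deviation is zero.
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [(value - mean) / std for value in series]


def cascade_profile(sources):
    """Average, at each time point, the absolute z-scored activity across sources."""
    z = [zscore(source) for source in sources]
    n_sources = len(z)
    n_times = len(z[0])
    return [sum(abs(z[i][t]) for i in range(n_sources)) / n_sources
            for t in range(n_times)]


def cascade_auc(profile, dt=1.0):
    """Area under the cascade profile using the trapezoidal rule."""
    return sum(0.5 * (a + b) * dt for a, b in zip(profile, profile[1:]))


# Two toy sources that are co-active at the third time point.
sources = [
    [0.0, 0.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0, 0.0],
]

profile = cascade_profile(sources)
auc = cascade_auc(profile)
```

      In this toy example the profile peaks at the shared event, so a recording with more frequent or larger co-activations yields a larger AUC.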

      (4) It was difficult to tell, graphically, where the model’s operating regime lay. Visual clarity here will greatly benefit the reader.

      Thank you for pointing out this ambiguity; we have marked the working point explicitly in Figure 3.

      Recommendations For The Authors

      Reviewer 1 (Recommendations for the authors):

      (1) In the method section, the technical details of the other four indexes should be elaborated.

      Thank you for your recommendation, we agree that the description in the submitted manuscript was too brief. We expanded the method section about the functional repertoire and the bursting potential.

      Reviewer 2 (Recommendations for the authors):

      (1) The authors could more clearly label the “working point” of their parameter space. Perhaps a label/arrow on Figure 2c that directs the readers’ eyes towards the location in state-space that you define as the working point?

      Thank you for pointing out this ambiguity; we updated Figure 3 to mark the working point precisely.

      (2) While ’fluidity’ is quite an evocative term and does a great job of suggesting to the uninitiated reader the character of the time series in question, I wonder whether a more descriptive term might be better suited for this variable, even if as an adjunct to the term, fluidity. In the past, we (and others) have used the term dynamic functional connectivity variability (Müller et al., 2020 NeuroImage) to refer to this feature, as it links the measure directly to the technique from which it was estimated.

      Thank you for your feedback. You are correct, dynamic functional connectivity variability could have been a wording of choice for some of our results. However, the term “fluidity” was chosen to convey a broader theoretical concept linked to dynamical systems but not exclusive to the brain. Here, dynamic functional connectivity variability is merely a measure of the fluidity of the system. We added the following in the method section describing the metrics:

      “[...] Fluidity is related to previously defined metrics such as functional connectivity variability [10] that relied on a non-overlapping windowing procedure. We chose the term fluidity to convey a concept linked to dynamical systems in general and state exploration. [...]”

      (3) The term “bursting potential” is also potentially problematic, as “bursting” refers to a different concept at the cellular level (i.e., multiple action potentials in a short window of time) than it does in the context that the authors are presumably using it here (i.e., the capacity for the dynamics of the population to “burst” into the fat-tail of their activity distribution). To avoid ambiguity, it could be worth considering altering this terminology, perhaps again by using a term that is descriptive of the technique used to estimate it, rather than the concept that it evokes.

      Thank you for pointing out this ambiguity in the naming of the bursting potential. We have renamed it to “Global Activation Potential (GAP)” as we believe this term is a better description of the metric. We have switched to this term across the whole manuscript.

      (4) There is a range of other modelling studies that have compared brain dynamics in the awake vs. anaesthetised patient. In my opinion, the reader would benefit from the ability to place this work into the broader context created by the literature, particularly as there are subtle (yet potentially important) differences in the models used in each case. Note - as this is a subjective opinion, I don’t view this as a crucial addition to the paper’s potential strength of evidence, though I do believe that it would have a positive effect on its potential impact.

      We thank you for the suggestion. We have modified the penultimate paragraph of the discussion to bring in more context from the literature on models of anesthesia and wakefulness:

      “Several studies have employed computational modeling approaches to investigate the differences in brain dynamics across states of consciousness. These studies present varying degrees of physiological detail and focus on complementary aspects of unconsciousness. They start from simple abstract models (Ising model) addressing, for example, the increased correlation between structural and functional connectivity in anesthesia [15], or oscillator-based models (Hopf model) capturing a brain-state-dependent response to simulated perturbation [4]. More neurobiologically realistic models (Dynamic Mean Field) have also been used to combine multimodal imaging data together with receptor density maps to address the macroscopic effects of general anesthesia and their relationship to spatially heterogeneous properties of the neuronal populations [8]. Similarly, using anatomically constrained parameters for brain regions has already been shown to increase the predictive value of brain network models [6, 18]. Furthermore, employing biophysically grounded mean-field and spiking neuron models (AdEx) allows addressing phenomena propagating in effect across multiple scales of description such as the molecular effects of anesthetics targeting specific receptor types [12]. Related work has shown that adaptation successfully reproduces dynamical regimes coherent with NREM and wakefulness [3] with corresponding realistic PCI values [Goldman2021comprehensive]. Here, we don’t address these biological questions but rather give a proof of concept that large-scale brain models can help understand the dynamics related to brain function. We used a model derived from QIF neurons [9] that lacks biological parameters such as ion concentration or synaptic adaptation. Nevertheless, we demonstrate that even the symmetry breaking caused by the connectome is sufficient for setting the global working point of the brain, which then links the brain’s capacity for generating complex behavior in the different paradigms, that is, rest and stimulation.”

      (5) I saw the label ”digital brain twin” in the abstract but then did not find a location in the main text/methods wherein this aspect of the modelling was explained.

      Thank you for pointing out this discrepancy; we have removed the term “digital brain twin” and replaced it with “whole-brain model” throughout.

      References

      [1] John M. Beggs and Dietmar Plenz. Neuronal Avalanches in Neocortical Circuits. The Journal of Neuroscience, 23(35):11167–11177, December 2003.

      [2] A. G. Casali, O. Gosseries, M. Rosanova, M. Boly, S. Sarasso, K. R. Casali, S. Casarotto, M.-A. Bruno, S. Laureys, G. Tononi, and M. Massimini. A Theoretically Based Index of Consciousness Independent of Sensory Processing and Behavior. Science Translational Medicine, 5(198):198ra105, August 2013.

      [3] Anna Cattani, Andrea Galluzzi, Matteo Fecchio, Andrea Pigorini, Maurizio Mattia, and Marcello Massimini. Adaptation shapes local cortical reactivity: From bifurcation diagram and simulations to human physiological and pathological responses. eNeuro, 10(7):ENEURO.0435-22.2023, July 2023.

      [4] Gustavo Deco, Joana Cabral, Victor M. Saenger, Melanie Boly, Enzo Tagliazucchi, Helmut Laufs, Eus Van Someren, Beatrice Jobst, Angus Stevner, and Morten L. Kringelbach. Perturbation of whole-brain dynamics in silico reveals mechanistic differences between brain states. NeuroImage, 169:46–56, April 2018.

      [5] Rahul S. Desikan, Florent Ségonne, Bruce Fischl, Brian T. Quinn, Bradford C. Dickerson, Deborah Blacker, Randy L. Buckner, Anders M. Dale, R. Paul Maguire, Bradley T. Hyman, Marilyn S. Albert, and Ronald J. Killiany. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage, 31(3):968–980, July 2006.

      [6] Xiaolu Kong, Ru Kong, Csaba Orban, Peng Wang, Shaoshi Zhang, Kevin Anderson, Avram Holmes, John D. Murray, Gustavo Deco, Martijn van den Heuvel, and B. T. Thomas Yeo. Sensory-motor cortices shape functional connectivity dynamics in the human brain. Nature Communications, 12(1), November 2021.

      [7] A. Lempel and J. Ziv. On the Complexity of Finite Sequences. IEEE Transactions on Information Theory, 22(1):75–81, January 1976.

      [8] Andrea I. Luppi, Pedro A. M. Mediano, Fernando E. Rosas, Judith Allanson, John D. Pickard, Guy B. Williams, Michael M. Craig, Paola Finoia, Alexander R. D. Peattie, Peter Coppola, Adrian M. Owen, Lorina Naci, David K. Menon, Daniel Bor, and Emmanuel A. Stamatakis. Whole-brain modelling identifies distinct but convergent paths to unconsciousness in anaesthesia and disorders of consciousness. Communications Biology, 5(1), April 2022.

      [9] Ernest Montbrió, Diego Pazó, and Alex Roxin. Macroscopic Description for Networks of Spiking Neurons. Physical Review X, 5(2):021028, June 2015.

      [10] Eli J. Müller, Brandon Munn, Luke J. Hearne, Jared B. Smith, Ben Fulcher, Aurina Arnatkevičiūtė, Daniel J. Lurie, Luca Cocchi, and James M. Shine. Core and matrix thalamic sub-populations relate to spatio-temporal cortical connectivity gradients. NeuroImage, 222:117224, November 2020.

      [11] J. Matias Palva, Alexander Zhigalov, Jonni Hirvonen, Onerva Korhonen, Klaus Linkenkaer-Hansen, and Satu Palva. Neuronal long-range temporal correlations and avalanche dynamics are correlated with behavioral scaling laws. Proceedings of the National Academy of Sciences, 110(9):3585–3590, February 2013.

      [12] Maria Sacha, Federico Tesler, Rodrigo Cofre, and Alain Destexhe. A computational approach to evaluate how molecular mechanisms impact large-scale brain activity. Nature Computational Science, 5(5):405–417, May 2025.

      [13] Simone Sarasso, Melanie Boly, Martino Napolitani, Olivia Gosseries, Vanessa Charland-Verville, Silvia Casarotto, Mario Rosanova, Adenauer Girardi Casali, Jean-Francois Brichant, Pierre Boveroux, Steffen Rex, Giulio Tononi, Steven Laureys, and Marcello Massimini. Consciousness and Complexity during Unresponsiveness Induced by Propofol, Xenon, and Ketamine. Current Biology, 25(23):3099–3105, December 2015.

      [14] Pierpaolo Sorrentino, Rosaria Rucco, Fabio Baselice, Rosa De Micco, Alessandro Tessitore, Arjan Hillebrand, Laura Mandolesi, Michael Breakspear, Leonardo L. Gollo, and Giuseppe Sorrentino. Flexible brain dynamics underpins complex behaviours as observed in Parkinson’s disease. Scientific Reports, 11(1):4051, February 2021.

      [15] S. Stramaglia, M. Pellicoro, L. Angelini, E. Amico, H. Aerts, J. M. Cortés, S. Laureys, and D. Marinazzo. Ising model with conserved magnetization on the human connectome: Implications on the relation structure-function in wakefulness and anesthesia. Chaos: An Interdisciplinary Journal of Nonlinear Science, 27(4), April 2017.

      [16] Enzo Tagliazucchi, Pablo Balenzuela, Daniel Fraiman, and Dante Chialvo. Criticality in Large-Scale Brain fMRI Dynamics Unveiled by a Novel Point Process Analysis. Frontiers in Physiology, 3, 2012.

      [17] David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E.J. Behrens, Essa Yacoub, and Kamil Ugurbil. The WU-Minn Human Connectome Project: An Overview. NeuroImage, 80:62–79, October 2013.

      [18] Peng Wang, Ru Kong, Xiaolu Kong, Raphaël Liégeois, Csaba Orban, Gustavo Deco, Martijn P. van den Heuvel, and B.T. Thomas Yeo. Inversion of a large-scale circuit model reveals a cortical hierarchy in the dynamic resting human brain. Science Advances, 5(1), January 2019.

      [19] Farnaz Zamani Esfahlani, Youngheun Jo, Joshua Faskowitz, Lisa Byrge, Daniel P. Kennedy, Olaf Sporns, and Richard F. Betzel. High-amplitude cofluctuations in cortical activity drive functional connectivity. Proceedings of the National Academy of Sciences of the United States of America, 117(45):28393–28401, November 2020.

    1. eLife Assessment

      This useful study examines the contribution of synaptotagmin 1 and synaptotagmin 7 to metabolite antigen presentation to mucosal-associated invariant T (MAIT) cells; it begins to address a critical gap in our understanding of the antigen presentation mechanisms to these cells. Strengths of the study include the use of Mtb to study the dynamics of antigen presentation to MAIT cells instead of a synthetic antigen. However, the strength of the evidence to support the conclusion is currently incomplete. The conclusions could be enhanced by additional dissection of some of the cell biological events that lead to antigen presentation by MR1.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript "Synaptotagmin 1 and Synaptotagmin 7 promote MR1-mediated presentation of Mycobacterium tuberculosis antigens", authored by Kim et al., showed that the calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 specifically promote (are critical for) MAIT cell activation in response to the Mtb-infected bronchial epithelial cell line BEAS-2B (Figure 1) and the monocyte-like cell line THP-1 (Figure 3). This work also showed co-localization of Syt1 and Syt7 with Rab7a and Lamp1, but not with Rab5a (Figure 5). Loss of Syt1 and Syt7 resulted in a larger area of MR1 vesicles (Figure 6f) and an increased number of MR1 vesicles in close proximity to auxotrophic Mtb-containing vacuoles during infection (Figure 7ab). Moreover, flow organellometry was used to separate phagosomes from other subcellular fractions and identify enrichment of auxotrophic Mtb-containing vacuoles in fractions 42-50, which were enriched with Lamp1+ vacuoles or phagosomes (Figures 7e-f).

      Strengths:

      This work nicely associated Syt1 and Syt7 with late endocytic compartments and Mtb+ vacuoles. Gene editing of the Syt1 and Syt7 loci in bronchial epithelial and monocyte-like cells supported the conclusion that Syt1 and Syt7 help maintain a normal level of antigen presentation for MAIT cell activation in Mtb infection. Imaging analyses further showed that Syt1 and Syt7 mutants enhanced the overlap of MR1 with Mtb fluorescence and the proximity of MR1 to Mtb-infected vacuoles, suggesting that Syt1 and Syt7 proteins help antigen presentation in Mtb infection for MAIT activation.

      Weaknesses:

      Additional data are needed to support the conclusion, "identify a novel pathway in which Syt1 and Syt7 facilitate the translocation of MR1 from Mtb-containing vacuoles" and some pieces of other evidence may be seen by some to contradict this conclusion.

    3. Reviewer #2 (Public review):

      Summary:

      The study demonstrates that calcium-sensing trafficking proteins Synaptotagmin (Syt) 1 and Syt7 are involved in the efficient presentation of mycobacterial antigens by MR1 during M. tuberculosis infection.

      This is achieved by creating antigen-presenting cells in which the Syt1 and Syt7 genes are knocked out. These mutated cell lines show significantly reduced stimulation of MAIT cells, while their stimulation of HLA class I-restricted T cells remains unchanged. Syt1 and Syt7 co-localize in a late endo-lysosomal compartment where MR1 molecules are also located, near M. tuberculosis-containing vacuoles.

      Strengths:

      This work uncovers a new aspect of how mycobacterial antigens generated during infection are presented. The finding that Syt1 and Syt7 are relevant for final MR1 surface expression and presentation to MR1-restricted T cells is novel and adds valuable information to this process.

      The experiments include all necessary controls and convincingly validate the role of Syt1 and Syt7.

      Another key point is that these proteins are essential during infection, but they are not significant when an exogenous synthetic antigen is used in the experiments. This emphasizes the importance of studying infection as a physiological context for antigen presentation to MAIT cells.

      An additional relevant aspect is that the study reveals the existence of different MR1 antigen presentation pathways, which differ from the endoplasmic reticulum or endosomal pathways that are typical for MHC-presented peptides.

      Weaknesses:

      The reduced MAIT cell response observed with Syt1 and Syt7-deficient cell lines is statistically significant but not completely abolished. This may suggest that only some MR1-loaded molecules depend on these two Syt proteins. Further research is needed to determine whether, during persistent M. tuberculosis infection, enough MR1-loaded molecules are produced and transported to the plasma membrane to sufficiently stimulate MAIT cells.

      The study proposes that other Syt proteins might also play a role, as outlined by the authors. However, exploring potential redundant mechanisms that facilitate MR1 loading with antigens remains a challenging task.

    4. Reviewer #3 (Public review):

      Summary:

      In the submitted manuscript, the authors investigate the role of Synaptotagmin 1 (Syt1) and Synaptotagmin 7 (Syt7) in MR1 presentation of Mtb.

      Strengths:

      In the first series of experiments, the authors determined that knocking down Syt1 and Syt7 in antigen-presenting cells decreases IFN-γ production following cellular infection with Mtb. These experiments are well performed and controlled.

      Weaknesses:

      Next, they aim to mechanistically investigate how Syt1 and Syt7 affect Mtb presentation. In particular, they focus on MR1, a non-classical MHC-I molecule known to present endogenous and exogenous metabolites, including Mtb metabolites.

      Results from this next series of experiments are less clear. Firstly, they show that knocking down Syt1 and Syt7 does not change Mtb phagocytosis or MR1 ER-plasma membrane translocation. Based on this, they suggest that Syt1 and Syt7 may affect MR1 trafficking in endosomal compartments. However, neither subcellular compartment analysis nor flow organellometry clearly establishes the role of Syt1 and Syt7 in Mtb trafficking.

      Altogether, the notion that Synaptotagmins facilitate MR1 interaction with Mtb-containing compartments and its vesicular transport was already known. As such, the manuscript should add additional insight on where and how the interaction occurs. The reviewer is left with the notion that Syt1 and Syt7 may affect MR1 presentation, facilitating the trafficking of MR1 vesicles from endosomal compartments to either the cell surface or other endosomal compartments. The analysis is observational, and additional data or discussion could address what insight is gained beyond what is already known from the literature.

    1. eLife Assessment

      This important study shows how hunger alters avoidance of harmful heat in C. elegans by reconfiguring the activity of key sensory neurons. The evidence is convincing, with well-designed behavioural, genetic, and imaging experiments that support the main conclusions. The work will be of interest to neuroscientists studying how internal states shape sensory processing and behaviour across species.

    2. Reviewer #1 (Public review):

      This study by Thapliyal and Glauser investigates the neural mechanisms that contribute to the progressive suppression of thermonociceptive behavior that is induced under conditions of starvation. Several previous studies have demonstrated that when starved, C. elegans alters its preferences for a variety of sensory cues, including CO2, temperature, and odors, in order to prioritize food seeking over other behavioral drives. The varied mechanisms that underlie the ability of internal states to alter behavioral responses are not fully understood, however there is growing evidence for a role by neuropeptidergic signaling as well as capacity for functionally distinct microcircuits, formed by distinct internal states, to trigger similar behavior outcomes.

      Within the physiological range of C. elegans (~15-25C), starvation triggers a profound reduction in temperature-driven thermotaxis behaviors. This reduction involves the recruitment of the amphid sensory neuron pair AWC. The AWC neurons primarily act to sense appetitive chemosensory cues, however under starvation conditions begin to display temperature responses that previous studies have linked to the reduction in thermotaxis navigation. Here, Thapliyal and Glauser investigate the impact of starvation on thermonociceptive responses, innate escape behaviors that are triggered by exposure to noxious temperatures above 26C or rapid thermal stimuli below 26C. They compare the strength of thermonociceptive behaviors, specifically heat-triggered reversals, in worms experiencing either early food deprivation (1 hour off food) or prolonged starvation (6 hours off food). Their experiments demonstrate a progressive loss of heat-triggered reversals that is mediated by AWC and ASI neurons, as well as both glutamatergic and neuropeptidergic signaling.

      At the level of neural activity, this study reports that the transition from early food deprivation to prolonged starvation reconfigures the temperature-driven activity of AWC neurons from largely deterministic to stochastic. This finding is interesting in light of previous work that reported the opposite transition (from stochastic to deterministic) in temperature-driven AWC responses when comparing well-fed worms to those kept from food for 3 hours. This study also identifies neural and genetic mechanisms that contribute to differences in thermonociceptive responses at +1 versus +6 hours starvation; confusingly, these mechanisms are partially distinct from those that contribute to differences in negative thermotaxis behaviors in well-fed and +3 hours starvation worms (Takeishi et al 2020). A limitation of this manuscript is that these differences are not particularly acknowledged or addressed, other than the hypothesis that independent mechanisms underlie negative thermotaxis versus thermonociceptive stimuli. However, this suggestion is not experimentally verified. Multiple additional aspects of this study make the results difficult to synthesize with existing knowledge, including 1) differences in - and insufficient discussion of - the magnitude and kinetics of thermal stimuli; 2) this study's use of "heating power" rather than temperature values when presenting behavioral results; 3) the use of +1 hours starvation as a baseline instead of well-fed worms. Indeed, this last point reflects a noticeable experimental result that differs from previous studies, namely that at room temperature the basal movements of well-fed and starved worms are not different. 
Such a surprising result warrants further quantification of worm mobility in general and could have prompted a set of experiments directly testing previously published thermal conditions, to demonstrate that the new effects reported arise specifically from the use of thermonociceptive stimuli, as hypothesized. Finally, a previous report (Yeon et al 2021) demonstrated differences in the impact of chronic versus acute neural silencing on starvation-dependent plasticity in the context of negative thermotaxis. We therefore wonder whether similar developmental compensation impacts the neural circuits that contribute to starvation-dependent plasticity in the thermonociceptive responses.

      A weakness of this manuscript is that the introduction is insufficiently scholarly in terms of citations and the description of current knowledge surrounding the impact of internal state on sensory behavior, particularly given previous work on the impact of feeding state on thermosensory behavioral plasticity (Takeishi et al 2020, Yeon et al 2021) and chemosensory valence (Banerjee et al 2023, Rengarajan et al 2019, etc). Similarly, the authors' commanding knowledge of the distinction between thermotaxis navigation (especially negative thermotaxis) and thermonociceptive behaviors could be communicated in more depth and clarity to the readers, in order to contextualize this study's new findings within the previous literature.

      Nevertheless, this study represents a solid addition to the growing evidence that C. elegans sensory behaviors are strongly impacted by internal states, and that neuropeptidergic signaling plays a key role in mediating behavioral plasticity. To that end, the authors have provided solid evidence of their claims.

    3. Reviewer #2 (Public review):

      In this work, Thapliyal and Glauser set out to provide a mechanistic understanding of how animals modulate their neural circuit responses to control nociceptive behavior on the basis of the dynamic internal feeding state. It is an important study that adds to a growing body of evidence coming from multiple model systems. They have used elegant genetics, behavioral and Ca-imaging experiments to demonstrate how the auxiliary thermosensory neuron pair, AWC, and one of the internal-state-sensing neuron pairs, ASI, respond to the dynamic internal starvation state to modulate the behavioral response to noxious heat. Interestingly, these neuron pairs use distinct molecular mechanisms, along with some other unidentified neurons, to suppress the heat-induced reversal response under short-term and prolonged starvation. The experiments are well performed, support most of the claims, and provide an important framework for future studies.

      I have some queries that, if answered, will certainly enhance the study:

      (1) The results suggest that ASI is one of the primary drivers of the starvation-evoked behavioral plasticity, which regulates AWC activity under prolonged starvation. This raises many important questions, including: a) how does starvation modulate the ASI response to heat? b) under prolonged starvation, does ASI also promote other, non-AWC, glutamatergic inhibitory neurons to suppress heat-induced reversal, and how?

      (2) How does ASI regulate AWC activity? In the proposed model (figure 8) authors suggested an independent, unknown signal, other than INS-32 and NLP-18, from ASI to regulate AWC activity. However, from the results the existence of another signal is not very clear.

      (3) Previously, Takeishi et al. showed that ins-1 dynamically modulates AWC-AIA mediated thermotaxis behavior based on the feeding state of the animal. This raises the question of whether ins-1 also contributes to noxious heat-induced reversal behavior.

      (4) Experiments with AWC fate conversion mutants (nsy-1 and nsy-7) were a very good idea; however, the results obtained were confusing. The flp-6 mutant data suggest AWCoff would be essential for heat-induced reversal, especially at the low-intensity stimulus level. However, the nsy-1 mutant, which forms two AWCon neurons, showed complete rescue at the low heat level, which is quite the opposite. Similarly, although less prominently, the eat-4 rescue experiments suggested that both nsy-1 and nsy-7 should behave normally at the high heat condition, which was not the result observed.

    4. Reviewer #3 (Public review):

      Summary:

      Thapliyal and Glauser show that hunger alters how C. elegans respond to noxious thermal stimuli. Using targeted neural ablation, mutant analysis, and live-cell functional imaging the authors demonstrate that hunger changes the properties of AWC sensory neurons, which sense noxious heat. The authors further show that effects of hunger on nociception require ASI neurons, which are known to respond to hunger and mediate effects of food deprivation on behavior. Finally, the study uses mutant analysis to implicate glutamate and specific neuropeptides in thermal nociception and in modulation of nociceptors by hunger-responsive neurons.

      Strengths:

      The study clearly shows a strong effect of hunger on nociception and documents a striking effect of hunger on the intrinsic properties of AWC sensory neurons, which respond to noxious heat. The study also clearly and compellingly demonstrates that ablation of hunger-responsive ASI neurons blocks effects of hunger on nociceptive AWCs. These data, which constitute the kernel of the manuscript, are striking and exciting.

      Weaknesses:

      The study has some weaknesses that the authors should address.

      (1) Ablation of AWC neurons alters the basal sensitivity to noxious heat stimuli. This should be clearly noted in the description of the result and warrants some discussion.

      (2) Throughout the study it seems that data are replotted in multiple figure panels. The authors should clearly indicate in figure legends when this occurs. Also, the authors should ensure that statistical tests requiring multiple comparisons are correctly implemented and reflect the number of times experimental data are compared to a single set of control data.

      (3) How ASIs modulate AWCs remains unclear. The authors find that loss of INS-6, an insulin-like peptide provided by ASIs, partially recapitulates the effect of ASI ablation. This observation is not further developed, and instead the authors characterize other secreted factors that seem to mediate sensitization of animals to noxious heat stimuli. While it is interesting that there are multiple opposing inputs into the nociceptor circuit, the essential connection between ASIs and AWCs that underlies the foundational observations in figures 1 and 2 is not sufficiently characterized.

      (4) The assertion that 'starvation reshapes AWC responses from deterministic to stochastic' is not clearly supported by the data. AWC neurons seem capable of showing different responses to thermal stimuli, and the probabilities associated with these responses change after fasting. The different kinds of responses are seen under basal and fasted conditions.

    1. eLife Assessment

      This study presents a valuable quantitative framework for analyzing transcription dynamics data for enhancers and genes expressed in the early Drosophila embryo. By analyzing existing data across both synthetic reporters and an endogenous gene (eve), this work provides evidence that spatial gene expression patterns within the embryo are largely determined by "activity time" - the time during which a gene is bursting. The methods and evidence are solid and should be of broad interest to researchers in developmental biology and quantitative gene regulation, but the study would be significantly enhanced by clarifying the novelty of the findings relative to prior work and presenting a rigorous benchmarking of their algorithm against previously used algorithms.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, the authors develop a method to re-analyze published data measuring the transcription dynamics of developmental genes within Drosophila embryos. Using a simple framework, they identify periods of transcriptional activity from traces of MS2 signal and analyze several parameters of these traces. In the five data sets they analyzed, the authors find that each transcriptional "burst" has a largely invariant duration, both across spatial positions in the embryo and across different enhancers and genes, while the time between transcriptional bursts varies more. However, they find that the best predictor of the mean transcription levels at different spatial positions in the embryo is the "activity time" -- the total time from the first to the last transcriptional burst in the observed cell cycle.

      Strengths:

      (1) The algorithm for analyzing the MS2 transcriptional traces is clearly described and appropriate for the data.

      (2) The analysis of the four transcriptional parameters -- the transcriptional burst duration, the time between bursts, the activity time, and the polymerase loading rate is clearly done and logically explained, allowing the reader to observe the different distributions of these values and the relationship between each of these parameters and the overall expression output in each cell. The authors make a convincing case that the activity time is the best predictor of a cell's expression output.

      (3) The figures are clearly presented and easy to follow.

      Weaknesses:

      (1) The strength of the relationship between the different transcriptional parameters and the mean expression output is displayed visually in Figures 5 and 7, but is not formally quantified. Given that the tau_off times seem more correlated to mean activity for some enhancers (e.g., rho) than others (e.g., sna SE), the quantification might be useful.

      (2) There are some mechanistic details that are not discussed in depth. For example, the authors observe that the accumulation and degradation of the MS2 signal have similar slopes. However, given that the accumulation represents the transcription of MS2 loops, while the degradation represents diffusion of nascent transcripts away from the site of transcription, there is no mechanistic expectation for this. The degradation of signal seems likely to be a property of the mRNA itself, which shouldn't vary between cells or enhancer reporters, but the accumulation rate may be cell- or enhancer-specific. Similarly, the activity time depends both on the time of transcription onset and the time of transcription cessation. These two processes may be controlled by different transcription factor properties or levels and may be interesting to disentangle.

      (3) There are previous analyses of the eve stripe dynamics, which the authors cite, but do not compare the results of their work to the previous work in depth.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, Nieto et al. investigate how spatial gene expression patterns in the early Drosophila embryo are regulated at the level of transcriptional bursting. Using live-cell MS2 imaging data of four reporter constructs and the endogenous eve gene, the authors extract temporal dynamics of nascent transcription at single-cell resolution. They implement a novel, simplified algorithm to infer promoter ON/OFF states based on fluorescence slope dynamics and use this to quantify burst duration (Ton), inter-burst duration (Toff), and total activity time across space.

      The key finding is that while Ton and Toff remain relatively constant across space, the activity time (the window between the first and last burst) is spatially modulated and best explains mean expression differences across the embryo. This uncovers a general strategy where early embryonic patterning genes modulate the duration of their transcriptionally permissive states, rather than the frequency or strength of bursting itself. The manuscript also shows that different enhancers of the same gene (e.g., sna proximal vs. shadow) can differentially modulate Toff and activity time, providing mechanistic insight into enhancer function.
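
      To make the quantities in this summary concrete, the following is a minimal Python sketch of slope-based burst calling. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the fixed sampling interval `dt`, and the rule "ON when the finite-difference slope exceeds a threshold" are all hypothetical simplifications, and the trace values are made up.

```python
from typing import List, Tuple


def segment_bursts(trace: List[float], dt: float,
                   slope_threshold: float = 0.0) -> List[bool]:
    """Label each sampling interval ON when the fluorescence slope is above
    the threshold (hypothetical rule: rising signal means promoter ON)."""
    return [(b - a) / dt > slope_threshold for a, b in zip(trace, trace[1:])]


def burst_statistics(states: List[bool],
                     dt: float) -> Tuple[List[float], List[float], float]:
    """Return (burst durations, inter-burst intervals, activity time)."""
    # Collapse the ON/OFF sequence into runs of [state, length].
    runs: List[list] = []
    for s in states:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])

    t_on = [n * dt for s, n in runs if s]
    # Only OFF runs lying between two bursts count as inter-burst intervals;
    # leading and trailing OFF periods are excluded.
    t_off = [n * dt for i, (s, n) in enumerate(runs)
             if not s and 0 < i < len(runs) - 1]

    # Activity time: span from the start of the first burst to the end of the last.
    on_idx = [i for i, s in enumerate(states) if s]
    activity = (on_idx[-1] - on_idx[0] + 1) * dt if on_idx else 0.0
    return t_on, t_off, activity


# Made-up trace (arbitrary units) with two bursts, sampled once per time unit.
example_trace = [0, 0, 1, 2, 3, 2, 1, 0, 0, 1, 2, 1, 0]
example_states = segment_bursts(example_trace, dt=1.0)
t_on, t_off, activity_time = burst_statistics(example_states, dt=1.0)
```

      In this toy trace the two bursts are separated by a single interior OFF period, and the activity time spans both bursts plus the gap between them. Real MS2 traces are noisy, so any practical version would need smoothing or a noise-aware threshold.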

      Strengths:

      The manuscript introduces activity time as a major, previously underappreciated determinant of spatial gene expression, distinct from Ton and Toff, providing an intuitive mechanistic link between temporal bursting and spatial patterning.

      The authors develop a tractable inference algorithm based on linear accumulation/decay rates of MS2 fluorescence, allowing efficient burst state segmentation across thousands of trajectories.

      Analysis across multiple biological replicates and different genes/enhancers lends confidence to the reproducibility and generalizability of the findings.

      By analyzing both synthetic reporter constructs and an endogenous gene (eve), the work provides a coherent view of how enhancer architecture and spatial regulation are intertwined with transcriptional kinetics.

      The supplementary information extends the biological findings with a gene expression noise model that accounts for non-exponential dwell times and illustrates how low-variability Ton buffers stochasticity in transcript levels.

      Weaknesses:

      The manuscript does not clearly delineate how this analysis extends beyond the prior landmark study (citation #40: Fukaya et al., 2016). While the current manuscript offers new modeling and statistics, more explicit clarification of what is novel in terms of biological conclusions and methodological advancement would help position the work.

      While the methods are explained in detail in the Supplementary Information, the manuscript would benefit from including a diagrammatic model and explicitly clarifying whether the model is descriptive or predictive in scope.

      The interpretation that fluorescence decay reflects RNA degradation could be confounded by polymerase runoff or transcript diffusion from the transcription site. These potential limitations are not thoroughly discussed.

      The so-called loading rate is used as an empirical parameter in fitting fluorescence traces, but is not convincingly linked to distinct biological processes. The manuscript would benefit from a more precise definition or reframing of this term.

      Impact and Utility:

      The study provides a general and scalable framework for dissecting transcriptional kinetics in developing embryos, with implications for understanding enhancer logic and developmental robustness. The algorithm is suitable for adaptation to other live-imaging datasets and could be useful across systems where temporal transcriptional variability is being quantified. By highlighting activity time as a key regulatory axis, the work shifts attention to transcriptionally permissive windows as a primary developmental control layer.

      This work will be of interest to: developmental biologists investigating spatial gene expression, researchers studying transcriptional regulation and noise, quantitative biologists developing models for transcriptional dynamics, and imaging and computational biologists working with live single-cell data.

    4. Reviewer #3 (Public review):

      Summary:

      In this paper, the authors developed a simple algorithm to analyse live imaging transcription data (MS2) and infer various kinetic parameters. They then applied it to analyse data from previous publications on Drosophila that measured the dynamics of reporter genes driven by various enhancers alone (sna, Kr, rho), or in an endogenous context (eve).

      The authors find that the main correlate with mean gene expression levels is the activity time, that is, the time during which the gene is bursting. They also find a correlation with the variation of the off time.

      Strengths:

      (1) The findings are very clearly presented.

      (2) The simplicity of the algorithm is nice, and the comparative analysis among the various enhancers can be helpful for the field.

      Weaknesses:

      (1) The algorithm is not benchmarked against previously used algorithms in the field to infer ON and OFF times, for example, those based on Hidden Markov models. A comparison would help strengthen the support for this algorithm (if it really works well) or show at which point one must be careful when interpreting this data.

      (2) More broadly, the novelty of the findings and how those fit within the knowledge of the field is not super clear. A better account of previous findings that have already quantified ON, OFF times and so on, and how the current findings fit within those, would help better appreciate the significance of the work.

    5. Author response:

      Reviewer #1 (Public review):

      (1) The strength of the relationship between the different transcriptional parameters and the mean expression output is displayed visually in Figures 5 and 7, but is not formally quantified. Given that the tau_off times seem more correlated to mean activity for some enhancers (e.g., rho) than others (e.g., sna SE), the quantification might be useful.

      We re-plot Figure 5 and Figure 7 to present the correlations between the studied burst parameters. As the reviewer suggested, quantifying these correlations lets us better assess the relationship between the cell-averaged tau_off and the cell-averaged fluorescence signal for some of the selected enhancers. As a result of these findings, we change our message: instead of claiming that the burst statistics are homogeneous over the embryo domain, we claim that these statistics have weak but significant correlations with the cell-averaged mean gene fluorescence.
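
      As an illustration of what such a quantification could look like (not the analysis actually performed), the sketch below computes a Spearman rank correlation between a burst parameter and the mean signal using only the Python standard library. The choice of Spearman rather than another statistic is an assumption, and the function names and input numbers are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up per-position averages, for illustration only.
tau_off_minutes = [2.1, 2.4, 1.9, 3.0, 2.8]
mean_fluorescence = [50.0, 44.0, 61.0, 30.0, 35.0]
rho = spearman(tau_off_minutes, mean_fluorescence)
```

      A rank-based statistic is used here because it does not assume a linear relationship between the burst parameter and the mean signal. A significance test (for example by permutation) would still be needed before calling a correlation "weak but significant".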

      (2) There are some mechanistic details that are not discussed in depth. For example, the authors observe that the accumulation and degradation of the MS2 signal have similar slopes. However, given that the accumulation represents the transcription of MS2 loops, while the degradation represents diffusion of nascent transcripts away from the site of transcription, there is no mechanistic expectation for this. The degradation of signal seems likely to be a property of the mRNA itself, which shouldn't vary between cells or enhancer reporters, but the accumulation rate may be cell- or enhancer-specific. Similarly, the activity time depends both on the time of transcription onset and the time of transcription cessation. These two processes may be controlled by different transcription factor properties or levels and may be interesting to disentangle.

      The accumulation slope represents the rate of nascent transcript production, which depends on transcription initiation frequency and RNA polymerase elongation rate. While transcription initiation rates can vary between enhancers, our results show that the loading rates are relatively comparable across different enhancer sequences (Figure 5D). Instead, the primary difference observed was in activity time and burst frequency, consistent with previous findings that enhancers predominantly modulate burst frequency (Fukaya et al., 2016). The degradation slope represents the diffusion of completed transcripts away from the transcription site, which should be an intrinsic property of the mRNA molecule and therefore independent of the regulatory sequences driving transcription.

      (3) There are previous analyses of the eve stripe dynamics, which the authors cite, but do not compare the results of their work to the previous work in depth.

      The goal of this manuscript is to compare transcriptional bursting properties across different enhancers, rather than to provide an in-depth analysis of eve stripe dynamics specifically. We analyzed four transgenic constructs with different enhancers alongside an endogenous eve construct, focusing on comparative bursting parameters rather than detailed eve expression patterns. Additionally, the previously published eve stripe dynamics data came from BAC constructs, whereas our data comes from the endogenous eve locus. This methodological difference makes direct comparison of stripe dynamics less straightforward and less relevant to our central research question about enhancer-driven bursting variability.

      Reviewer #2 (Public review):

      (1) The manuscript does not clearly delineate how this analysis extends beyond the prior landmark study (citation #40: Fukaya et al., 2016). While the current manuscript offers new modeling and statistics, more explicit clarification of what is novel in terms of biological conclusions and methodological advancement would help position the work.

      The prior study (Fukaya et al., 2016) characterized transcriptional bursting qualitatively, focusing on average burst properties per nucleus without systematic mathematical modeling or statistical analysis of burst-to-burst variability. While they demonstrated that enhancer strength correlates with burst frequency, no quantitative framework was developed to dissect the molecular mechanisms underlying these differences or to connect burst dynamics to spatial gene expression patterns.

      (1) We developed an explicit mathematical model with rigorous inference algorithms to quantify transcriptional states from fluorescence trajectories; (2) We performed comprehensive statistical analysis of burst timing distributions, revealing that inter-burst intervals follow exponential distributions while burst durations are hypo-exponentially distributed; (3) Most importantly, we discovered that burst kinetics (τON, τOFF) remain remarkably consistent across different genes and spatial locations, while spatial expression gradients arise primarily through modulation of activity time - the temporal window during which bursting occurs. This mechanistic insight reveals that enhancers regulate spatial patterning not by changing intrinsic burst properties, but by controlling the duration of transcriptionally permissive periods.
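
      One simple way to see the difference between the two distribution families mentioned in point (2) is the coefficient of variation (CV): an exponential distribution has CV = 1, while a hypo-exponential distribution (a sum of independent exponential steps) has CV < 1. The sketch below checks this on simulated samples. It is an illustrative aid rather than the authors' statistical analysis, and the rates, sample size, and two-step model are arbitrary assumptions.

```python
import random
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(samples: Sequence[float]) -> float:
    """Standard deviation divided by the mean."""
    return pstdev(samples) / mean(samples)


rng = random.Random(0)   # fixed seed so the run is reproducible
n_samples = 20000

# Single-step waiting time: exponential with rate 1 (mean 1).
exponential_samples = [rng.expovariate(1.0) for _ in range(n_samples)]

# Two sequential steps, each with rate 2 (total mean 1): hypo-exponential.
hypo_samples = [rng.expovariate(2.0) + rng.expovariate(2.0)
                for _ in range(n_samples)]

cv_exponential = coefficient_of_variation(exponential_samples)
cv_hypo = coefficient_of_variation(hypo_samples)
```

      For two equal-rate steps the theoretical CV is 1/sqrt(2), so the simulated value should land near 0.71, against roughly 1 for the single-step case. A CV clearly below 1 in measured burst durations is consistent with a multi-step process, although it does not by itself identify the number of steps.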

      (2) While the methods are explained in detail in the Supplementary Information, the manuscript would benefit from including a diagrammatic model and explicitly clarifying whether the model is descriptive or predictive in scope.

      We plan to prepare the diagrammatic model in the formal response. 

      (3) The interpretation that fluorescence decay reflects RNA degradation could be confounded by polymerase runoff or transcript diffusion from the transcription site. These potential limitations are not thoroughly discussed. (Write few lines in the discussion)

      This concern, which relates to the interpretation of the predictive model, will be addressed in future work. The decay in the fluorescence signal can be biologically related to transcription termination, polymerase detachment, and diffusion. A key limitation of the approach is that the model is phenomenological and does not capture these processes, which could be addressed with a more mechanistic model.

      (4) The so-called loading rate is used as an empirical parameter in fitting fluorescence traces, but is not convincingly linked to distinct biological processes. The manuscript would benefit from a more precise definition or reframing of this term.

      We have modified the language of our definition of loading rate as follows: “Loading rate is defined as the rate of increase of fluorescence signal following promoter activation. This quantity is a proxy measurement for the rate of RNA Polymerase II transcription initiation.” The full transcription process involves multiple mechanisms, including chromatin dynamics, 3D enhancer-promoter interactions, transcription factor binding, RNA polymerase pausing, and interactions between developmental promoter motifs and associated proteins. We did not have access to specific measurements of these mechanisms and therefore cannot assign a firm biological meaning to the model behind the inference algorithm. However, the reproducibility of our results across biological replicates supports the robustness of our method at predicting the promoter state in the studied datasets. In the formal response we will compare the performance of our method with other available ones.

      Reviewer #3 (Public review):

      (1) The algorithm is not benchmarked against previously used algorithms in the field to infer ON and OFF times, for example, those based on Hidden Markov models. A comparison would help strengthen the support for this algorithm (if it really works well) or show at which point one must be careful when interpreting this data.

      We are implementing a benchmarking protocol to compare our results with the proposed and already published models. We expect to present this comparison in the formal response.

      (2) More broadly, the novelty of the findings and how those fit within the knowledge of the field is not super clear. A better account of previous findings that have already quantified ON, OFF times and so on, and how the current findings fit within those, would help better appreciate the significance of the work.

      To clarify the new findings, we changed the title from “Regulation of Transcriptional Bursting and Spatial Patterning in Early Drosophila Embryo Development” to “Temporal Duration of Gene Activity is the main Regulator of Spatial Expression Patterns in Early Drosophila Embryos”.

      In short, (1) We developed an explicit mathematical model with rigorous inference algorithms to quantify transcriptional states from fluorescence trajectories; (2) We performed comprehensive statistical analysis of burst timing distributions, revealing that inter-burst intervals follow exponential distributions while burst durations are hypo-exponentially distributed; (3) Most importantly, we discovered that burst kinetics (τON, τOFF) remain remarkably consistent across different genes and spatial locations, while spatial expression gradients arise primarily through modulation of activity time - the temporal window during which bursting occurs. This mechanistic insight reveals that enhancers regulate spatial patterning not by changing intrinsic burst properties, but by controlling the duration of transcriptionally permissive periods.

    1. eLife Assessment

      There is a growing interest in understanding the individuality of animal behaviours. In this important article, the authors build and use an impressive array of high throughput phenotyping paradigms to examine the 'stability' (consistency) of behavioural characteristics in a range of contexts and over time. The results show that certain behaviours are individualistic and persist robustly across external stimuli while others are less robust to these changing parameters. The data supporting their findings is extensive and convincing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings, and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?
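      The arithmetic behind these two figures can be checked with a short sketch. The function name is hypothetical; the two r values are the ones quoted in the paragraph above.

```python
def variance_explained(r):
    """Coefficient of determination for a simple correlation: R^2 = r * r."""
    return r * r


# r values quoted in the review (Figure 4C and the vector-strength example).
for label, r in [("% time walked, 23 vs 32 degrees C", 0.263),
                 ("vector strength, Buridan vs Y-maze", -0.197)]:
    r2 = variance_explained(r)
    print(f"{label}: r = {r:+.3f}, R^2 = {r2:.3f}, unexplained = {1 - r2:.1%}")
```

      The sign of r drops out when squaring, so a correlation of -0.197 explains no more variance than one of +0.197.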

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?
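      The rank transformation suggested here can be sketched as follows. This is an illustration on invented numbers, with hypothetical function names, and it is not the authors' analysis. Pearson's r is already unchanged by a shift in group mean or a linear rescaling, so ranking within each context mainly matters when the relationship between contexts is monotone but nonlinear.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented data: the same five flies in two contexts. Context B has a much
# higher mean and a nonlinear stretch, but the ordering of flies is identical.
context_a = [1.0, 2.0, 3.0, 4.0, 5.0]
context_b = [10.0, 11.0, 13.0, 20.0, 40.0]

r_raw = pearson(context_a, context_b)
r_rank = pearson(average_ranks(context_a), average_ranks(context_b))

print(f"Pearson on raw values:     {r_raw:.3f}")
print(f"Pearson on in-group ranks: {r_rank:.3f}")
```

      Correlating in-group ranks is equivalent to Spearman's rank correlation, which asks only whether individuals keep their ordering between contexts.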

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open hardware and open software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about inter-individual variability, and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of inter-individual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.

      Comments on revisions:

      While the incorporation of a hierarchical mixed model (HMM) appears to represent an improvement over their prior single-parameter correlation approach, it's not clear to me that this is a multivariate analysis. They write that "For each trait, we fitted a hierarchical linear mixed-effects model in Matlab (using the fitlme function) with environmental context as a fixed effect and fly identity (ID) as a random intercept... We computed the intraclass correlation coefficient (ICC) from each model as the between-fly variance divided by total variance. ICC, therefore, quantified repeatability across environmental contexts."
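      For readers unfamiliar with the quantity, the ICC described in the quoted passage can be sketched for a single trait as below. This is not the authors' Matlab model: it swaps in a method-of-moments (two-way ANOVA) estimator that is only appropriate for a balanced design in which every fly is measured once in every context, and the data and function name are invented for illustration.

```python
def icc_across_contexts(data):
    """ICC = between-fly variance / (between-fly + residual variance).

    data[i][j] is the trait value of fly i in context j (balanced design).
    Context means are removed as a fixed effect; variance components come
    from two-way ANOVA mean squares (a moments estimator, not REML).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    fly_means = [sum(row) / k for row in data]
    ctx_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    ms_fly = k * sum((m - grand) ** 2 for m in fly_means) / (n - 1)
    ss_resid = sum(
        (data[i][j] - fly_means[i] - ctx_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_resid = ss_resid / ((n - 1) * (k - 1))

    # A negative moment estimate is truncated at zero.
    var_fly = max(0.0, (ms_fly - ms_resid) / k)
    total = var_fly + ms_resid
    return var_fly / total if total > 0 else float("nan")


# Invented example: fraction of time walked by four flies in three contexts.
walk_fraction = [
    [0.50, 0.62, 0.41],
    [0.30, 0.45, 0.28],
    [0.70, 0.80, 0.55],
    [0.20, 0.38, 0.22],
]
icc = icc_across_contexts(walk_fraction)
print(f"ICC for this single trait: {icc:.2f}")
```

      Computed this way, the ICC is a per-trait statistic: each behavioral metric gets its own value, which is the sense in which the approach is univariate.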

      Does this indicate that HMM was used in a univariate approach? Can an analysis of only five metrics of several dozen total metrics be characterized as 'holistic'?

      Within Figure 10a, some of the metrics show high ICC scores, but others do not. This suggests that the authors are overstating the overall persistence and/or consistency of behavioral individuality. It is clear from Figure S8 that a large number of metrics were calculated for each fly, but it remains unclear, at least to me, why the five metrics in Figure 10a are justified for selection. One is left wondering how rare or common is the 0.6 repeatability of % time walked among all the other behavioral metrics. It appears that a holistic analysis of this large data set remains impossible.

      The authors write: "...fly individuality persists across different contexts, and individual differences shape behavior across variable environments, thereby making the underlying developmental and functional mechanisms amenable to genetic dissection." However, presumably the various behavioral features (and their variability) are governed by different brain regions, so some metrics (high ICC) would be amenable to the genetic dissection of individuality/variability, while others (low ICC) would not. It would be useful to know which are which, to define which behavioral domains express individuality, and could be targets for genetic analysis, and which do not. At the very least, the Abstract might like to acknowledge that inter-context consistency is not a major property of all or most behavioral metrics.

      I hold that inter-trial repeatability should rightly be called "stability" while inter-context repeatability should be called "consistency". In the current manuscript, "consistency" is used throughout the manuscript, except for the new edits, which use "stability". If the authors are going to use both terms, it would be preferable if they could explain precisely how they define and use these terms.

    3. Reviewer #2 (Public review):

      Summary:

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc) and instead just report a single repeatability for e.g. the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper as they'll be using methods that are standard in the study of individual behavioral variation.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):  

      Summary:  

      The authors state the study's goal clearly: "The goal of our study was to understand to what extent animal individuality is influenced by situational changes in the environment, i.e., how much of an animal's individuality remains after one or more environmental features change." They use visually guided behavioral features to examine the extent of correlation over time and in a variety of contexts. They develop new behavioral instrumentation and software to measure behavior in Buridan's paradigm (and variations thereof), the Y-maze, and a flight simulator. Using these assays, they examine the correlations between conditions for a panel of locomotion parameters. They propose that inter-assay correlations will determine the persistence of locomotion individuality.

      Strengths:  

      The OED defines individuality as "the sum of the attributes which distinguish a person or thing from others of the same kind," a definition mirrored by other dictionaries and the scientific literature on the topic. The concept of behavioral individuality can be characterized as: (1) a large set of behavioral attributes, (2) with inter-individual variability, that are (3) stable over time. A previous study examined walking parameters in Buridan's paradigm, finding that several parameters were variable between individuals, and that these showed stability over separate days and up to 4 weeks (DOI: 10.1126/science.aaw718). The present study replicates some of those findings and extends the experiments from temporal stability to examining correlation of locomotion features between different contexts.  

      The major strength of the study is using a range of different behavioral assays to examine the correlations of several different behavior parameters. It shows clearly that the inter-individual variability of some parameters is at least partially preserved between some contexts, and not preserved between others. The development of high-throughput behavior assays and sharing the information on how to make the assays is a commendable contribution.

      Weaknesses:  

      The definition of individuality considers a comprehensive or large set of attributes, but the authors consider only a handful. In Supplemental Fig. S8, the authors show a large correlation matrix of many behavioral parameters, but these are illegible and are only mentioned briefly in Results. Why were five or so parameters selected from the full set? How were these selected? Do the correlation trends hold true across all parameters? For assays in which only a subset of parameters can be directly compared, were all of these included in the analysis, or only a subset?  

      The correlation analysis is used to establish stability between assays. For temporal re-testing, "stability" is certainly the appropriate word, but between contexts it implies that there could be 'instability'. Rather, instead of the 'instability' of a single brain process, a different behavior in a different context could arise from engaging largely (or entirely?) distinct context-dependent internal processes, and have nothing to do with process stability per se. For inter-context similarities, perhaps a better word would be "consistency".  

      The parameters are considered one-by-one, not in aggregate. This focuses on the stability/consistency of the variability of a single parameter at a time, rather than holistic individuality. It would appear that an appropriate measure of individuality stability (or individuality consistency) that accounts for the high-dimensional nature of individuality would somehow summarize correlations across all parameters. Why was a multivariate approach (e.g. multiple regression/correlation) not used? Treating the data with a multivariate or averaged approach would allow the authors to directly address 'individuality stability', along with the analyses of single-parameter variability stability.

      The correlation coefficients are sometimes quite low, though highly significant, and are deemed to indicate stability. For example, in Figure 4C top left, the % of time walked at 23°C and 32°C are correlated by 0.263, which corresponds to an R² of 0.069, i.e. just 7% of the 32°C variance is predictable by the 23°C variance. Is it fair to say that 7% determination indicates parameter stability? Another example: "Vector strength was the most correlated attention parameter... correlations ranged... to -0.197," which implies that 96% (1 - R²) of Y-maze variance is not predicted by Buridan variance. At what level does an r value not represent stability?

      The authors describe a dissociation between inter-group differences and inter-individual variation stability, i.e. sometimes large mean differences between contexts, but significant correlation between individual test and retest data. Given that correlation is sensitive to slope, this might be expected to underestimate the variability stability (or consistency). Is there a way to adjust for the group differences before examining correlation? For example, would it be possible to transform the values to in-group ranks prior to correlation analysis?

      What is gained by classifying the five parameters into exploration, attention, and anxiety? To what extent have these classifications been validated, both in general, and with regard to these specific parameters? Is increased walking speed at higher temperature necessarily due to increased 'explorative' nature, or could it be attributed to increased metabolism, dehydration stress, or a heat-pain response? To what extent are these categories subjective?

      The legends are quite brief and do not link to descriptions of specific experiments. For example, Figure 4a depicts a graphical overview of the procedure, but I could not find a detailed description of this experiment's protocol.

      Using the current single-correlation analysis approach, the aims would benefit from re-wording to appropriately address single-parameter variability stability/consistency (as distinct from holistic individuality). Alternatively, the analysis could be adjusted to address the multivariate nature of individuality, so that the claims and the analysis are in concordance with each other.

      The study presents a bounty of new technology to study visually guided behaviors. The GitHub link to the software was not available. To verify successful transfer of open hardware and open software, a report would demonstrate transfer by collaboration with one or more other laboratories, which the present manuscript does not appear to do. Nevertheless, making the technology available to readers is commendable.

      The study discusses a number of interesting, stimulating ideas about interindividual variability and presents intriguing data that speaks to those ideas, albeit with the issues outlined above.

      While the current work does not present any mechanistic analysis of interindividual variability, the implementation of high-throughput assays sets up the field to more systematically investigate fly visual behaviors, their variability, and their underlying mechanisms.  

      Comments on revisions:  

      I want to express my appreciation for the authors' responsiveness to the reviewer feedback. They appear to have addressed my previous concerns through various modifications, including GLM analysis; however, some areas still require clarification for the benefit of an audience that includes geneticists.

      (1) GLM Analysis Explanation (Figure 9)  

      While the authors state that their new GLM results support their original conclusions, the explanation of these results in the text is insufficient. Specifically:

      The interpretation of coefficients and their statistical significance needs more detailed explanation. The audience includes geneticists and other nonstatistical people, so the GLM should be explained in terms of the criteria or quantities used to assess how well the results conform with the hypothesis, and to what extent they diverge.

      The criteria used to judge how well the GLM results support their hypothesis are not clearly stated.

      The relationship between the GLM findings and their original correlation-based conclusions needs better integration and connection, leading the reader through your reasoning.

      We thank the reviewer for highlighting this important point. We have revised the Results section in the revised manuscript to include a more detailed explanation of the GLM analysis. Specifically, we now clarify the interpretation of the model coefficients, including the direction and statistical significance, in relation to the hypothesized effects. We also outline the criteria we used to assess how well the GLM supports our original correlation-based conclusions, namely, whether the sign and significance of the coefficients align with the expected relationships derived from our prior analysis. Finally, we explicitly describe how the GLM results confirm or extend the patterns observed in the correlation-based analysis, to guide readers through our reasoning and the integration of both approaches.

      (2) Documentation of Changes  

      One struggle with the revised manuscript is that no "tracked changes" version was included, so it is hard to know exactly what was done. Without access to the previous version of the manuscript, it is difficult to fully assess the extent of revisions made. The authors should provide a more comprehensive summary of the specific changes implemented, particularly regarding:

      We thank the reviewer for bringing this to our attention. We were equally confused to learn that the tracked-changes version was not visible, despite having submitted one to eLife as part of our revision. 

      Upon contacting the editorial office, they confirmed that we did submit a tracked-changes version, but clarified that it did not contain embedded figures (as they were added manually to the clean version). The editorial response said in detail: “Regarding the tracked-changes file: it appears the version with markup lacked figures, while the figure-complete PDF had markup removed, which likely caused the confusion mentioned by the reviewers.” We hope this answer from eLife clarifies the reviewers’ concern.

      (3) Statistical Method Selection

      The authors mention using "ridge regression to mitigate collinearity among predictors" but do not adequately justify this choice over other approaches. They should explain:

      Why ridge regression was selected as the optimal method  

      How the regularization parameter (λ) was determined  

      How this choice affects the interpretation of environmental parameters' influence on individuality

      We appreciate the reviewer’s thoughtful question regarding our choice of statistical method. In response, we have expanded the Methods section in the revised manuscript to provide a more detailed justification for the use of a GLM, including ridge regression. Specifically, we explain that ridge regression was selected to address collinearity and to control for overfitting.

      We now also describe how the regularization parameter (λ) was selected: we used 5-fold cross-validation over a log-spaced grid (10⁻⁶ to 10⁶) to identify the optimal value that minimized the mean squared error (MSE).

      Finally, we clarify in both the Methods and Results sections how this modeling choice affects the interpretation of our findings. 
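      The λ selection described above can be sketched in a few lines. This is a standard-library illustration under stated assumptions, not the authors' code: the data are synthetic (binary "parameter changed" predictors, one of them nearly collinear with another), the model has no intercept, the grid uses one value per decade, and all function names are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)


def mse(X, y, beta):
    """Mean squared prediction error of beta on (X, y)."""
    return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, t in zip(X, y)) / len(y)


def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Return the lambda with the lowest mean k-fold held-out MSE."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for lam in lambdas:
        fold_mse = []
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
            fold_mse.append(mse([X[i] for i in fold], [y[i] for i in fold], beta))
        scores.append(sum(fold_mse) / k)
    best = min(range(len(lambdas)), key=lambda i: scores[i])
    return lambdas[best], scores


# Synthetic data: x3 copies x1 about 90% of the time (collinearity).
rng = random.Random(1)
X, y = [], []
for _ in range(60):
    x1 = float(rng.random() < 0.5)
    x2 = float(rng.random() < 0.5)
    x3 = x1 if rng.random() < 0.9 else 1.0 - x1
    X.append([x1, x2, x3])
    y.append(1.0 * x1 + 0.5 * x2 + rng.gauss(0.0, 0.3))

lambdas = [10.0 ** e for e in range(-6, 7)]  # 1e-6 ... 1e6, log-spaced
best_lambda, cv_scores = cv_select_lambda(X, y, lambdas, k=5)
print("selected lambda:", best_lambda)
```

      Larger λ values shrink all coefficients toward zero, so coefficient magnitudes from the selected model are best read relative to one another rather than as unbiased effect sizes.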

      Reviewer #2 (Public review):  

      Summary:  

      The authors repeatedly measured the behavior of individual flies across several environmental situations in custom-made behavioral phenotyping rigs.

      Strengths:  

      The study uses several different behavioral phenotyping devices to quantify individual behavior in a number of different situations and over time. It seems to be a very impressive amount of data. The authors also make all their behavioral phenotyping rig design and tracking software available, which I think is great, and I'm sure other folks will be interested in using and adapting to their own needs.

      Weaknesses/Limitations:  

      I think an important limitation is that while the authors measured the flies under different environmental scenarios (i.e. with different lighting, temperature) they didn't really alter the "context" of the environment. At least within behavioral ecology, context would refer to the potential functionality of the expressed behaviors so for example, an anti-predator context, or a mating context, or foraging. Here, the authors seem to really just be measuring aspects of locomotion under benign (relatively low risk perception) contexts. This is not a flaw of the study, but rather a limitation to how strongly the authors can really say that this demonstrates that individuality is generalized across many different contexts. It's quite possible that rank-order of locomotor (or other) behaviors may shift when the flies are in a mating or risky context.  

      I think the authors are missing an opportunity to use much more robust statistical methods. It appears as though the authors used Pearson correlations across time/situations to estimate individual variation; however, far more sophisticated and elegant methods exist. The problem is that Pearson correlation coefficients can be anti-conservative and, additionally, the authors have thus had to perform many, many tests to correlate behaviors across the different trials/scenarios. I don't see any evidence that the authors are controlling for multiple testing, which I think would also help. Alternatively, though, the paper would be a lot stronger, and my guess is, much more streamlined if the authors employ hierarchical mixed models to analyse these data, which are the standard analytical tools in the study of individual behavioral variation. In this way, the authors could partition the behavioral variance into its among- and within-individual components and quantify repeatability of different behaviors across trials/scenarios simultaneously. This would remove the need to estimate 3 different correlations for day 1 & day 2, day 1 & 3, day 2 & 3 (or stripe 0 & stripe 1, etc.) and instead just report a single repeatability for e.g. the time spent walking among the different stripe patterns (e.g. Figure 3). Additionally, the authors could then use multivariate models where the response variables are all the behaviors combined and the authors could estimate the among-individual covariance in these behaviors. I see that the authors state they include generalized linear mixed models in their updated MS, but I struggled a bit to understand exactly how these models were fit. What exactly was the response? What exactly were the predictors? (I just don't understand what Line 404 means: "a GLM was trained using the environmental parameters as predictors (0 when the parameter was not changed, 1 if it was) and the resulting individual rank differences as the response".) So were different models run for each scenario? For different behaviors? Across scenarios? What exactly? I just harp on this because I'm actually really interested in these data and think that updating these methods can really help clarify the results and make the main messages much clearer!

      I appreciate that the authors now included their sample sizes in the main body of text (as opposed to the supplement) but I think that it would still help if the authors included a brief overview of their design at the start of the methods. It is still unclear to me how many rigs each individual fly was run through? Were the same individuals measured in multiple different rigs/scenarios? Or just one?

      I really think a variance partitioning modeling framework could certainly improve their statistical inference and likely highlight some other cool patterns, as these methods could better estimate stability and covariance in individual intercepts (and potentially slopes) across time and situation. I also genuinely think that this will improve the impact and reach of this paper, as they'll be using methods that are standard in the study of individual behavioral variation.

      Reviewer #3 (Public review):  

      This manuscript is a continuation of past work by the last author where they looked at stochasticity in developmental processes leading to inter-individual behavioural differences. In that work, the focus was on a specific behaviour under specific conditions while probing the neural basis of the variability. In this work, the authors set out to describe in detail how stable individuality of animal behaviours is in the context of various external and internal influences. They identify a few behaviours to monitor (read outs of attention, exploration, and 'anxiety'); some external stimuli (temperature, contrast, nature of visual cues, and spatial environment); and two internal states (walking and flying).

      They then use high-throughput behavioural arenas - most of which they have built and made plans available for others to replicate - to quantify and compare combinations of these behaviours, stimuli, and internal states. This detailed analysis reveals that:

      (1) Many individualistic behaviours remain stable over the course of many days.  

      (2) Some of these (walking speed) remain stable over changing visual cues. Others (walking speed and centrophobicity) remain stable at different temperatures.

      (3) All the behaviours they tested fail to remain stable across spatially varying environments (arena shape).

      (4) Only angular velocity (a read out of attention) remains stable across varying internal states (walking and flying).

      Thus, the authors conclude that there is a hierarchy in the influence of external stimuli and internal states on the stability of individual behaviours.

      The manuscript is a technical feat, with the authors having built many new high-throughput assays. The number of animals is large and many variables have been tested: different types of behavioural paradigms, flying vs walking, varying visual stimuli, and different temperatures, among others.

      Comments on revisions:

      The authors have addressed my previous concerns.  

      We thank the reviewer for the positive feedback and are glad our revisions have satisfactorily addressed the previous concerns. We appreciate the thoughtful input that helped us improve the clarity and rigor of the manuscript.

      Reviewer #1 (Recommendations for the authors):  

      Comment on Revised Manuscript  

      Recommendations for Improvement  

      (1) Expand the Results section for Figure 9 with a more detailed interpretation of the GLM coefficients and their biological significance.

      (2) Provide explicit criteria (or at least explain in detail) for how the GLM results confirm or undermine their original hypothesis about environmental context hierarchy.

      While the claims are interesting, the additional statistical analysis appears promising. However, clearer explanation of these new results would strengthen the paper and ensure that readers from diverse backgrounds can fully understand how the evidence supports the authors' conclusions about individuality across environmental contexts. 

      We thank the reviewer for these constructive suggestions. In response, we have expanded both the Methods and Results sections to provide a more detailed explanation of the GLM coefficients, including their interpretation and how they relate to our original correlation-based findings.

      We now clarify how the direction, magnitude, and statistical significance of specific coefficients reflect the influence of different environmental factors on the persistence of individual behavioral traits. To make this accessible to readers from diverse backgrounds, we explicitly outline the criteria we used to evaluate whether the GLM results support our hypothesis about the hierarchical influence of environmental context, namely, whether the structure and strength of effects align with the patterns predicted from our prior correlation analysis.

      These additions improve clarity and help readers understand how the new statistical results reinforce our conclusions about the context-dependence of behavioral individuality.

      Reviewer #2 (Recommendations for the authors):  

      Thanks for the revision of the paper! I updated my review to try and provide a little more guidance by what I mean about updating your analyses. I really think this is a super cool data set and I genuinely wish this were MY dataset so that way I could really dig into it to partition the variance. These variance partitioning methods are standard in my particular subfield (study of individual behavioral variation in ecology and evolution) and so I think employing them is 1) going to offer a MUCH more elegant and holistic view of the behavioral variation (e.g. you can report a single repeatability estimate for each behavior rather than 3 different correlations) and 2) improve the impact and readership for your paper as now you'll be using methods that a whole community of researchers are very familiar with. It's just a suggestion, but I hope you consider it!

      We sincerely thank the reviewer for the insightful and encouraging feedback and for introducing us to this modeling approach. In response to this suggestion, we have incorporated a hierarchical linear mixed-effects model into our analysis (now presented in Figure 10), accompanied by a new supplementary table (Table T3). We also updated the Methods, Results, and Discussion sections to describe the rationale, implementation, and implications of the mixed-model analysis.

      We agree with the reviewer that this approach provides a more elegant way to quantify behavioral variation and individual consistency across contexts. In particular, the ability to estimate repeatability directly aligns well with the core questions of our study. It facilitates improved communication of our findings to ecology, evolution, and behavior researchers. We greatly appreciate the suggestion; it has significantly strengthened both the analytical framework and the interpretability of the manuscript.
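
      For readers unfamiliar with repeatability, the quantity can be illustrated with a small sketch. The mixed model itself is not reproduced here; instead, the example uses the classical one-way ANOVA estimator for a balanced design, R = V_among / (V_among + V_within), as a simpler stand-in. The function name `repeatability` and the toy values are assumptions made for illustration only.

```python
from statistics import mean


def repeatability(data):
    """ANOVA-based repeatability for a balanced design.

    data maps each individual to a list of repeated measurements; every
    individual must have the same number of measurements (at least two).
    """
    groups = list(data.values())
    n, k = len(groups), len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need a balanced design with >= 2 individuals and repeats")

    grand = mean(x for g in groups for x in g)
    # Mean squares among individuals and within individuals.
    ms_among = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))

    # Among-individual variance component, truncated at zero.
    v_among = max((ms_among - ms_within) / k, 0.0)
    total = v_among + ms_within
    if total == 0:
        raise ValueError("no variance in the data")
    return v_among / total


# Toy values (not study data): three flies measured on three days.
consistent = {"fly_a": [1.0, 1.1, 0.9], "fly_b": [2.0, 2.1, 1.9], "fly_c": [3.0, 2.9, 3.1]}
inconsistent = {"fly_a": [1.0, 2.0, 3.0], "fly_b": [3.0, 1.0, 2.0], "fly_c": [2.0, 3.0, 1.0]}

print(repeatability(consistent))
print(repeatability(inconsistent))
```

      A value near 1 means most of the variance lies among individuals (stable rank order); a value near 0 means day-to-day variation within individuals dominates. A mixed model generalizes this to unbalanced data and additional fixed effects such as rig or environmental condition.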

    1. eLife Assessment

      This valuable study analyzes aging-related chromatin changes through the lens of intra-chromosomal gene correlation length, which is a novel computational metric that captures spatial correlations in gene expression along the chromosome. The authors propose that this metric reflects chromatin structure and can serve as a proxy for its changes during aging. While currently the strength of evidence is somewhat incomplete, if revised with further supporting data, this work will provide a systems-level understanding of aging and genome regulation, which is predicted to have a substantive impact on the field.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Mahajan et al. introduce two innovative macroscopic measures, intrachromosomal gene correlation length (𝓁∗) and transition energy barrier, to investigate chromatin structural dynamics associated with aging and age-related syndromes such as Hutchinson-Gilford Progeria Syndrome (HGPS) and Werner Syndrome (WRN). The authors propose a compelling systems-level approach that complements traditional biomarker-driven analyses, offering a more holistic and quantitative framework to assess genome-wide dysregulation. The concept of 𝓁∗ as a spatial correlation metric to capture chromatin disorganization is novel and well-motivated. The use of autocorrelation on distance-binned gene expression adds depth to the interpretation of chromatin state shifts. The energy landscape framework for gene state transitions is an elegant abstraction, with the notion of "irreversibility" providing a thermodynamic interpretation of transcriptional dysregulation. The application to multiple datasets (Fleischer, LINE-1) and pathological states adds robustness to the analysis. The consistency of chromosome 6 (and to some extent chromosomes 16 and X) emerging as hotspots aligns well with known histone cluster localization and disease-relevant pathways. The manuscript does an excellent job of integrating transcriptomic trends with known epigenetic hallmarks of aging, and the proposed metrics can be used in place of traditional techniques like PCA in capturing structural transcriptome features. However, a direct correlation of the present analysis with ATAC-seq/Hi-C data would be more informative.

      Strengths:

      Novel inclusion of statistical metrics that can help in systems-level studies in aging and chromatin biology.

      Weaknesses:

      (1) In the manuscript, the authors mention "While it may be intuitive to assume that highly expressed genes originate from euchromatin, this cannot be conclusively stated as a complete representation of euchromatin genes, nor can LAT be definitively linked to heterochromatin". What percentage of LAT can be linked to heterochromatin? What is the distribution of LAT and HAT in the euchromatin?

      (2) In Figure 2, the authors observe "that the signal from the HAT class is the stronger between two and the signal from the LAT class, being mostly uniform, can be constituted as background noise." Is this biologically relevant? Are low-abundance transcripts constitutively expressed? The authors should discuss this in the Results section.

      (3) The authors make a very interesting observation from Figure 3: that ASO-treated LINE-1 appears to be more effective in restoring HGPS cell lines closer to wild-type compared to WRN. This can be explained by the difference in the basal activity of L1 elements in the HGPS vs WRN cell types. The authors should comment on this.

      (4) The authors report that "from the results on Fleischer dataset is the magnitude of the difference in similarity distance is more pronounced in 𝓁∗ than in gene expression." Does this mean that the alterations in gene distance and chromatin organization do not result in gene expression change during aging?

      (5) "In Fleischer dataset, as evident in Figure 4a, although changes in the heterochromatin are not identical for all chromosomes shown by the different degrees of variation of 𝓁∗ in each age group." The authors should present a comprehensive map of each chromosome change in gene distance to better explain the above statement.

      (6) While trends in 𝓁∗ are discussed at both global and chromosome-specific levels, stronger statistical testing (e.g., permutation tests, bootstrapping) would lend greater confidence, especially when differences between age groups or treatment states are modest.

      (7) While the transition energy barrier is an insightful conceptual addition, further clarification on the mathematical formulation and its physical assumptions (e.g., energy normalization, symmetry conditions) would improve interpretability. Also, in between Figures 7 and 8, the authors first compare the energy barrier of Chromosome 1 and then for all other chromosomes. What is the rationale for only analyzing chromosome 1? How many HAT or LAT are present there?

    3. Reviewer #2 (Public review):

      The authors report that intra-chromosomal gene correlation length (spatial correlations in gene expressions along the chromosome) serves as a proxy of chromatin structure and hence gene expression. They further explore changes in these metrics with aging. These are interesting and important findings. However, there are fundamental problems at this time.

      (1) The basic method lacks validation. There is no validation of the method by approaches that directly measure chromatin structure, for example ATAC-seq, ChIP-seq, or CUT n RUN.

      (2) There is no validation by interventions that directly probe chromatin structure, such as HDAC inhibitors. The authors employ datasets with knockdown of LINE-1 for validation. However, this is not a specific chromatin intervention.

      (3) There is no statistical analysis, e.g., in Figures 4 and 5.

      (4) The authors state, "in Figure 4a changes in the heterochromatin are not identical for all chromosomes shown...." I do not see the data for individual chromosomes.

      (5) In comparisons of WT vs HGPS NT or HGPS SCR (Figure S6), is this a fair comparison? The WT and HGPS are presumably from different human donors, so they have genetic and epigenetic differences unrelated to HGPS.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Mahajan et al. introduce two innovative macroscopic measures, intrachromosomal gene correlation length (𝓁∗) and transition energy barrier, to investigate chromatin structural dynamics associated with aging and age-related syndromes such as Hutchinson-Gilford Progeria Syndrome (HGPS) and Werner Syndrome (WRN). The authors propose a compelling systems-level approach that complements traditional biomarker-driven analyses, offering a more holistic and quantitative framework to assess genome-wide dysregulation. The concept of 𝓁∗ as a spatial correlation metric to capture chromatin disorganization is novel and well-motivated. The use of autocorrelation on distance-binned gene expression adds depth to the interpretation of chromatin state shifts. The energy landscape framework for gene state transitions is an elegant abstraction, with the notion of "irreversibility" providing a thermodynamic interpretation of transcriptional dysregulation. The application to multiple datasets (Fleischer, LINE-1) and pathological states adds robustness to the analysis. The consistency of chromosome 6 (and to some extent chromosomes 16 and X) emerging as hotspots aligns well with known histone cluster localization and disease-relevant pathways. The manuscript does an excellent job of integrating transcriptomic trends with known epigenetic hallmarks of aging, and the proposed metrics can be used in place of traditional techniques like PCA in capturing structural transcriptome features. However, a direct correlation of the present analysis with ATAC-seq/Hi-C data would be more informative.

      (1) In the manuscript, the authors mention "While it may be intuitive to assume that highly expressed genes originate from euchromatin, this cannot be conclusively stated as a complete representation of euchromatin genes, nor can LAT be definitively linked to heterochromatin". What percentage of LAT can be linked to heterochromatin? What is the distribution of LAT and HAT in the euchromatin?

      Thank you for this insightful question. In the revision we will add chromatin state annotations using ChromHMM to identify overlap between HAT/LAT and corresponding chromatin state. This should provide the specific percentages and distributions you requested.

      We would like to take this opportunity to clarify that, based on the plots in Figure S1 and on differential gene expression, HAT is most likely a subset of euchromatin, whereas LAT may contain both euchromatin and heterochromatin. The HAT/LAT cutoff occurs around the knee point in the log-log plot (Figure S1), where the linear portion indicates scale-invariant behavior with similar relative changes across expression ranks. The non-linear portion represents a departure from power-law scaling, where low-expression genes exhibit a sharper decline than expected. This suggests potential biological mechanisms such as chromatin silencing, or technical causes such as detection limits or artifacts related to sequencing depth.
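
      As an illustration of how a knee point in a log-log rank-expression curve can be located, the sketch below uses one common heuristic: the point farthest from the straight line joining the first and last points of the curve. This is an assumption chosen for illustration and not necessarily the procedure used for Figure S1; the function name `knee_rank` and the synthetic expression values are hypothetical.

```python
import math


def knee_rank(expression):
    """Return (rank, value) of the knee in a log-log rank-expression curve.

    The knee is taken as the point with the largest perpendicular distance
    from the chord between the highest- and lowest-ranked genes.
    """
    values = sorted((v for v in expression if v > 0), reverse=True)
    if len(values) < 3:
        raise ValueError("need at least three positive expression values")

    xs = [math.log10(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log10(v) for v in values]
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    chord = math.hypot(x1 - x0, y1 - y0)

    distances = [
        abs((y1 - y0) * (x - x0) - (x1 - x0) * (y - y0)) / chord
        for x, y in zip(xs, ys)
    ]
    best = max(range(len(values)), key=distances.__getitem__)
    return best + 1, values[best]


# Synthetic curve (not study data): power law up to rank 100, then a sharper decline.
head = [1000.0 / r for r in range(1, 101)]
tail = [10.0 * (r / 100.0) ** -6 for r in range(101, 301)]
rank, cutoff = knee_rank(head + tail)
print(rank, cutoff)
```

      Under this heuristic, genes ranked at or above the returned rank would fall in the high-abundance class and the remainder in the low-abundance class.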

      We will provide detailed chromatin state analysis in the revision. For reference, HAT gene lists per chromosome are available in our GitHub repository at: https://github.com/altoslabs/papers-2025-rnaseq-chrom-aging/tree/main/data/Preprocessed_data under /<dataset>/chromosome_{}/data_hi.

      (2) In Figure 2, the authors observe "that the signal from the HAT class is the stronger between two and the signal from the LAT class, being mostly uniform, can be constituted as background noise." Is this biologically relevant? Are low-abundance transcripts constitutively expressed? The authors should discuss this in the Results section.

      We apologize for the confusion arising from the usage of the term “background noise”. We agree that the distinction between high-abundance transcripts (HATs) and low-abundance transcripts (LATs) deserves more explicit discussion in the Results.

      Our intention is to say that HAT has a higher signal-to-noise ratio (SNR) than LAT, as follows from the power-law graph in Figure S1. The HAT class provides a strong, robust signal that is consistent across chromosomes, whereas the LAT class exhibits lower SNR and a more uniform, background-like distribution. This statement applies to the specific problem we are solving and is not a generic biological claim. The experimental result that led to this statement is presented in Figure S3. This does not imply that low-abundance transcripts lack biological relevance, but rather that they contribute less to the spatial organization patterns we measure.

      (3) The authors make a very interesting observation from Figure 3: that ASO-treated LINE-1 appears to be more effective in restoring HGPS cell lines closer to wild-type compared to WRN. This can be explained by the difference in the basal activity of L1 elements in the HGPS vs WRN cell types. The authors should comment on this.

      We thank the reviewer for this incisive biological observation. While the differential effectiveness of ASO-treated LINE-1 in HGPS versus WRN cell lines is indeed an interesting phenomenon that may relate to basal L1 activity differences, this biological mechanism falls outside the scope of our current study.

      Our paper focuses on demonstrating that the 𝓁∗ metric can sensitively detect chromatin structural changes that have been independently validated. We utilize the Della Valle et al. (2022) dataset specifically because it provides experimentally confirmed chromatin structural differences (progeroid vs wild-type vs ASO-treated progeroid), allowing us to validate that 𝓁∗ correlates with these established changes.

      For detailed discussion of the biological mechanisms underlying differential LINE-1 ASO effectiveness between progeroid syndromes, we would direct readers to Della Valle et al. (2022) and related LINE-1 biology literature. Our contribution lies in demonstrating that 𝓁∗ can capture these chromatin organizational changes with enhanced sensitivity compared to traditional expression-based approaches. We are reluctant, without further experimentation, to venture into over-interpreting these results from a biology perspective.  

      (4) The authors report that "from the results on Fleischer dataset is the magnitude of the difference in similarity distance is more pronounced in 𝓁∗ than in gene expression." Does this mean that the alterations in gene distance and chromatin organization do not result in gene expression change during aging?

      Thank you for this important clarification request. This observation, illustrated in Figure 3, highlights two key points: (1) 𝓁∗ shows similar trends to PCA analysis, and (2) 𝓁∗ demonstrates higher sensitivity than traditional gene expression analysis.

      This enhanced sensitivity enables better discrimination between aging states, particularly in the Fleischer dataset representing natural aging where changes are more gradual. The higher sensitivity stems from 𝓁∗'s ability to capture transcriptional spatial organization through spatial autocorrelation, which can detect subtle organizational changes that may precede or accompany expression changes rather than replacing them.

      We will clarify in the revision that chromatin organizational changes and gene expression changes are complementary rather than mutually exclusive phenomena during aging.

      (5) "In Fleischer dataset, as evident in Figure 4a, although changes in the heterochromatin are not identical for all chromosomes shown by the different degrees of variation of 𝓁∗ in each age group." The authors should present a comprehensive map of each chromosome change in gene distance to better explain the above statement.

      Thank you for the feedback. If we understand your comment correctly, we need to provide a chromosome-wise distribution for Figure 3c. We will update the paper and the supplementary materials accordingly.

      (6) While trends in 𝓁∗ are discussed at both global and chromosome-specific levels, stronger statistical testing (e.g., permutation tests, bootstrapping) would lend greater confidence, especially when differences between age groups or treatment states are modest.

      Thank you for the helpful suggestion. In the revision, we will incorporate permutation-based significance testing by shuffling the gene annotation and count table to generate a null distribution for our 𝓁∗ calculation. This will allow us to more rigorously assess whether the observed differences across age groups or treatment states deviate from chance expectations and thereby lend greater statistical confidence to our findings.
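
      The general shape of such a permutation test can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: the lag-1 autocorrelation of expression along gene order is used as a simple stand-in for the 𝓁∗ statistic, the null is generated by shuffling gene order, and the names `lag1_autocorr` and `permutation_test` as well as the synthetic profile are hypothetical.

```python
import math
import random
from statistics import mean


def lag1_autocorr(values):
    """Lag-1 autocorrelation of values taken in chromosomal order.

    Hypothetical stand-in for the correlation-length statistic.
    """
    m = mean(values)
    dev = [v - m for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # constant profile: no spatial structure to measure
    return sum(a * b for a, b in zip(dev, dev[1:])) / denom


def permutation_test(values, statistic, n_perm=999, seed=0):
    """One-sided permutation p-value for spatial structure in `values`."""
    rng = random.Random(seed)
    observed = statistic(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys gene order, keeps the value distribution
        if statistic(shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic profile (not study data): smooth spatial trend plus noise.
noise = random.Random(1)
expression = [math.sin(i / 8.0) + noise.gauss(0.0, 0.2) for i in range(120)]

observed, p_value = permutation_test(expression, lag1_autocorr)
print(observed, p_value)
```

      With 999 permutations the smallest attainable p-value is 0.001, so the number of permutations sets the resolution of the test and should be chosen with any multiple-testing correction across chromosomes in mind.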

      (7) While the transition energy barrier is an insightful conceptual addition, further clarification on the mathematical formulation and its physical assumptions (e.g., energy normalization, symmetry conditions) would improve interpretability. Also, in between Figures 7 and 8, the authors first compare the energy barrier of Chromosome 1 and then for all other chromosomes.

      What is the rationale for only analyzing chromosome 1? How many HAT or LAT are present there?

      Regarding the focus on chromosome 1: we initially presented chromosome 1 as a representative example, but we will include the energy landscape analysis for all chromosomes in the supplementary materials.

      For the energy landscape, we use the same HATs that were extracted for the 𝓁∗ calculation. The HAT details are available in the GitHub repository linked in our response to comment (1).

      The normalization of the energy barrier ensures comparability across chromosomes of different sizes and across samples with different absolute expression scales. Specifically, we normalize with respect to the total area under the two-dimensional energy landscape while using the thermal energy (k_B T) as a scaling factor to place transition energy barriers on the scale of thermal fluctuations. This is formally expressed in Eq. (1).

      The physical consequences of symmetry in the energy landscape are discussed in lines 472-491 of the manuscript, where we also introduce the concept of irreversibility. In brief, the chromatin energy landscape (Figure 8) is constructed by quantifying the energy contributions of genes that are upregulated (lower triangular matrix) and downregulated (upper triangular matrix) between two states. If the integrated energy contributions of upregulated and downregulated genes are equal, the landscape is symmetric, representing a thermodynamically reversible process, for example, nucleosome repositioning between euchromatic and heterochromatic regions without net gain or loss of nucleosomes. However, in cases where epigenetic modifications alter nucleosome density (e.g., disease states that reduce nucleosome numbers), the integrated energies are unequal, reflecting an irreversible energy cost. In this case, restoring chromatin requires additional energy input (e.g., to replace “missing” nucleosomes), which manifests as asymmetry in the landscape.
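
      The symmetry argument can be illustrated numerically. The sketch below is not the formulation in the manuscript: it assumes a square matrix of non-negative energy contributions and defines a hypothetical asymmetry index, (lower − upper) / (lower + upper), from the integrated lower-triangular (upregulated) and upper-triangular (downregulated) contributions. The function names and matrices are made up for illustration.

```python
def triangle_sums(matrix):
    """Sum the strictly lower and strictly upper triangles of a square matrix."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    lower = sum(matrix[i][j] for i in range(n) for j in range(i))
    upper = sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))
    return lower, upper


def asymmetry_index(matrix):
    """Hypothetical index: 0 for a balanced landscape, +/-1 for a one-sided one."""
    lower, upper = triangle_sums(matrix)
    total = lower + upper
    return 0.0 if total == 0 else (lower - upper) / total


# Made-up landscapes (not study data).
balanced = [[0, 1, 2],
            [1, 0, 3],
            [2, 3, 0]]
skewed = [[0, 1, 1],
          [3, 0, 1],
          [3, 3, 0]]

print(asymmetry_index(balanced))  # equal integrated contributions
print(asymmetry_index(skewed))    # lower triangle dominates
```

      In this toy setting, an index of zero corresponds to the reversible case described above, and a non-zero value to unequal integrated energies.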

      Reviewer #2 (Public review):

      The authors report that intra-chromosomal gene correlation length (spatial correlations in gene expressions along the chromosome) serves as a proxy of chromatin structure and hence gene expression. They further explore changes in these metrics with aging. These are interesting and important findings. However, there are fundamental problems at this time.

      (1) The basic method lacks validation. There is no validation of the method by approaches that directly measure chromatin structure, for example ATAC-seq, ChIP-seq, or CUT n RUN.

      We appreciate the reviewer’s point that direct measurements such as ATAC-seq and ChIP-seq remain the gold standard for characterizing chromatin structure. Our method is designed to complement, not replace, these approaches by leveraging RNA-seq data to detect large-scale transcriptional patterns that correlate with chromatin dynamics.

      We agree that integrating datasets with paired RNA-seq and chromatin accessibility assays would strengthen the manuscript and plan to include one such dataset in the revision.

      Based on this feedback, we will also take the opportunity during revision to clarify and soften certain statements. Specifically, we will reposition ℓ∗ as a sensitive, computational proxy for detecting transcriptional signatures that are suggestive of chromatin structural changes. In other words, ℓ∗ provides an indirect window into chromatin dynamics through transcriptional spatial organization, allowing detection of patterns that may precede or accompany structural changes. Direct assays such as ATAC-seq or ChIP-seq remain essential for confirming the underlying physical modifications. To make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text.  

      We would like to take this opportunity to clarify why our initial version focused on the Della Valle and Fleischer datasets rather than including new paired datasets with direct chromatin measurements. The primary objective of our paper is to introduce two macroscopic RNA-seq–based measures, ℓ∗ and the energy landscape, that are designed to detect transcriptional signatures suggestive of chromatin structural changes in the context of aging and age-related diseases. These measures explicitly model transcriptional spatial organization and provide a sensitive, scalable way to analyze RNA-seq data in domains where direct chromatin assays may not be readily available.

      The datasets we used (Della Valle et al., Fleischer et al.) have been rigorously validated and independently demonstrated differences in chromatin structure between conditions. Our goal was to show that ℓ∗ and the energy landscape align with and extend these established findings, offering a more sensitive measure of transcriptional spatial organization. Specifically, in the Della Valle dataset, chromatin structural differences between progeroid and healthy donors — and their partial rescue by LINE-1 ASO treatment — were experimentally confirmed, providing a strong foundation for testing whether our metrics reflect these known changes. Similarly, the Fleischer dataset captures natural, in vivo aging, which has also been linked to chromatin alterations in prior studies.

      Thus, our approach builds on this well-established biological context rather than attempting to re-demonstrate these chromatin differences from scratch. Finally, we emphasize that our current focus is aging and age-related diseases. While the framework could potentially be applied to other chromatin modification contexts, we have not tested it outside this domain and do not claim general applicability at this stage.

      (2) There is no validation by interventions that directly probe chromatin structure, such as HDAC inhibitors. The authors employ datasets with knockdown of LINE-1 for validation. However, this is not a specific chromatin intervention.

      We refer the reviewer to our response to (1), which includes the rationale behind the selection of the LINE-1 and Fleischer datasets. We would also like to note that, while the focus of Della Valle et al. was the rescue of progeroid samples by LINE-1 ASO treatment, the dataset also contains untreated and healthy samples. Importantly, untreated progeroid samples show distinctly different chromatin structure compared to healthy samples, with substantial differences detectable by both PCA and our 𝓁∗ metric.

      Our 𝓁∗ method provides additional interpretability by capturing transcriptional spatial organization, resulting in shorter correlation lengths for healthy patients and longer lengths for progeroid patients.

      However, as mentioned in our response to (1), we will aim to add a dataset with paired RNA-seq and one of ATAC-seq, ChIP-seq, or CUT n RUN in the revision.

      (3) There is no statistical analysis, e.g., in Figures 4 and 5.

      We have provided statistical analysis for Fig 4 (lines 237-241). We will do a similar analysis for Fig. 5. 

      (4) The authors state, "in Figure 4a changes in the heterochromatin are not identical for all chromosomes shown...." I do not see the data for individual chromosomes.

      The data for individual chromosomes are available in supplementary Figure S11, referenced at line 425. We will make this cross-reference clearer in the main text and consider whether some of this chromosome-specific information should be elevated to the main figures for better accessibility.

      (5) In comparisons of WT vs HGPS NT or HGPS SCR (Figure S6), is this a fair comparison? The WT and HGPS are presumably from different human donors, so they have genetic and epigenetic differences unrelated to HGPS.

      Figure S6 demonstrates that 𝓁∗ analysis identifies chromosome 6 as most affected, consistent with differential gene expression patterns.

      Regarding donor differences in WT vs HGPS comparisons, we defer to the experimental design of Della Valle et al., which follows standard practices in progeroid research. Our review of the literature indicates that progeroid studies typically use either parent/child samples or different donor comparisons (as individuals cannot simultaneously represent both WT and HGPS states).

      Importantly, the LINE-1 ASO treatment comparisons use the same cell lines, eliminating donor variability concerns. This experimental design allows us to validate that 𝓁∗ can detect rescue effects within genetically identical samples, supporting the method's sensitivity to chromatin structural changes.

      Reviewing Editor Comments:

      You'll note that both reviewers were very thoughtful in their comments, and in principle are supportive and excited by the work. However, their evaluation of the strength of evidence diverged substantially. I'm inclined to suggest that finding a way to support the novel method with an alternative approach would greatly improve the impact of this work. I encourage you to consider a revision that provides such data, in the context of technology currently available to the field.

      We sincerely thank the editor for their thoughtful and encouraging assessment of our work. We are grateful for their recognition of the novelty of our macroscopic measures (ℓ∗ and the transition energy barrier) and their potential to provide a systems-level understanding of chromatin structural dynamics in aging and age-related syndromes. In response to the editor’s suggestion for direct validation with chromatin accessibility data, we plan to integrate an additional dataset containing paired RNA-seq and ATAC-seq or related measurements in our revision. This will help strengthen the link between our RNA-seq–based metrics and direct chromatin assays. We have also clarified and softened the manuscript text to ensure it is clear that ℓ∗ serves as a complementary, computational proxy, not a replacement, for direct experimental approaches. Very specifically, to make this scope clear, we will revise the title to: “Macroscopic RNA-seq Analysis to Detect Transcriptional Patterns Associated with Chromatin State Changes,” and adjust the main text. We thank the editor for the feedback. We have provided additional details in response to specific comments made by the reviewers.

    1. eLife Assessment

      This study presents a new toolbox for Representational Similarity Analysis, representing a valuable contribution to the neuroscience community. The authors offer a well-integrated platform that brings together a range of state-of-the-art methodological advances within a convincing framework, with strong potential to enable more rigorous and insightful analyses of neural data across multiple subfields.

    2. Reviewer #1 (Public review):

      Summary

      This manuscript presents an updated version of rsatoolbox, a Python package for performing Representational Similarity Analysis (RSA) on neural data. The authors provide a comprehensive and well-integrated framework that incorporates a range of state-of-the-art methodological advances. The updated version extends the toolbox's capabilities.

      The paper outlines a typical RSA workflow in five steps:

      (1) Importing data and estimating activity patterns.

      (2) Estimating representational geometries (computing RDMs).

      (3) Comparing RDMs.

      (4) Performing inferential model comparisons.

      (5) Handling multiple testing across space and time.

      For each step, the authors describe methodological advances and best practices implemented in the toolbox, including improved measures of representational distances, evaluators for representational models, and statistical inference methods.
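
      Steps (2) and (3) of the workflow above can be illustrated with a minimal NumPy sketch. It deliberately does not use the rsatoolbox API; the function names (`squared_euclidean_rdm`, `upper_triangle`, `rdm_correlation`) are hypothetical and chosen only for this illustration. The squared Euclidean distance and the Pearson correlation between RDMs are just one possible choice of estimator and comparator, not the ones the manuscript recommends.

```python
import numpy as np


def squared_euclidean_rdm(patterns):
    """Step 2 (illustrative): build an RDM from activity patterns.

    patterns: array of shape (n_conditions, n_channels).
    Returns an (n_conditions, n_conditions) matrix of squared Euclidean
    distances, normalised by the number of channels.
    """
    patterns = np.asarray(patterns, dtype=float)
    diff = patterns[:, None, :] - patterns[None, :, :]
    return (diff ** 2).sum(axis=-1) / patterns.shape[1]


def upper_triangle(rdm):
    """Vectorise the off-diagonal upper triangle of a symmetric RDM."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]


def rdm_correlation(rdm_a, rdm_b):
    """Step 3 (illustrative): Pearson correlation between two RDMs."""
    a = upper_triangle(rdm_a)
    b = upper_triangle(rdm_b)
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


# Toy data: three conditions measured on two channels.
data_rdm = squared_euclidean_rdm([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
# A hypothetical model RDM that predicts the same geometry up to scale.
model_rdm = 3.0 * data_rdm
similarity = rdm_correlation(data_rdm, model_rdm)
```

      Because the Pearson correlation ignores overall scale, a model RDM that differs from the data RDM only by a constant factor is scored as a perfect match. This is the kind of property that distinguishes one comparator from another in practice.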

      While the relative impact of the manuscript is somewhat limited to the new contributions in this update (which are nonetheless very useful), the general toolbox - here thoroughly described and discussed - remains an invaluable contribution to the field and is well-received by the cognitive and computational neuroscience communities.

      Strengths:

      A key strength of the work is the breadth and integration of the implemented methods. The updated version introduces several new features, such as additional comparators and dissimilarity estimators, that closely follow recent methodological developments in the field. These enhancements build on an already extensive set of functionalities, offering seamless support for RSA analyses across a wide variety of data sources, including deep neural networks, fMRI, EEG, and electrophysiological recordings.

      The toolbox also integrates effectively with the broader open-source ecosystem, providing compatibility with BIDS formats and outputs from widely used neuroscience software. This integration will make it easier for researchers to incorporate rsatoolbox into existing workflows. The documentation is extensive, and the scope of functionality - from dissimilarity estimation to statistical inference - is impressive.

      For researchers already familiar with RSA, rsatoolbox offers a coherent environment that can streamline analyses, promote methodological consistency, and encourage best practices.

      Weaknesses:

      While I enjoyed reading the manuscript - and even more so exploring the toolbox - I have some comments for the authors. None of these points is strictly major, and I leave it to the authors' discretion whether to act on them, but addressing them could make the manuscript an even more valuable resource for those approaching RSA.

      (1) While several estimators and comparators are implemented, Figure 4 appears to suggest that only a subset should be used in practice. This raises the question of whether the remaining options are necessary, and under what circumstances they might be preferable. Although it is likely that different measures are suited to different scenarios, this is not clearly explained in the manuscript. As presented, a reader following the manuscript's guidance might rely on only a few of the available comparators and estimators without understanding the rationale. It would be helpful if the authors could provide practical examples illustrating when one measure might be preferred over another, and how different measures behave under varying conditions. For instance, in what situations should the user choose manifold similarity versus Bures similarity?

      (2) The comparison to other RSA tools is minimal, making it challenging to place rsatoolbox in the broader landscape of available resources. Although the authors mention some existing RSA implementations, they do not provide a detailed comparison of features or performance between their toolbox and alternatives.

      (3) Finally, given the growing interest in comparing neural network models with brain data, a more detailed discussion of how the toolbox can be applied to common questions in this area would be a valuable addition.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript, "A Python Toolbox for Representational Similarity Analysis", presents an overview of the RSAToolbox, including a review of the methods it implements (some of which are more recently developed) and recommendations for constructing RSA analysis pipelines. It is encouraging to see that this toolbox, which has existed in both Python and other forms, continues to be actively developed and maintained.

      Strengths:

      The authors do a nice job reviewing the history of RSA analysis while introducing the methods within the toolbox. It is helpful that the authors discuss when and how to apply specific measures to different data types (e.g., why Euclidean or Mahalanobis distances are suboptimal for spike data). The manuscript strikes a valuable balance between theoretical background and hands-on instruction. The inclusion of decision-making aids, such as the Euler diagram for selecting similarity measures, and well-maintained demo scripts (available on GitHub), enhance the manuscript's utility as a practical guide.

      Overall, this paper will be particularly useful to researchers new to RSA and those interested in performing a rigorous analysis using this framework. The manuscript and accompanying toolbox provide everything a researcher needs to get started, provided they take the time to engage with the methodological details and references offered.

      Weaknesses:

      While the links to the demos in the figure legend did not work for me, it was easy to locate the current demos online, and it's encouraging to see that they are actively maintained. One small issue is that a placeholder ("XXX") remains in the description of Figure 3b and should be corrected.

    4. Author response:

      We thank the reviewers for their valuable feedback. We will prepare a revision of the manuscript based on these suggestions and comments. We are sure these revisions will improve the paper.

      The only major point we wish to clarify is that this is the first and only manuscript describing the toolbox; it is not a version update. Although it shares a similar name with its 2015 MATLAB predecessor (Nili et al., PLoS Comput Biol), rsatoolbox was designed from scratch. The two share no code or structural overlap beyond implementing some similar methods.

      Developed publicly since 2019, rsatoolbox reflects a decade of research in RSA methodology across multiple labs and incorporates new dissimilarity metrics, RDM comparators, inferential procedures, and visualization methods. Importantly, although we cite several papers describing methods implemented in the toolbox, this is the first manuscript to present the toolbox as a whole, its design principles, and the unified analytical framework it offers.

      We are sorry about the forgotten placeholder and the links not working. The links work for us in the PDF at least, and we will certainly fix the placeholder as soon as possible.

    1. eLife Assessment

      This important study uses advanced computational methods to elucidate how environmental dielectric properties influence the interaction strengths of tyrosine and phenylalanine in biomolecular condensates. The evidence supporting the claims of the authors is convincing, as the simulations are performed rigorously, providing mechanistic insights into the origin of the differences between the two aromatic amino acids considered. This study will be of broad interest to researchers studying biomolecular phase separation.

    2. Reviewer #1 (Public review):

      This is an interesting and timely computational study using molecular dynamics simulation as well as quantum mechanical calculation to address why tyrosine (Y), as part of an intrinsically disordered protein (IDP) sequence, has been observed experimentally to be stronger than phenylalanine (F) as a promoter for biomolecular phase separation. Notably, the authors identified the aqueous nature of the condensate environment and the corresponding dielectric and hydrogen bonding effects as a key to understand the experimentally observed difference. This principle is illustrated by the difference in computed transfer free energy of Y- and F-containing pentapeptides into solvent with various degrees of polarity. The elucidation offered by this work is important. The computation appears to be carefully executed, the results are valuable, and the discussion is generally insightful. However, there is room for improvement in some parts of the presentation in terms of accuracy and clarity, including, e.g., the logic of the narrative should be clarified with additional information (and possibly additional computation), and the current effort should be better placed in the context of prior relevant theoretical and experimental works on cation-π interactions in biomolecules and dielectric properties of biomolecular condensates. Accordingly, this manuscript should be revised to address the following, with added discussion as well as inclusion of references mentioned below.

      (1) Page 2, line 61: "Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations (Das et al., 2020)". As it stands, this statement is not accurate, because the cited reference to Das et al. showed that although one coarse-grained model, namely the HPS model of Dignon et al., 2018 PLoS Comput, did not capture the Arg to Lys trend, the KH model described in the same Dignon et al. paper was demonstrated by Das et al. (2020) to be capable of mimicking the greater propensity of Arg to promote phase separation than Lys. Accordingly, a possible minimal change that would correct the inaccuracy of this statement in the manuscript would be to add the word "Some" in front of "coarse-grained simulation models ...", i.e., it should read "Some coarse-grained simulation models have failed ...". In fact, a subsequent work [Wessén et al., J Phys Chem B 126: 9222-9245 (2022)] that applied the Mpipi interaction parameters (Joseph et al., 2021, already cited in the manuscript) showed that Mpipi is capable of capturing the rank ordering of phase separation propensity of Ddx4 variants, including a charge scrambled variant as well as both the Arg to Lys and the Phe to Ala variants (see Fig.11a of the above-cited Wessén et al. 2022 reference). The authors may wish to qualify their statements in the introduction to take note of these prior results. For example, they may consider adding a note immediately after the next sentence in the manuscript "However, by replacing the hydrophobicity scales ... (Das et al., 2020)" to refer to these subsequent findings in 2021-2022.

      (2) Page 8, lines 285-290 (as well as the preceding discussion under the same subheading & Fig.4): "These findings suggest that ... is not primarily driven by differences in protein-protein interaction patterns ..." The authors' logic in terms of physical explanation is somewhat problematic here. In this regard, "Protein-protein interaction patterns" appears to be a straw man, so to speak. Indeed, who (reference?) has argued that the difference in the capability of Y and F in promoting phase separation should be reflected in the pairwise amino acid interaction pattern in a condensate that contains either only Y (and G, S) and only F (and G, S) but not both Y and F? Also, this paragraph in the manuscript seems to suggest that the authors' observation of similar contact patterns in the GSY and GSF condensates is "counterintuitive" given the difference in Y-Y and F-F potentials of mean force (Joseph et al., 2021); but there is nothing particularly counterintuitive about that. The two sets of observations are not mutually exclusive. For instance, consider two different homopolymers, one with a significantly stronger monomer-monomer attraction than the other. The condensates for the two different homopolymers will have essentially the same contact pattern but very different stabilities (different critical temperatures), and there is nothing surprising about it. In other words, phase separation propensity is not "driven" by contact pattern in general, it's driven by interaction (free) energy. The relevant issue here is total interaction energy or critical point of the phase separation. If it is computationally feasible, the authors should attempt to determine the critical temperatures for the GSY condensate versus the GSF condensate to verify that the GSY condensate has a higher critical temperature than the GSF condensate. That would be the most relevant piece of information for the question at hand.
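
      The homopolymer argument above can be made concrete with a textbook mean-field illustration. The following is a sketch under stated assumptions, not a calculation from the manuscript or the review: it uses Flory-Huggins theory for a neutral homopolymer of length N at volume fraction φ, and the symbol Δ (an effective monomer-monomer attraction energy) is introduced here only for illustration.

```latex
\begin{align}
  \frac{f}{k_B T} &= \frac{\phi}{N}\ln\phi + (1-\phi)\ln(1-\phi) + \chi\,\phi(1-\phi),
  \qquad \chi = \frac{\Delta}{k_B T}, \\
  \phi_c &= \frac{1}{1+\sqrt{N}}, \qquad
  \chi_c = \frac{1}{2}\left(1+\frac{1}{\sqrt{N}}\right)^{2}, \\
  T_c &= \frac{\Delta}{k_B\,\chi_c}
       = \frac{2\Delta}{k_B\left(1+1/\sqrt{N}\right)^{2}} .
\end{align}
```

      Under these assumptions, two homopolymers of equal length have the same critical volume fraction, and their mean-field contact statistics are indistinguishable, yet their critical temperatures scale linearly with Δ. A stronger monomer-monomer attraction therefore shows up in the stability of the condensate rather than in its contact pattern.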

      (3) Page 9, lines 315-316: "...Our ε [relative permittivity] values ... are surprisingly close to that derived from experiment on Ddx4 condensates (45±13) (Nott et al., 2015)". For accuracy, it should be noted here that the relative permittivity provided in the supplementary information of Nott et al. was not a direct experimental measurement but based on a fit using Flory-Huggins (FH), but FH is not the most appropriate theory for polymer with long-spatial-range Coulomb interactions. To this reviewer's knowledge, no direct measurement of relative permittivity in biomolecular condensates has been made to date. Explicit-water simulation suggests that relative permittivity of Ddx4 condensate with protein volume fraction ≈ 0.4 can have relative permittivity ≈ 35-50 (Das et al., PNAS 2020, Fig.7A), which happens to agree with the ε = 45±13 estimate. This information should be useful to include in the authors' manuscript.

      (4) As for the dielectric environment within biomolecular condensates, coarse-grained simulation has suggested that whereas condensates formed by essentially electrically neutral polymers (as in the authors' model systems) have relative permittivities intermediate between that of bulk water and that of pure protein (ε = 2-4, or at most 15), condensates formed by highly charged polymers can have relative permittivity higher than that of bulk water [Wessén et al., J Phys Chem B 125:4337-4358 (2021), Fig.14 of this reference]. In view of the role of aromatic residues (mainly Y and F) in the phase separation of IDPs such as A1-LCD and LAF-1 that contain positively and negatively charged residues (Martin et al., 2020; Schuster et al., 2020, already cited in the manuscript), it should be useful to address briefly how the relationship between the relative phase-separation promotion strength of Y vs F and dielectric environment of the condensate may or may not change with higher relative permittivities.

      (5) The authors applied the dipole moment fluctuation formula (Eq.2 in the manuscript) to calculate relative permittivity in their model condensates. Does this formula apply only to an isotropic environment? The authors' model condensates were obtained from a "slab" approach (p.4) and thus the simulation box has a rectangular geometry. Did the authors apply their Eq.2 to the entire simulation box or only to the central part of the box with the condensate (see, e.g., Fig.3C in the manuscript)? If the latter is the case, is it necessary to use a different dipole moment formula that distinguishes between the "parallel" and "perpendicular" components of the dipole moment (see, e.g., Eq.16 in the above-cited Wessén et al. 2021 paper)? A brief added comment would be useful.
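
      For orientation, the isotropic question raised here can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard fluctuation expression for an isotropic system under conducting (tin-foil) boundary conditions, ε = 1 + (⟨M·M⟩ − ⟨M⟩·⟨M⟩) / (3 ε0 V kB T); the manuscript's Eq. 2 is not reproduced in this review and may differ in form. The function names are hypothetical, and all quantities are assumed to be in SI units.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K


def dipole_variance_per_axis(dipoles):
    """Variance of each Cartesian component of the total dipole moment.

    dipoles: array of shape (n_frames, 3), total dipole per frame in C*m.
    In an isotropic sample the three variances should agree within noise;
    a systematic difference along the slab normal signals anisotropy.
    """
    dipoles = np.asarray(dipoles, dtype=float)
    return dipoles.var(axis=0)  # <M_a^2> - <M_a>^2 for a = x, y, z


def relative_permittivity(dipoles, volume, temperature):
    """Isotropic fluctuation estimate of the relative permittivity.

    volume in m^3, temperature in K. Assumes an isotropic system with
    conducting boundary conditions (an assumption of this sketch).
    """
    total_variance = dipole_variance_per_axis(dipoles).sum()
    return 1.0 + total_variance / (3.0 * EPS0 * volume * KB * temperature)
```

      Comparing the per-axis variances before applying the isotropic formula is one simple way to check whether a slab-derived region can reasonably be treated as isotropic, or whether a formula that separates parallel and perpendicular components is needed.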

      (6) With regard to the general role of Y and F in the phase separation of biomolecules containing positively charged Arg and Lys residues, the relative strength of cation-π interactions (cation-Y vs cation-F) should be addressed (in view of the generality implied by the title of the manuscript), or at least discussed briefly in the authors' manuscript if a detailed study is beyond the scope of their current effort. It has long been known that in the biomolecular context, cation-Y is slightly stronger than cation-F, whereas cation-tryptophan (W) is significantly stronger than either cation-Y or cation-F [Wu & McMahon, JACS 130:12554-12555 (2008)]. Experimental data from a study of EWS (Ewing sarcoma) transactivation domains indicated that Y is a slightly stronger promoter than F for transcription, whereas W is significantly stronger than either Y or F [Song et al., PLoS Comput Biol 9:e1003239 (2013)]. In view of the subsequent general recognition that "transcription factors activate genes through the phase-separation capacity of their activation domain" [Boija et al., Cell 175:1842-1855.e16 (2018)], which is applicable to EWS in particular [Johnson et al., JACS 146:8071-8085 (2024)], the experimental data in Song et al. 2013 (see Fig.3A of this reference) suggest that cation-Y interactions are stronger than cation-F interactions in promoting phase separation, thus generalizing the authors' observations (which focus primarily on Y-Y, Y-F and F-F interactions) to most situations in which cation-Y and cation-F interactions are relevant to biomolecular condensation.

      (7) Page 9: The observation of a weaker effective F-F (and a few other nonpolar-nonpolar) interaction in a largely aqueous environment (as in an IDP condensate) than in a nonpolar environment (as in the core of a folded protein) is intimately related to (and expected from) the long-recognized distinction between "bulk" and "pair" as well as size dependence of hydrophobic effects that have been addressed in the context of protein folding [Wood & Thompson, PNAS 87:8921-8927 (1990); Shimizu & Chan, JACS 123:2083-2084 (2001); Proteins 49:560-566 (2002)]. It would be useful to add a brief pointer in the current manuscript to this body of relevant literature in protein science.

      Comments on revisions:

      The authors have largely addressed my previous concerns and the manuscript has been substantially improved. Nonetheless, it will benefit the readers more if the authors had included more of the relevant references provided in my previous review so as to afford a broader and more accurate context to the authors' effort. This deficiency is particularly pertinent for point number 6 in my previous report about cation-pi interactions. The authors have now added a brief discussion but with no references on the rank ordering of Y, F, and W interactions. I cannot see how providing additional information about a few related works could hurt. Quite the contrary, having the references will help readers establish scientific connections and contribute to conceptual advance.

    3. Reviewer #2 (Public review):

      Summary:

      In this preprint, De Sancho and López use alchemical molecular dynamics simulations and quantum mechanical calculations to elucidate the origin of the observed preference of Tyr over Phe in phase separation. The paper is well written, and the simulations conducted are rigorous and provide good insight into the origin of the differences between the two aromatic amino acids considered.

      Strengths:

      The study addresses a fundamental discrepancy in the field of phase separation, where the ranking of aromatic amino acids observed experimentally differs from the ranking anticipated from contact statistics of folded proteins. While the hypothesis that the difference in the microenvironment of the condensed phase and hydrophobic core of folded proteins underlies the different observations has been proposed, this study provides a quantification of this effect. Further, the demonstration of the crossover between Phe and Tyr as a function of the dielectric is interesting and provides further support for the hypothesis that the differing microenvironments within the condensed phase and the core of folded proteins are the origin of the difference between contact statistics and experimental observations in phase separation literature. The simulations performed in this work systematically investigate several possible explanations and therefore provide depth to the paper.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This is an interesting and timely computational study using molecular dynamics simulation as well as quantum mechanical calculation to address why tyrosine (Y), as part of an intrinsically disordered protein (IDP) sequence, has been observed experimentally to be stronger than phenylalanine (F) as a promoter for biomolecular phase separation. Notably, the authors identified the aqueous nature of the condensate environment and the corresponding dielectric and hydrogen bonding effects as a key to understanding the experimentally observed difference. This principle is illustrated by the difference in computed transfer free energy of Y- and F-containing pentapeptides into a solvent with various degrees of polarity. The elucidation offered by this work is important. The computation appears to be carefully executed, the results are valuable, and the discussion is generally insightful. However, there is room for improvement in some parts of the presentation in terms of accuracy and clarity, including, e.g., the logic of the narrative should be clarified with additional information (and possibly additional computation), and the current effort should be better placed in the context of prior relevant theoretical and experimental works on cation-π interactions in biomolecules and dielectric properties of biomolecular condensates. Accordingly, this manuscript should be revised to address the following, with added discussion as well as inclusion of references mentioned below.

      We are grateful for the referee’s assessment of our work and insightful suggestions, which we address point by point below.

      (1) Page 2, line 61: "Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations (Das et al., 2020)". As it stands, this statement is not accurate, because the cited reference to Das et al. showed that although some coarse-grained models, namely the HPS model of Dignon et al., 2018 PLoS Comput did not capture the Arg to Lys trend, the KH model described in the same Dignon et al. paper was demonstrated by Das et al. (2020) to be capable of mimicking the greater propensity of Arg to promote phase separation than Lys. Accordingly, a possible minimal change that would correct the inaccuracy of this statement in the manuscript would be to add the word "Some" in front of "coarse-grained simulation models ...", i.e., it should read "Some coarse-grained simulation models have failed ...". In fact, a subsequent work [Wessén et al., J Phys Chem B 126: 9222-9245 (2022)] that applied the Mpipi interaction parameters (Joseph et al., 2021, already cited in the manuscript) showed that Mpipi is capable of capturing the rank ordering of phase separation propensity of Ddx4 variants, including a charge scrambled variant as well as both the Arg to Lys and the Phe to Ala variants (see Figure 11a of the above-cited Wessén et al. 2022 reference). The authors may wish to qualify their statements in the introduction to take note of these prior results. For example, they may consider adding a note immediately after the next sentence in the manuscript "However, by replacing the hydrophobicity scales ... (Das et al., 2020)" to refer to these subsequent findings in 2021-2022.

      We agree with the referee that the wording used in the original version was inaccurate. We did not want to expand too much on the previous results on Lys/Arg, to avoid overwhelming our readers with background information that was not directly relevant to the aromatic residues Phe and Tyr. We have now introduced some of the missing details in the hope that this will provide a more accurate account of what has been achieved with different versions of coarse-grained models. In the revised version, we say the following:

      Das and co-workers attempted to explain arginine’s greater propensity to phase separate in Ddx4 variants using coarse-grained simulations with two different energy functions (Das et al., 2020). The model was first parametrized using a hydrophobicity scale, aimed to capture the “stickiness” of different amino acids (Dignon et al., 2018), but this did not recapitulate the correct rank order in the stability of the simulated condensates (Das et al., 2020). By replacing the hydrophobicity scale with interaction energies from amino acid contact matrices —derived from a statistical analysis of the PDB (Dignon et al., 2018; Miyazawa and Jernigan, 1996; Kim and Hummer, 2008)— they recovered the correct trends (Das et al., 2020). A key to the greater propensity for LLPS in the case of Arg may derive from the pseudo-aromaticity of this residue, which results in a greater stabilization relative to the more purely cationic character of Lys (Gobbi and Frenking, 1993; Wang et al., 2018; Hong et al., 2022).

      (2) Page 8, lines 285-290 (as well as the preceding discussion under the same subheading & Figure 4): "These findings suggest that ... is not primarily driven by differences in protein-protein interaction patterns ..." The authors' logic in terms of physical explanation is somewhat problematic here. In this regard, "Protein-protein interaction patterns" appear to be a straw man, so to speak. Indeed, who (reference?) has argued that the difference in the capability of Y and F in promoting phase separation should be reflected in the pairwise amino acid interaction pattern in a condensate that contains either only Y (and G, S) and only F (and G, S) but not both Y and F? Also, this paragraph in the manuscript seems to suggest that the authors' observation of similar contact patterns in the GSY and GSF condensates is "counterintuitive" given the difference in Y-Y and F-F potentials of mean force (Joseph et al., 2021); but there is nothing particularly counterintuitive about that. The two sets of observations are not mutually exclusive. For instance, consider two different homopolymers, one with a significantly stronger monomer-monomer attraction than the other. The condensates for the two different homopolymers will have essentially the same contact pattern but very different stabilities (different critical temperatures), and there is nothing surprising about it. In other words, phase separation propensity is not "driven" by contact pattern in general, it's driven by interaction (free) energy. The relevant issue here is total interaction energy or the critical point of the phase separation. If it is computationally feasible, the authors should attempt to determine the critical temperatures for the GSY condensate versus the GSF condensate to verify that the GSY condensate has a higher critical temperature than the GSF condensate. That would be the most relevant piece of information for the question at hand.

      We are grateful for this very insightful comment by the referee. We have followed this suggestion to address whether, despite similar interaction patterns in GSY and GSF condensates, their stabilities are different. As in our previous work (De Sancho, 2022), we have run replica exchange MD simulations for both condensates and derived their phase diagrams. Our results, shown in the new Figure 5 and supplementary Figs. S6-S7, clearly indicate that the GSY condensate has a lower saturation density than the GSF condensate. This result is consistent with the trends observed in experiments on mutants of the low-complexity domain of hnRNPA1, where the relative amounts of F and Y determine the saturation concentration (Bremer et al., 2022).

      (3) Page 9, lines 315-316: "...Our ε [relative permittivity] values ... are surprisingly close to that derived from experiment on Ddx4 condensates (45±13) (Nott et al., 2015)". For accuracy, it should be noted here that the relative permittivity provided in the supplementary information of Nott et al. was not a direct experimental measurement but based on a fit using Flory-Huggins (FH), but FH is not the most appropriate theory for a polymer with long-spatial-range Coulomb interactions. To this reviewer's knowledge, no direct measurement of relative permittivity in biomolecular condensates has been made to date. Explicit-water simulation suggests that the relative permittivity of Ddx4 condensate with protein volume fraction ≈ 0.4 can have a relative permittivity ≈ 35-50 (Das et al., PNAS 2020, Fig.7A), which happens to agree with the ε = 45±13 estimate. This information should be useful to include in the authors' manuscript.

      We thank the referee for this useful comment. We are aware that the estimate we mentioned is not direct. We have now clarified this point and added the additional estimate from Das et al. In the new version of the manuscript, we say:

      Our 𝜀 values for the condensates (39 ± 5 for GSY and 47 ± 3 for GSF) are surprisingly close to that derived from experiments on Ddx4 condensates using Flory-Huggins theory (45 ± 13) (Nott et al., 2015) and from atomistic simulations of Ddx4 (∼35−50 at a volume fraction of 𝜙 = 0.4) (Das et al., 2020).

      (4) As for the dielectric environment within biomolecular condensates, coarse-grained simulation has suggested that whereas condensates formed by essentially electrically neutral polymers (as in the authors' model systems) have relative permittivities intermediate between that of bulk water and that of pure protein (ε=2-4, or at most 15), condensates formed by highly charged polymers can have relative permittivity higher than that of bulk water [Wessén et al., J Phys Chem B 125:4337-4358 (2021), Fig.14 of this reference]. In view of the role of aromatic residues (mainly Y and F) in the phase separation of IDPs such as A1-LCD and LAF-1 that contain positively and negatively charged residues (Martin et al., 2020; Schuster et al., 2020, already cited in the manuscript), it should be useful to address briefly how the relationship between the relative phase-separation promotion strength of Y vs F and the dielectric environment of the condensate may or may not change with higher relative permittivities.

      We thank the referee for their comment regarding highly charged polymers. However, we have chosen not to address these systems in our manuscript, as they are significantly different from the GSY/GSF peptide condensates under investigation. In polyelectrolyte systems, condensate formation is primarily driven by electrostatic interactions and counterion release, while we highlight the role of transfer free energies. At high dielectric constants (and dielectrics even higher than that of water), the strength of electrostatic interactions will be greatly reduced. In our approach to estimate differences between Y and F, the transfer free energy should plateau at a value of ΔΔG=0 in water. At greater values of ε>80, it becomes difficult to predict whether additional effects might become relevant. As this lies beyond the scope of our current study, we prefer not to speculate further.

      (5) The authors applied the dipole moment fluctuation formula (Eq.2 in the manuscript) to calculate relative permittivity in their model condensates. Does this formula apply only to an isotropic environment? The authors' model condensates were obtained from a "slab" approach (page 4), and thus the simulation box has a rectangular geometry. Did the authors apply Equation 2 to the entire simulation box or only to the central part of the box with the condensate (see, e.g., Figure 3C in the manuscript)? If the latter is the case, is it necessary to use a different dipole moment formula that distinguishes between the "parallel" and "perpendicular" components of the dipole moment (see, e.g., Equation 16 in the above-cited Wessén et al. 2021 paper)? A brief added comment will be useful.

      We have calculated the relative permittivity from dense phases only. These dense phases were sliced from the slab geometry and then re-equilibrated. Long simulations were then run to converge the calculation of the dielectric constant. We have clarified this in the Methods section of the paper. We say:

      For the calculation of the dielectric constant of condensates, we used the simulations of isolated dense phases mentioned above.
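      For readers unfamiliar with the dipole moment fluctuation approach discussed in this exchange, a minimal sketch follows. It assumes the standard isotropic form with conducting ("tin-foil") boundary conditions, ε = 1 + (⟨M²⟩ − ⟨M⟩²) / (3 ε₀ V k_B T); whether this matches Eq. 2 of the manuscript term by term is an assumption, since that equation is not reproduced here. The function name, box volume, temperature, and synthetic dipole series are all hypothetical and are not taken from the authors' simulations.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K


def relative_permittivity(dipoles, volume_m3, temperature_k):
    """Isotropic fluctuation estimate of the relative permittivity.

    dipoles: array of shape (n_frames, 3) holding the total dipole moment
    of the simulated system in C*m, one row per trajectory frame.
    Assumes conducting boundary conditions and an isotropic sample.
    """
    dipoles = np.asarray(dipoles, dtype=float)
    mean_m_squared = np.mean(np.sum(dipoles**2, axis=1))  # <M^2>
    mean_m = np.mean(dipoles, axis=0)                      # <M> (vector)
    fluctuation = mean_m_squared - np.dot(mean_m, mean_m)
    return 1.0 + fluctuation / (3.0 * EPS0 * volume_m3 * KB * temperature_k)


# Synthetic check (hypothetical numbers): draw Gaussian dipole components
# whose variance is chosen so that the estimator should return about 40.
rng = np.random.default_rng(1)
volume = (5e-9) ** 3   # hypothetical 5 nm cubic box, in m^3
temperature = 300.0    # K
target_eps = 40.0
sigma = np.sqrt((target_eps - 1.0) * EPS0 * volume * KB * temperature)
synthetic_dipoles = rng.normal(0.0, sigma, size=(100_000, 3))

eps_estimate = relative_permittivity(synthetic_dipoles, volume, temperature)
print(f"estimated relative permittivity: {eps_estimate:.2f}")
```

      In an anisotropic slab geometry, the parallel and perpendicular dipole components would need separate treatment, which is the concern raised by the referee; this sketch covers only the isotropic case of an isolated dense phase.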

      (6) Concerning the general role of Y and F in the phase separation of biomolecules containing positively charged Arg and Lys residues, the relative strength of cation-π interactions (cation-Y vs cation-F) should be addressed (in view of the generality implied by the title of the manuscript), or at least discussed briefly in the authors' manuscript if a detailed study is beyond the scope of their current effort. It has long been known that in the biomolecular context, cation-Y is slightly stronger than cation-F, whereas cation-tryptophan (W) is significantly stronger than either cation-Y or cation-F [Wu & McMahon, JACS 130:12554-12555 (2008)]. Experimental data from a study of EWS (Ewing sarcoma) transactivation domains indicated that Y is a slightly stronger promoter than F for transcription, whereas W is significantly stronger than either Y or F [Song et al., PLoS Comput Biol 9:e1003239 (2013)]. In view of the subsequent general recognition that "transcription factors activate genes through the phase-separation capacity of their activation domain" [Boija et al., Cell 175:1842-1855.e16 (2018)], which is applicable to EWS in particular [Johnson et al., JACS 146:8071-8085 (2024)], the experimental data in Song et al. 2013 (see Figure 3A of this reference) suggest that cation-Y interactions are stronger than cation-F interactions in promoting phase separation, thus generalizing the authors' observations (which focus primarily on Y-Y, Y-F and F-F interactions) to most situations in which cation-Y and cation-F interactions are relevant to biomolecular condensation.

      We thank our referee for this insightful comment. While we restrict our analysis to aromatic pairs in this work, the observed crossover will certainly affect other pairs where tyrosine or phenylalanine are involved. We now comment on this point in the discussions section of the revised manuscript. This topic will be explored in detail in a follow-up manuscript we are currently completing. We say:

      We note that, although we have not included in our analysis positively charged residues that form cation-π interactions with aromatics, the observed crossover will also be relevant to Arg/Lys contacts with Phe and Tyr. Following the rationale of our findings, within condensates, cation-Tyr interactions are expected to promote phase separation more strongly than cation-Phe pairs.

      (7) Page 9: The observation of weaker effective F-F (and a few other nonpolar-nonpolar) interactions in a largely aqueous environment (as in an IDP condensate) than in a nonpolar environment (as in the core of a folded protein) is intimately related to (and expected from) the long-recognized distinction between "bulk" and "pair" as well as size dependence of hydrophobic effects that have been addressed in the context of protein folding [Wood & Thompson, PNAS 87:8921-8927 (1990); Shimizu & Chan, JACS 123:2083-2084 (2001); Proteins 49:560-566 (2002)]. It will be useful to add a brief pointer in the current manuscript to this body of relevant resources in protein science.

      We thank the referee for bringing this body of work to our attention. In the revised version of our work, we briefly mention how it relates to our results. We also note that the suggested references have pointed to another of the limitations of our study, that of chain connectivity, addressed in the work by Shimizu and Chan. While we were well aware of these limitations, we had not mentioned them in our manuscript. Concerning the distinction between pair and bulk hydrophobicities, we include the following in the concluding lines of our work:

      The observed context dependence has deep roots in the concepts of “pair” and “bulk” hydrophobicity (Wood and Thompson, 1990; Shimizu and Chan, 2002). While pair hydrophobicity is connected to dimerisation equilibria (i.e. the second step in Figure 2B), bulk hydrophobicity is related to transfer processes (the first step). Our work stresses the importance of considering both the pair contribution that dominates at high solvation, and the transfer free energy contribution, which overwhelms the interaction strength at low dielectrics.

      Reviewer #2 (Public review):

      Summary:

      In this preprint, De Sancho and López use alchemical molecular dynamics simulations and quantum mechanical calculations to elucidate the origin of the observed preference of Tyr over Phe in phase separation. The paper is well written, and the simulations conducted are rigorous and provide good insight into the origin of the differences between the two aromatic amino acids considered.

      We thank the referee for his/her positive assessment of our work. Below, we address all the questions raised one by one.

      Strengths:

      The study addresses a fundamental discrepancy in the field of phase separation where the predicted ranking of aromatic amino acids observed experimentally is different from their anticipated rankings when considering contact statistics of folded proteins. While the hypothesis that the difference in the microenvironment of the condensed phase and hydrophobic core of folded proteins underlies the different observations, this study provides a quantification of this effect. Further, the demonstration of the crossover between Phe and Tyr as a function of the dielectric is interesting and provides further support for the hypothesis that the differing microenvironments within the condensed phase and the core of folded proteins is the origin of the difference between contact statistics and experimental observations in phase separation literature. The simulations performed in this work systematically investigate several possible explanations and therefore provide depth to the paper.

      Weaknesses:

      While the study is quite comprehensive and the paper well written, there are a few instances that would benefit from additional details. In the methods section, it is unclear whether the GGXGG peptides upon which the alchemical transforms are conducted are position-restrained within the condensed/dilute phase or not. If they are not, how would the position of the peptides within the condensate alter the calculated free energies reported?

      The peptides are not restrained in our simulations and can therefore diffuse out of the condensate given sufficient time. However, we did not observe any escape event in the trajectories we used to generate starting points for switching. Hence, the peptide environment captured in our calculations reflects, on average, the protein-protein and protein-solvent interactions inside the model condensate. We believe this is the right way of performing the calculation of transfer free energy differences into the condensate. We have clarified this point when we describe the equilibrium simulation results in the revised manuscript. We say:

      Also, the peptide that experiences the transformation, which is not restrained, must remain buried within the condensate for all the snapshots that we use as initial frames, to avoid averaging the work in the dilute and dense phases.

      On the referee’s second point of whether there would be differences if the peptide visited the dilute phase, the answer is that, indeed, there would be. We expect that the behaviour of the peptide would approach ΔΔG=0, considering the low protein concentration in the dilute phase. For mixed trajectories with sampling in both dilute and dense phases, our expectation would be a bimodal distribution in the free energy estimates from switching (see e.g. Fig. 8 in DOI:10.1021/acs.jpcb.0c10263). Because we are exclusively interested in the transfer free energies into the condensate, we do not pursue such calculations in this work.

      It would also be interesting to see what the variation in the transfer of free energy is across multiple independent replicates of the transform to assess the convergence of the simulations. 

      Upon submission of our manuscript, we were confident that the results we had obtained would pass the test of statistical significance. We had, after all, done many more simulations than those reported, plus the comparable values of ΔΔG_Transfer for both GSY and GSF pointed in the right direction. However, we acknowledge that the more thorough test of running replicates recommended by the referee is important, considering the slow diffusion within the Tyr peptide condensates due to its stickiness. Also, the non-equilibrium switching method had not been tested before for dense phases like the ones considered here.

      We have hence followed our referee's suggestion and done three different replicates, 1 μs each, of the equilibrium runs starting from independent slab configurations, for both the GSY and GSF condensates (see the new supporting figures Fig. S1, S2 and S5). We now report the errors from the three replicates as the standard error of the mean (bootstrapping errors remain for the rest of the solvents). Our results are entirely consistent with the values reported originally, confirming the validity of our estimates.

      Additionally, since the authors use a slab for the calculation of these free energies, are the transfer free energies from the dilute phase to the interface significantly different from those calculated from the dilute phase to the interior of the condensate? 

      We thank the referee for this valuable comment, as it has pointed us in the direction of a rapidly increasing body of work on condensate interfaces, for example, as mediators of aggregation, that we may consider for future study with the same methodology. However, as discussed above, we have not considered this possibility in our work, as we decided to focus on the condensate environment, rather than its interface.

      The authors mention that the contact statistics of Phe and Tyr do not show significant difference and thereby conclude that the more favorable transfer of Tyr primarily originates from the dielectric of the condensate. However, the calculation of contacts neglects the differences in the strength of interactions involving Phe vs. Tyr. Though the authors consider the calculation of energy contact formation later in the manuscript, the scope of these interactions are quite limited (Phe-Phe, Tyr-Tyr, Tyr-Amide, Phe-Amide) which is not sufficient to make a universal conclusion regarding the underlying driving forces. A more appropriate statement would be that in the context of the minimal peptide investigated the driving force seems to be the difference in dielectric. However, it is worth mentioning that the authors do a good job of mentioning some of these caveats in the discussion section.

      We thank the referee for this important comment. Indeed, the similar contact statistics and interaction patterns that we reported originally do not necessarily imply identical interaction energies. In other words, similar statistics and patterns can still result in different stabilities for the Phe and Tyr condensates if the energetics are different. Hence, we cannot conclude that the GSF and GSY condensate environments are equivalent.

      To address this point, we have run new simulations for the revised version of our paper, using the temperature-replica exchange method, as before. From the new datasets, we derive the phase diagrams for both the GSF and GSY condensates (see the new Fig. 5). We find that the tyrosine-containing condensate is more stable than that of phenylalanine, as can be inferred from the lower saturation density in the low-density branch of the phase diagram. In consequence, despite the similar contact statistics, the energetics differ, making the saturation density of the GSY slightly lower than that of GSF. This result is consistent with experimental data by Bremer et al (Nat. Chem. 2022). 

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors address the paradox of how tyrosine can act as a stronger sticker for phase separation than phenylalanine, despite phenylalanine being higher on the hydrophobicity scale and exhibiting more prominent pairwise contact statistics in folded protein structures compared to tyrosine.

      We are grateful for the referee’s favourable opinion on the paper. Below, we address all of the issues raised.

      Strengths:

      This is a fascinating problem for the protein science community with special relevance for the biophysical condensate community. Using atomistic simulations of simple model peptides and condensates as well as quantum calculations, the authors provide an explanation that relies on the dielectric constant of the medium and the hydration level that either tyrosine or phenylalanine can achieve in highly hydrophobic vs. hydrophilic media. The authors find that as the dielectric constant decreases, phenylalanine becomes a stronger sticker than tyrosine. The conclusions of the paper seem to be solid, it is well-written and it also recognises the limitations of the study. Overall, the paper represents an important contribution to the field.

      Weaknesses:

      How can the authors ensure that a condensate of GSY or GSF peptides is a representative environment of a protein condensate? First, the composition in terms of amino acids is highly limited, second the effect of peptide/protein length compared to real protein sequences is also an issue, and third, the water concentration within these condensates is really low as compared to real experimental condensates. Hence, how can we rely on the extracted conclusions from these condensates to be representative for real protein sequences with a much more complex composition and structural behaviour?

      We agree with the main weakness identified by the referee. In fact, all these limitations had already been stated in our original submission. Our ternary peptide condensates are just a minimal model system that bears reasonable analogies with condensates, but definitely is not identical to true LCR condensates. The analogies between peptide and protein condensates are, however, worth restating: 

      (1) The limited composition of the peptide condensates is inspired by LCR sequences (see Fig. 4 in Martin & Mittag, 2018).

      (2) The equilibrium phase diagram, showing a UCST, is consistent with that of LCRs from Ddx4 or hnRNPA1.

      (3) The dynamical behaviour is intermediate between liquid and solid (De Sancho, 2022). 

      (4) The contact patterns are comparable to those observed for FUS and LAF1 (Zheng et al, 2020).

      The third issue pointed out by the referee requires particular attention. Indeed, the water content in the model condensates is low (~200 mg/mL for GSY) relative to the experiment (e.g. ~600 mg/mL for FUS and LAF-1 from simulations). Considering that both interaction patterns and solvation contribute to the favorability of Tyr relative to Phe, we speculate that a greater degree of solvation in the true protein condensates will further reinforce the trends we observe.

      In any case, in the revised version of the manuscript, we have made an effort to insist on the limitations of our results, some of which we plan to address in future work.

      Reviewer #3 (Recommendations for the authors):

      (1) The fact that protein density is so high within GSY or GSF peptide condensates may significantly alter the conclusions of the paper. Can the authors show that for condensates in which the protein density is ~0.2-0.3 g/cm3, the same conclusions hold? Could the authors use a different peptide sequence that establishes a more realistic protein concentration/density inside the condensate?

      Unfortunately, recent work with a variety of peptide sequences suggests that finding peptides in the density range proposed by the referee may be very challenging. For example, Pettit and his co-workers have extensively studied the behaviour of GGXGG peptides. In a recent work, using the CHARMM36m force field and TIP3P water, they report densities of ~1.2-1.3 g/mL for capped pentapeptide condensates (Workman et al, Biophys. J. 2024; DOI: 10.1016/j.bpj.2024.05.009). Brown and Potoyan have recently run simulations of zwitterionic GXG tripeptides with the Amber99sb-ILDNQ force field and TIP3P water, starting with a homogeneous distribution in cubic simulation boxes (Biophys. J. 2024, DOI: 10.1016/j.bpj.2023.12.027). In a box with an initial concentration of 0.25 g/mL, upon phase separation, the peptide ends up occupying what would seem to be ~1/3 of the box, although we could not find exact numbers. This would imply densities of ~0.75 g/mL in the dense phase, with the additional problem of many charges. Finally, Joseph and her co-workers have recently simulated a set of hexapeptide condensates with varied compositions using a combination of atomistic and coarse-grained simulations. For the atomistic simulations, the Amber03ws force field and TIP4P water were used (see BioRxiv reference 10.1101/2025.03.04.641530). They have found values of the protein density in the dense phase ranging between 0.8 and 1.2 g/mL. The consistency in the range of densities reported in these studies suggests that short peptides, at least up to 7 residues long, tend to form quite dense condensates, akin to those investigated in our work. While the examples mentioned do not comprehensively span the full range of peptide lengths, sequences, and force fields, they nonetheless support the general behaviour we observe. A systematic exploration of all these variables would require an extensive search in parameter space, which we believe falls outside the scope of the present study.

      (2) Do the conclusions hold for phase-separating systems that mostly rely on electrostatic interactions to undergo LLPS, like protein-RNA complex coacervates? In other words, could the authors try the same calculations for a binary mixture composed of polyR-polyE, or polyK-polyE?

      This is an excellent idea that we may attempt in future work, but the remit of the current work is aromatic amino acids Phe and Tyr only. Hence, we do not include calculations or discussion on polyR-polyE systems in our revised manuscript.

      (3) One of the major approximations made by the authors is the length of the peptides within the condensates, which is not realistic, or their density. Specifically, could they double or triple the length of these peptides while maintaining their composition, so that the impact of sequence length on the transfer free energies can be quantified?

      We thank the referee for this comment and agree with the main point, which was stated as a limitation in our original submission. The suggested calculations anticipate research that we are planning but will not include in the current work. One of the advantages of our model systems is that the small size of the peptides allows for small simulation boxes and relatively rapid sampling. Longer peptide sequences would require conformational sampling beyond our current capabilities, if done systematically. An example of these limitations is the amount of data that we had to discard from the new simulations we report, which amounts to up to 200 ns of our replica exchange runs in smaller simulation boxes (i.e. >19 μs in total for the 48 replicas of the two condensates!). As stated in the answer to point 1, we have found in the literature work on peptides in the range of 1-7 residues with consistent densities. Additionally, a recent report that applied equilibrium alchemical transformations to tetrapeptide condensates, pointing to the role of transfer free energy as a driving force for condensate formation, further supports the observations from our work.

      Minor issues:

      (1) The caption of Figure 3B is not clear. It can only be understood what is depicted there once you read the main text a couple of times. I encourage the authors to clarify the caption.

      We have rewritten the caption for greater clarity. Now it reads as follows:

      Time evolution of the density profiles calculated across the longest dimension of the simulation box (L) in the coexistence simulations. In blue we show the density of all the peptides, and in dark red that of the F/Y residue in the GGXGG peptide.

      (2) Why was the RDF from Figure 5A cut at such a short distance? Can the authors expand the figure to clearly show that it has converged?

      In the updated Figure 5 (now Fig. 6), we have extended the g(r) up to r=1.75 nm so that it clearly plateaus at a value of 1.

    1. eLife Assessment

      This valuable study reports evidence that items maintained in working memory can bias attention in an oscillatory manner, with the attentional capture effect fluctuating at theta frequency. The study provides incomplete evidence that this dynamic attentional bias is associated with oscillatory neural mechanisms, particularly in the alpha and theta bands, as measured by EEG. The study will be relevant for researchers studying attention, working memory, and neural oscillations, particularly those interested in how memory and perception interact over time.

    2. Reviewer #1 (Public review):

      Summary

      In the presented paper, Lu and colleagues focus on how items held in working memory bias someone's attention. In a series of three experiments, they utilized a similar paradigm in which subjects were asked to maintain two colored squares in memory for a short and variable time. After this delay, they either tested one of the memory items or asked subjects to perform a search task.

      In the search task, items could share colors with the memory items, and the authors were interested in how these would capture attention, using reaction time as a proxy. The behavioral data suggest that attention oscillates between the two items. At different maintenance intervals, the authors observed that items in memory captured different amounts of attention (attentional capture effect).

      This attentional bias fluctuates over time at approximately the theta frequency range of the EEG spectrum. This part of the study is a replication of Peters and colleagues (2020).

      Next, the authors used EEG recordings to better understand the neural mechanisms underlying this process. They present results suggesting that this attentional capture effect is positively correlated with the mean amplitude of alpha power. Furthermore, they show that the weighted phase lag index (wPLI) between the alpha and theta bands across different electrodes also fluctuates at the theta frequency.

      Strengths

      The authors focus on an interesting and timely topic: how items in working memory can bias our attention. This line of research could improve our understanding of the neural mechanisms underlying working memory, specifically how we maintain multiple items and how these interact with attentional processes. This approach is intriguing because it can shed light on neuronal mechanisms not only through behavioral measures but also by incorporating brain recordings, which is definitely a strength.

      Subjects performed several blocks of experiments, ranging from 4 to 30, over a few days depending on the experiment. This makes the results - especially those from behavioral experiments 2 and 3, which included the most repetitions - particularly robust.

      Weaknesses

      One of the main EEG results is based on the weighted phase lag index (wPLI) between oscillations in the alpha and theta bands. In my opinion, this is problematic, as wPLI measures the locking of oscillations at the same frequency. It quantifies how reliably the phase difference stays the same over time. If these oscillations have different frequencies, the phase difference cannot remain consistent. Even worse, modeling data show that even very small fluctuations in frequency between signals make wPLI artificially small (Cohen, 2015).
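      The reviewer's point can be illustrated with a small numerical sketch. It assumes the usual definition of the weighted phase lag index, wPLI = |⟨Im(S_xy)⟩| / ⟨|Im(S_xy)|⟩, where S_xy is the cross-spectrum of the two signals, and it pools samples over time rather than over trials. The signals are synthetic analytic oscillations built directly as complex exponentials with phase noise; the frequencies (6 and 10 Hz), lag, noise level, and helper names are hypothetical and are not taken from the study's data or analysis pipeline.

```python
import numpy as np


def wpli(z_x, z_y):
    """Weighted phase lag index from complex (analytic) samples."""
    imag_cross = np.imag(z_x * np.conj(z_y))  # imaginary part of cross-spectrum
    return float(np.abs(np.mean(imag_cross)) / np.mean(np.abs(imag_cross)))


def analytic_oscillation(freq_hz, t, phase_lag, rng, phase_noise_sd=0.3):
    """Unit-amplitude complex oscillation with Gaussian phase noise."""
    noise = rng.normal(0.0, phase_noise_sd, t.size)
    return np.exp(1j * (2 * np.pi * freq_hz * t - phase_lag + noise))


rng = np.random.default_rng(0)
t = np.arange(0, 2.0, 1 / 500)  # 2 s sampled at 500 Hz

theta = analytic_oscillation(6.0, t, 0.0, rng)
theta_lagged = analytic_oscillation(6.0, t, np.pi / 4, rng)  # same frequency
alpha = analytic_oscillation(10.0, t, np.pi / 4, rng)        # different frequency

wpli_same = wpli(theta, theta_lagged)  # stable phase difference
wpli_cross = wpli(theta, alpha)        # phase difference rotates at 4 Hz

print(f"same-frequency wPLI:  {wpli_same:.2f}")
print(f"cross-frequency wPLI: {wpli_cross:.2f}")
```

      With a fixed lag at a shared frequency, the imaginary part of the cross-spectrum keeps one sign and the index approaches 1. When the two frequencies differ, the phase difference rotates continuously, the imaginary part changes sign, and the index falls toward 0 regardless of any true coupling.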

      In response, the authors stated: "Additionally, the present study referenced previous research by using the wPLI index as a measure of cross-frequency coupling strength31,64-66"

      Unfortunately, after checking those publications, we can see that in paper 31 there is no mention of "wPLI" or "PLV." In 64 and 65, the authors use wPLI, but only to measure same-frequency coherence, whereas cross-frequency coupling is computed by phase-amplitude coupling or cross-frequency coupling also known as n:m-PS. In 66, I cannot find any cross-frequency results, only cross-species analysis. This is very problematic, as it indicates that the authors included references in their rebuttal without verifying their relevance.

      31 de Vries, I. E. J., van Driel, J., Karacaoglu, M. & Olivers, C. N. L. Priority Switches in Visual Working Memory are Supported by Frontal Delta and Posterior Alpha Interactions. Cereb Cortex 28, 4090-4104, doi:10.1093/cercor/bhy223 (2018).

      64 Delgado-Sallent, C. et al. Atypical, but not typical, antipsychotic drugs reduce hypersynchronized prefrontal-hippocampal circuits during psychosis-like states in mice: Contribution of 5-HT2A and 5-HT1A receptors. Cerebral Cortex 32, 3472-3487 (2022).

      65 Siebenhühner, F. et al. Genuine cross-frequency coupling networks in human resting-state electrophysiological recordings. PLoS Biology 18, e3000685 (2020).

      66 Zhang, F. et al. Cross-Species Investigation on Resting State Electroencephalogram. Brain Topogr 32, 808-824, doi:10.1007/s10548-019-00723-x (2019).

      Another result from the electrophysiology data shows that the attentional capture effect is positively correlated with the mean amplitude of alpha power. In the presented scatter plot, it seems that this result is driven by one outlier. Unfortunately, Pearson correlation is very sensitive to outliers, and the entire analysis can be driven by an extreme case. I extracted data from the plot and obtained a Pearson correlation of 0.4, similar to what the authors report. However, the Spearman correlation, which is robust against outliers, was only 0.13 (p = 0.57) indicating a non-significant relationship.
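      The sensitivity of Pearson correlation to a single extreme point can be reproduced with a short sketch. The data below are hypothetical values chosen only to illustrate the effect; they are not the values extracted from the authors' scatter plot. Spearman's coefficient is computed here as the Pearson correlation of ranks, using a simple ranking that assumes no ties.

```python
import numpy as np


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    """Spearman rank correlation (Pearson on ranks; assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)


# Hypothetical data: nine points with no clear trend plus one extreme
# point at (20, 20) that pulls the linear fit upward.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 20], dtype=float)
y = np.array([5, 3, 6, 2, 7, 4, 6.5, 1, 5.5, 20], dtype=float)

pearson_r = pearson(x, y)
spearman_rho = spearman(x, y)
pearson_without_outlier = pearson(x[:-1], y[:-1])

print(f"Pearson r (all points):       {pearson_r:.2f}")
print(f"Spearman rho (all points):    {spearman_rho:.2f}")
print(f"Pearson r (outlier removed):  {pearson_without_outlier:.2f}")
```

      Because ranks cap the influence of any single observation, the rank-based coefficient stays low while the Pearson coefficient is inflated by the extreme point. Reporting both, or a Pearson value with the point excluded, is one way to check whether a correlation depends on a single case.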

      Cohen, M. X. (2015). Effects of time lag and frequency matching on phase based connectivity. Journal of Neuroscience Methods, 250, 137-146

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Thank you very much for your recognition of our work and for pointing out the shortcomings. We have made revisions one by one and provided corresponding explanations regarding the issues you raised.

      Weaknesses:

      One of the main EEG results is based on the weighted phase lag index (wPLI) between oscillations in the alpha and theta bands. In my opinion, this is problematic, as wPLI measures the locking of oscillations at the same frequency. It quantifies how reliably the phase difference stays the same over time. If these oscillations have different frequencies, the phase difference cannot remain consistent. Even worse, modeling data show that even very small fluctuations in frequency between signals make wPLI artificially small (Cohen, 2015).

      Thank you for raising the question regarding the application of wPLI between the alpha and theta bands, which indeed deserves further explanation. In our study, we referred to relevant previous literature and adopted its approach of using wPLI to measure cross-frequency coupling strength, as this index itself can reflect the stability of phase differences. We have also considered the point you mentioned that the phase differences of oscillations with different frequencies are difficult to keep consistent. However, in this study, the presentation times of the two memory items are the same, which is fair to both from this perspective. Moreover, the study observed that the wPLI values of these two items alternately dominate over time, and this changing pattern is consistent with the regularity of the behavioral data. It seems hard to explain this as a mere coincidence.

      The corresponding discussion has been added to the revised part of the paper: “the present study referenced previous research by using the wPLI index as a measure of cross-frequency coupling strength31,64-66 (this index quantifies the stability of phase differences), yet the phases of different oscillations inherently change over time. However, this is fair to the two memory items in the present study, as their presentation times were balanced. The study found that the wPLI values of the two items alternately dominated over time, consistent with the pattern of behavioral data, which is hardly explicable by coincidence”.

      Another result from the electrophysiology data shows that the attentional capture effect is positively correlated with the mean amplitude of alpha power. In the presented scatter plot, it seems that this result is driven by one outlier. Unfortunately, Pearson correlation is very sensitive to outliers, and the entire analysis can be driven by an extreme case. I extracted data from the plot and obtained a Pearson correlation of 0.4, similar to what the authors report. However, the Spearman correlation, which is robust against outliers, was only 0.13 (p = 0.57), indicating a non-significant relationship.

      You noted that the correlation between the attentional capture effect and the mean amplitude of alpha power in the electrophysiological data might be driven by an outlier, and you compared the Pearson and Spearman correlation coefficients. We fully agree with this point.

      It is true that the small sample size of the current study makes the results vulnerable to interference from extreme data. We have addressed this point in the limitations section of the discussion in the revised manuscript: “the sample size of the current study is small, which may render the results vulnerable to interference from extreme cases”
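      The sensitivity of Pearson correlation to a single extreme point can be shown with a short sketch. The data below are synthetic and constructed only for illustration (they are not the values extracted from the manuscript's scatter plot), and the rank-based Spearman helper assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def ranks(v):
    """0-based ranks; valid as written only when there are no ties."""
    return np.argsort(np.argsort(v))

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# 20 synthetic points with almost no linear relationship.
x = np.arange(1, 21, dtype=float)
y = np.array([(8 * i) % 21 for i in range(1, 21)], dtype=float)

# The same data plus one extreme point.
x_out = np.append(x, 40.0)
y_out = np.append(y, 40.0)

r_clean = pearson(x, y)
r_out = pearson(x_out, y_out)
rho_out = spearman(x_out, y_out)

print(f"Pearson without outlier: {r_clean:.2f}")
print(f"Pearson with outlier:    {r_out:.2f}")
print(f"Spearman with outlier:   {rho_out:.2f}")
```

      A single added point moves the Pearson coefficient from near zero to a moderate positive value, whereas the rank-based coefficient changes far less. This is the pattern the reviewer reports for the alpha-power correlation.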

      The behavioral data are interesting, but in my opinion, they closely replicate Peters and colleagues (2020) using a different paradigm. In that study, participants memorized four spatial positions that formed the endpoints of two objects, and one object was cued. Similarly, reaction times fluctuated at theta frequency, and there was an anti-phase relationship between the two objects. The main novelty of the present study is that this bias can be transferred to an unrelated task. While the current study extends Peters and colleagues' findings to a different task context, the lack of a thorough, direct comparison with Peters et al. limits the clarity of the novel insights provided.

      Thank you very much for your attention to the behavioral data and its relevance to the study by Peters et al. (2020). We have noticed that some results are similar between the two studies, which also speaks to the stability of the relevant phenomena.

      However, we would also like to further explain the differences between this study and the study by Peters et al. In the study by Peters et al., participants memorized four spatial positions that formed the endpoints of two objects (one of which was cued), and their results showed that after the two objects disappeared, attention fluctuated at the theta rhythm between their original positions with an inverse correlation. In contrast, the present study explores the manner of memory maintenance indirectly by leveraging the guiding effect of working memory on attention, effectively avoiding the influence of spatial positions.

      The study by Peters et al. directly examined differences in probe positions, clearly demonstrating that attention undergoes rhythmic changes at the two spatial locations and persists after the objects vanish, but it hardly clarifies the rhythmicity of working memory performance. The present study, by contrast, directly investigates such performance using the attention-capture effect of working memory, revealing that when maintaining multiple memory items, their attention-capturing capabilities alternate in dominance, i.e., multiple working memory items alternately become priority templates in a rhythmic manner. This represents a new attempt in both the research perspective and the method of this study.

      The corresponding discussion has been added to the revised part of the paper:

      “Similar to the present study, Peters et al. had participants memorize four spatial positions forming the endpoints of two objects (one cued), and their results showed that after the two objects disappeared, attention fluctuated at the theta rhythm between their original positions with an inverse correlation; in contrast, the present study explores the manner of memory maintenance indirectly by leveraging the guiding effect of working memory on attention, effectively avoiding the influence of spatial positions—while Peters et al.’s study, which directly examined differences in probe positions, clearly demonstrates that attention undergoes rhythmic changes at the two spatial locations and persists after the objects vanish, it hardly clarifies the rhythmicity of working memory performance, whereas the present study directly investigates such performance using the attention-capture effect of working memory, revealing that when maintaining multiple memory items, their attention-capturing capabilities alternate in dominance, i.e., multiple working memory items alternately become priority templates in a rhythmic manner.”

      Reviewer #2 (Public review):

      The information provided in the current version of the manuscript is not sufficient to assess the scientific significance of the study.

      Thank you very much for pointing out the multiple issues in our manuscript. Because this work went through several revisions, including experimental adjustments, some details became inconsistent. We appreciate you identifying them one by one. We have made corresponding revisions based on your comments:

      (1) In many cases, the details of the experiments or behavioral tasks described in the main text are not consistent with those provided in the Materials and Methods section. Below, I list only a few of these discrepancies as examples:

      a) For Experiment 1, the Methods section states that the detection stimulus was presented for 2000 ms (lines 494 and 498), but Figure 1 in the main text indicates a duration of 1500 ms.

      We greatly appreciate you catching this inconsistency. We have unified the descriptions according to the experimental procedure as finally implemented. Corresponding revisions have been made in the paper:

      b) For Experiment 2, not only is the range of SOAs mentioned in the Methods section inconsistent with that shown in the main text and the corresponding figure, but the task design also differs between sections.

      Thank you for bringing this discrepancy to our attention. We have unified the descriptions according to the experimental procedure as finally implemented. The correct SOAs are 233:33:867 ms.
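      The notation 233:33:867 ms reads as start:step:end. A literal 33 ms step would not reproduce SOAs such as 267 ms or 333 ms that are quoted elsewhere in this response, so the sketch below assumes that the step is 100/3 ms (about 33.3 ms) with values rounded to the nearest millisecond. That step size is an assumption made here for illustration and is not stated by the authors.

```python
# Assumption: SOAs are multiples of 100/3 ms, rounded to whole milliseconds.
frame_ms = 100 / 3

# Multiples 7..26 span roughly 233 ms to 867 ms.
soas_ms = [round(k * frame_ms) for k in range(7, 27)]

# SOAs quoted in the response as showing significant differences.
reported = [267, 333, 367, 433, 467, 567, 667, 833]
missing = [s for s in reported if s not in soas_ms]

print(soas_ms)
print("reported SOAs not on the assumed grid:", missing)
```

      Under this assumption the grid contains 20 SOAs, and every SOA quoted in the response falls on it.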

      Corresponding revisions have been made in the paper:

      c) For Experiment 3, the main text indicates that EEG recordings were conducted, but in the Methods section, the EEG recording appears to have been part of Experiment 2 (lines 538-540).

      We’re grateful to you for noticing this mix-up. In fact, only Experiment 3 is an EEG experiment, and we have made corresponding corrections in the "Methods" section. Corresponding revisions have been made in the paper: “The remaining components after this process were then projected back into the channel space. We extracted data from -500 ms to 2000 ms relative to cue stimulus presentation in Experiment 3.”

      (2) The results described in the text often do not match what is shown in the corresponding figure. For example:

      a) In lines 171-178, the SOAs at which a significant difference was found between the two conditions do not appear to match those shown in Figure 2A.

      Many thanks for spotting this error. The previous results omitted one SOA step (33 ms), which shifted the reported SOAs by 33 ms. We have corrected it in the revised manuscript.

      Corresponding revisions have been made in the paper: “Specifically, the capture effect of cued items was significantly greater than that of uncued items at SOAs of 267 ms (t(24) = 2.72, p = 0.03, Cohen's d = 1.11), 667 ms (t(24) = 2.37, p = 0.03, Cohen's d = 0.97) and 833 ms (t(24) = 3.53, p = 0.002, Cohen's d = 1.44), while the capture effect of uncued items was significantly greater than that of cued items at SOAs of 333 ms (t(24) = 2.97, p = 0.007, Cohen's d = 1.21), 367 ms (t(24) = 2.14, p = 0.04, Cohen's d = 0.87), 433 ms (t(24) = 2.49, p = 0.02, Cohen's d = 1.02), 467 ms (t(24) = 2.37, p = 0.03, Cohen's d = 0.97) and 567 ms (t(24) = 2.72, p = 0.02, Cohen's d = 1.11).”

      (b) In Figure 4, the figure legend (lines 225-228) does not correspond to the content shown in the figure.

      We appreciate you pointing out this oversight. When adjusting the color scheme during the revision of the manuscript, we neglected to revise the legend, which has now been corrected in the revised manuscript.

      Corresponding revisions have been made in the paper:“Figure 4. The red line represents the average across all participants of the Fourier transforms of the differences in capture effects between left and right memory items at the individual level. The gray area represents values below the group average of medians derived from 1000 permutations, with each permutation involving Fourier transforms for each participant. *: p < 0.05.”

      (c) In Figure 9, not sufficient information is provided within the figure or in the text, making it difficult to understand. Consequently, the results described in the text cannot be clearly linked to the figure.

      Thank you for drawing our attention to this issue. We have revised Figure 9 and its legend in the revised manuscript to make them clearer and easier to understand.

      Corresponding revisions have been made in the paper.

      (3) Insufficient information is provided regarding the data analysis procedures, particularly the permutation tests used for the data presented in Figures 2B, 4, and 10. The results shown in these figures are critical for the main conclusions drawn in the manuscript.

      We’re thankful to you for highlighting this gap. In the revised manuscript, we have provided a more detailed explanation in the "Methods" section, especially of the frequency analysis, to make the description clearer.

      Corresponding revisions have been made in the paper:“As shown in Figure 8, the alpha power (8-14 Hz) induced by cued and uncued items alternated in dominance during the memory retention phase. To quantify this rhythmic alternation, we conducted a spectral analysis following these steps: First, we computed the power difference between cued and uncued items within the 8-14 Hz range during the retention phase. These differences were then downsampled to 100 Hz using a 10 ms window for averaging, generating a one-dimensional time series spanning the 0-2000 ms retention period. This time series was subsequently subjected to amplitude spectrum analysis across frequencies from 1 Hz to 50 Hz using Fourier transformation.

      To assess the statistical significance of the observed spectral features, we employed a permutation test. Specifically, we randomly shuffled the temporal order of the time series of power differences between cued and uncued items—thereby preserving the amplitude distribution of the data while eliminating temporal correlations in the original sequence—and repeated the Fourier transform and spectral analysis for each shuffled time series. This permutation process was replicated 1000 times to generate a null distribution of spectral power values. A frequency component in the original data was considered statistically significant if its power ranked within the top 5% of the corresponding null distribution (p < 0.05).

      We applied the same analytical pipeline to investigate differences in the weighted phase-lag index (wPLI) between the contralateral regions of the two items and the prefrontal cortex during the retention phase. Specifically, wPLI differences (i.e., the difference between the two conditions) were computed, downsampled to 100 Hz using a 10 ms window for averaging to generate a time series spanning 0-2000 ms, and then subjected to amplitude spectrum analysis (1-50 Hz) using Fourier transformation. Significance was assessed via the identical permutation test procedure described above (randomly shuffling the temporal order of the difference time series).”
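      The steps quoted above (window averaging to 100 Hz, Fourier amplitude spectrum from 1 to 50 Hz, and a 1000-fold time-shuffling permutation test) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input is a synthetic difference time series with a 6 Hz component, the original sampling rate of 1000 Hz is assumed, and the function names are hypothetical.

```python
import numpy as np

def downsample_mean(x, fs_in, window_ms=10):
    """Average non-overlapping windows (10 ms windows give 100 Hz)."""
    n = int(round(fs_in * window_ms / 1000))
    usable = (len(x) // n) * n
    return x[:usable].reshape(-1, n).mean(axis=1)

def amplitude_spectrum(x, fs, fmin=1.0, fmax=50.0):
    """FFT amplitude spectrum restricted to [fmin, fmax] Hz."""
    amp = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    keep = (freqs >= fmin) & (freqs <= fmax)
    return freqs[keep], amp[keep]

def permutation_pvalues(x, fs, n_perm=1000, seed=0):
    """Shuffle the temporal order, recompute the spectrum, compare per frequency."""
    rng = np.random.default_rng(seed)
    freqs, observed = amplitude_spectrum(x, fs)
    null = np.empty((n_perm, observed.size))
    for i in range(n_perm):
        _, null[i] = amplitude_spectrum(rng.permutation(x), fs)
    # Fraction of shuffled spectra at least as large as the observed value.
    p = (null >= observed).mean(axis=0)
    return freqs, observed, p

# Synthetic "cued minus uncued" difference series (assumed, for illustration).
fs_raw = 1000
t = np.arange(0, 2.0, 1.0 / fs_raw)            # 0-2000 ms retention period
rng = np.random.default_rng(1)
diff = np.sin(2 * np.pi * 6 * t) + 0.5 * rng.standard_normal(t.size)

series = downsample_mean(diff, fs_raw)          # 200 samples at 100 Hz
freqs, observed, p = permutation_pvalues(series, fs=100)
peak = int(np.argmax(observed))
print(f"peak at {freqs[peak]:.1f} Hz, permutation p = {p[peak]:.3f}")
```

      With a 2 s series sampled at 100 Hz, the frequency resolution is 0.5 Hz and the highest resolvable frequency is 50 Hz, which matches the 1-50 Hz range in the quoted text. The sketch tests each frequency separately against its own null distribution and applies no correction across frequencies; whether the authors applied one is not stated.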

    1. eLife Assessment

      Marshall et al describe the effects of altering metabotropic glutamate receptor 5 activity on activity of D1 receptor expressing spiny projection neurons in dorsolateral striatum focusing on two states - locomotion and rest. The authors examine effects of dSPN-specific constitutive mGlu5 deletion in several motor tests to arrive at this finding. Effects of inhibiting the degradation of the endocannabinoid 2-arachidonoyl glycerol are also examined. Overall, this is a valuable study that provides solid new information of relevance to movement disorders and possibly psychosis.

    2. Joint Public Review:

      Marshall et al describe the effects of altering metabotropic glutamate receptor 5 activity on activity of D1 receptor expressing spiny projection neurons in dorsolateral striatum focusing on two states - locomotion and rest. The authors examine effects of dSPN-specific constitutive mGlu5 deletion in several motor tests to arrive at this finding. Effects of inhibiting the degradation of the endocannabinoid 2-arachidonoyl glycerol are also examined. Overall, this is a valuable study that provides solid new information of relevance to movement disorders and possibly psychosis.

      The combination of in vivo cellular calcium imaging, pharmacology, receptor knockout and movement analysis is effectively used. The main findings do not involve gross firing rates or numbers of active neurons, but rather are revealed by specialized measures involving Jaccard coefficient and an assessment of coactivity. The authors conclude that mGlu5 expressed in dSPNs contributes to movement through effects on clustered spatial coactivity of dSPNs. More specifically, reduced mGluR5 increases coactivity during rest (defined as low velocity periods) but not during locomotion periods. The authors observe a role for mGlu5 expression in dSPNs in modulating the frequency of mEPSCs, suggesting a role in presynaptic neurotransmitter release. Some data suggesting the story may be different in the other major SPN subpopulation (iSPNs) are also presented but these studies are relatively underdeveloped leaving some ambiguity as to how cell-selective the findings are. In addition, an occlusion experiment in which the pharmacological mGluR5 agents are delivered to the dSPN mGluR5 KO to clarify if other sites of action are involved beyond the proposed D1-expressing neurons is missing. Finally, the authors present a working model that sets the stage for future experimentation. Overall, this study provides an important and detailed assessment of mGluR5 contributions to striatal circuit function and behavior.

      Remaining concerns include:

      (1) To clarify that dSPNs are the sole site of action, it is necessary to examine effects of the mGlu5 NAM in the dSPN mGlu5 cKO mice. If the effects of the two manipulations occluded one another this would certainly support the hypothesis that the drug effects are mediated by receptors expressed in dSPNs. A similar argument can be made for examining effects of the JNJ PAM in the cKO mice.

      (2) There is a concern that the D1 Cre line used (Ey262), which may also target cortical neurons, expands the interpretation of the study beyond the striatal populations. Further discussion of this point, particularly in the interpretation of the mGluR5 cKO experiments, would provide a better understanding of the contribution of the paper.

      (3) The use of CsF-based whole-cell internal solutions has caused concern in some past studies due to possible interference with G-protein, phosphatase and channel function (https://www.sciencedirect.com/science/article/abs/pii/S1044743104000296, https://www.jneurosci.org/content/jneuro/6/10/2915.full.pdf). It is reassuring the DHPG-induced LTD was still observable with this solution. However, it might be worth examining this plasticity with a different internal to ensure that the magnitude of the agonist effect is not altered by this manipulation.

      (4) Behavioral resolution of actions at low velocity that are termed "rest" is not explored in this study. Thus, a remaining ambiguity is whether the activities in rest include only periods of immobility or other low-velocity activities such as grooming or rearing.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      “Can the authors offer a hypothesis as to how decreased coactivity promotes increased movement velocity?”

      In our revision we have added an additional metric measuring how spatial coactivity changes during movement onset, the spatial correlation index, which replicates a previous finding that co-activity among proximal neurons is statistically greater surrounding movement onset. We did not find, as outlined in the revision, that mGluR5 manipulations significantly altered this relationship. Our data therefore show, consistent with previous work, that ensembles of dSPNs that are co-active during movement onset, in particular ambulatory movement, are more likely to contain neurons that are closer together and highly active. In contrast, rest ensembles contain neurons that are less active but have more highly correlated activity, across all pairwise distances. Additionally, mGluR5 inhibition, genetic or pharmacological, promotes the activation of rest ensembles but does not affect the properties of movement ensembles. Previous studies (e.g. Klaus A. et al., 2017) have shown that neurons in rest ensembles are, in general, unlikely to also be members of movement ensembles. We therefore hypothesize that corticostriatal synapses onto SPNs of rest ensembles are more likely, during spontaneous behavior, to have reduced synaptic weight due to mGluR5 signaling, potentially due to eCB-mediated inhibition of neurotransmitter release. Therefore, when we inhibit mGluR5 at these synapses, we increase synaptic weight and increase the probability of activation of this coordinated rest ensemble, which suppresses movement. If, on the other hand, the synapses that govern activation of neurons in movement ensembles have a higher weight, they may be unaffected by mGluR5 inhibition.

      The use of the Jaccard similarity index in this study is not intuitive and not fully explained by the methods or the diagram in Figure 1. 

      We have added more detail to the paper to explain the methodology of the Jaccard similarity measure. The advantage of this method is that it specifically captures cells that are jointly active, as opposed to jointly inactive, and is therefore useful for capturing co-activity in our sparsely active Ca<sup>2+</sup> imaging data.
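      The property described above, that the Jaccard index counts joint activity and ignores joint inactivity, can be seen in a minimal sketch. It assumes each cell's activity has been binarized per imaging frame; the two example vectors are made up, and the authors' actual binning and thresholding are not specified here.

```python
import numpy as np

def jaccard(a, b):
    """Frames where both cells are active divided by frames where either is active."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0          # neither cell was ever active
    return float(np.logical_and(a, b).sum() / union)

def simple_matching(a, b):
    """Fraction of frames with the same state, including jointly inactive frames."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return float(np.mean(a == b))

# Hypothetical binarized activity of two sparsely active cells over 10 frames.
cell_a = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0]
cell_b = [0, 1, 0, 0, 0, 0, 0, 1, 0, 0]

j = jaccard(cell_a, cell_b)
m = simple_matching(cell_a, cell_b)
print(f"Jaccard: {j:.3f}, simple matching: {m:.3f}")
```

      Here the two cells share 2 active frames out of 3 frames in which either is active, whereas a measure that also counts shared silence is inflated by the many frames in which both cells are inactive. With sparse calcium events, most frames are jointly inactive, which is why a measure restricted to active frames is attractive.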

      The analysis of a possible 2-AG role in the mGlu5 mediated processes is incomplete. 

      We agree that, as an experiment to outline which endocannabinoids are involved in modulating synaptic strength through mGluR5, this experiment alone is not sufficient.

      However, our main focus in this paper is how manipulations of mGluR5 affect the spatiotemporal dynamics of dSPNs and we chose not to focus on specific mechanisms of endocannabinoid signaling, though these would certainly be interesting to investigate further in vivo.

      It would seem to be a simple experiment to examine effects of the mGlu5 NAM in the dSPN mGlu5 cKO mice. If effects of the two manipulations occluded one another this would certainly support the hypothesis that the drug effects are mediated by receptors expressed in dSPNs. A similar argument can be made for examining effects of the JNJ PAM in the cKO mice. 

      We agree that this experiment would be valuable and would extend the findings presented in the paper; however, in practice it was outside the scope of the current work.

      Reviewer #2 (Public review):

      Pharmacological and genetic manipulations of mGluR5 do not differentially/preferentially modulate the activity of proximal vs distal dSPNs, therefore, it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity as opposed to differential proximal/distal spatial relationships. 

      As in the response to reviewer 1 above, we have added additional clarification to the text explaining that our manipulations do not differentially affect the co-activity of proximal vs distal dSPNs; this is also quantified throughout the text using the spatial coordination index. However, we disagree that “it could also be interpreted that mGluR5 is blanketly boosting/suppressing all dSPN activity” as we do not observe statistically significant changes in the event rate following either pharmacological or genetic manipulations of mGluR5. Rather, we consistently observe statistically significant changes in co-activity among neurons, that is, the extent to which the activity of active neurons during either rest or movement is correlated. This is the central finding of our manuscript: inhibiting or potentiating mGluR5 signaling alters behavior, not by blanket suppression or enhancement of dSPN activity as measured using the event rate, but by affecting their ensemble dynamic properties. Co-activity during rest versus ambulatory movement is statistically greater in both proximal and distal cells, and inhibiting mGluR5 increases this co-activity and decreases movement.

      For these analyses of prox vs distal and all others, please include the detail of how many proximal vs distal cells were involved and per subject. 

      We have added a supplemental table that details the number of cells included per subject in all analyses.

      Ln. 151-152: Please provide data concerning how volumes of infectivity differ between injecting AAV vs. coating the lens? If these numbers are very different, this could impact the number of Jaccard pairings and bias results. 

      While viral injection may lead to a larger volume of expression, with this one-photon imaging method only cells within ~200 microns of the edge of the lens can be resolved. In practice, therefore, any additional volume of infected tissue outside the field of view of the lens would not affect the results, as these neurons will not be resolved by the endoscope camera. Accordingly, the average number of cells detected per session is very similar following each approach (mean number of cells per session: 90.93 ± 23.69 with coating, 90.03 ± 29.29 with viral injection).

      Is mGluR5 affecting dSPN activity in other measures beyond co-activity and rate? Does the amplitude of events change?

      We have added supplemental data for figures 2, 3, and 5 demonstrating that manipulations of mGluR5 do not affect the amplitude or length of Ca<sup>2+</sup> events included in the analysis. 

      What is the model of mGluR5 signaling in a resting state vs. movement? What other behaviors are occurring when the mouse is in a low velocity "resting state" (0-0.5 cm/s). If this includes other forms of movement (i.e. rearing, grooming) then the animal really isn't in a resting state. This is not mentioned in the open field behavior section of the methods and should be described (Ln. 486) in addition to greater explanation of what behavior measures were obtained from the video tracking software (only locomotion?)

      It would be very interesting to determine whether during “rest,” when the animal is not engaged in ambulatory behavior, it may be engaged in some fine motor behavior. However, the resolution of the cameras used to measure locomotor activity in this dataset does not allow us to do this.

      There is large variability in co-activity in proximal dSPNs when animals are "resting" (2j). Could this be explained by different behavior states within your definition of "rest"?

      We agree that if the animal is engaging in fine motor behavior that we cannot resolve with our behavior setup, this could produce some variability in coactivity. However, as shown previously (e.g. Klaus A. et al., 2017), ensembles active when the animal is not moving (our definition of “resting”), regardless of additional fine motor behaviors the animal may be engaged in when not moving, are substantially different from those ensembles that are active when the animal is moving. We therefore expect that this may limit, although potentially not eliminate, variability due to different behavioral states we may have grouped into our “resting” category. Unfortunately, as mentioned above, we are not able to resolve variations in fine motor output in this behavioral data.

      Have you performed IHC, ISH or another measure to validate D1 cell specific cKO?

      The mGluR5<sup>loxP/loxP</sup> mice used in this study were characterized previously by our lab (Xu et al., 2009). We used the same mice here with a different, but also published and characterized, Cre-driver line, Drd1a-Cre Ey262 (Gerfen et al., 2013).

      Why are the "Mean Norm Co-activity" values in 5e so high in this experiment relative to figures 2-4?  

      In experiments where we treated the same animal with vehicle and a drug (i.e., experiments in Figure 2 and 3), we normalized the values for each animal in the drug treatment group to the distal bin of that animal following vehicle treatment. This allowed us to more clearly resolve the changes within each animal due to drug treatment. As comparisons in the data in figure 5 d–f are between different animals (rather than different treatments of the same animal) we could not perform this normalization procedure.  

      Reviewer #3 (Public review):

      Some D1 Cre lines have expression in the cortex. Which specific Cre line was used in this study? 

      We used Drd1a-Cre Ey262. This is included in the Methods.

      The text says JNJ treatment .... increased locomotor speed (Figure 3b) and increased the duration but not frequency of movement bouts (Figure 3c, d). However, the statistics of the figure legends say: however the change in mean velocity (3b) is not significant (p=0.060, U=3, Mann-Whitney U test), nor is the mean bout length during vehicle and JNJ (p=0.060, U=3, Mann-Whitney U test) (3d) Comparison of mean number of bouts of each animal during vehicle and JNJ (p=0.403, U=8, Mann-Whitney U test). 

      This has been corrected to indicate that only the change in time spent at rest is statistically significant.

      This effect was most pronounced during periods of rest (Figure 3i, j). The decrease was only in rest? Are the colors in Figure 3J inverted? Therefore, JNJ treatment had effects that were qualitatively the inverse to the effects of fenobam on locomotion and dSPN activity. 

      We have corrected the text to state that, overall, and during periods of rest but not movement, JNJ had effects that were qualitatively the opposite of fenobam.

    1. eLife Assessment

      This important paper presents a new behavioral assay for Drosophila aggression and demonstrates that social experience influences fighting strategies, with group-housed males favoring high-intensity but low-frequency tussling over the aggressive lunging observed in isolated males. The experiments are solid and the conclusions are of interest to researchers studying the impact of social isolation on aggression.

    2. Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single-housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al. develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA-sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays. This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Weakness:

      All prior concerns have been addressed in the revised manuscript. The added 'Limitations of the study' section is a welcome and important clarification. Despite these limitations, the study provides valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

    3. Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated how social experience changes aggression strategies, and the possible biological significance of this change, using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, a high-frequency but weak mode, and tussling, a low-frequency but more vigorous mode. Previous studies have mainly focused on lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and that another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons, pC1[SS2], that specifically induce tussling. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group rearing, a new behavioral experiment was performed to examine territorial control and mating competition. Finally, the authors found that differences in social experience (group vs. solitary rearing) are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study discusses an interesting general model in which behavioral changes driven by social experience play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although changes in the intensity of aggression with social experience have already been reported in several papers (Liu et al., 2011, etc.), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that, in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors suggested that an altered fighting strategy has effects with respect to these behaviors.

      Weaknesses:

      The new experimental paradigm in Fig. 6 is quite useful, but, as the authors mentioned, future investigations are still needed to reveal a direct relationship between aggression strategies and reproductive success.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays using a food cup (Chen et al., 2002). This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Thank you for the precise summary of the manuscript and acknowledgment of the novelty and significance of the study.

      Weakness:

      Although most concerns have been addressed, the manuscript still lacks a rigorous, objective method for quantifying lunging and tussling. Because scoring appears to have been done manually and a single lunge in a 30 fps video spans only 2-3 frames, the 0.2 s cutoff seems arbitrary, and there are no objective criteria distinguishing reciprocal lunging from tussling. Despite this, the study offers valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

      Thank you for this comment. The duration of each lunge was measured by analyzing the videos frame by frame—from the frame before the initiation of the lunge to the frame after its completion—resulting in an average span of 3–5 frames. Given a frame rate of 30 fps, this corresponds to approximately 0.1–0.17 seconds. We acknowledge that there are certain limitations for manually quantifying the two types of aggressive behaviors, which has now been stated in the newly added “Limitations of the Study” section in the revised manuscript.
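
      The frame-to-time conversion in this exchange can be checked with a few lines of Python. This is only an illustrative sketch: the function and variable names are hypothetical and are not taken from the authors' analysis, and the only inputs are the figures quoted above (30 fps, a span of 3 to 5 frames, and the 0.2 s cutoff).

```python
def frames_to_seconds(n_frames: int, fps: float = 30.0) -> float:
    """Convert a frame count into a duration in seconds."""
    return n_frames / fps


FPS = 30.0                 # frame rate quoted in the review
lunge_span_frames = (3, 5) # average span reported in the author response
cutoff_s = 0.2             # cutoff questioned by the reviewer

# Duration of the shortest and longest reported lunge span, rounded to 2 decimals.
durations = [round(frames_to_seconds(n, FPS), 2) for n in lunge_span_frames]

# Number of frames that the 0.2 s cutoff corresponds to at this frame rate.
cutoff_frames = round(cutoff_s * FPS)

print(durations)      # [0.1, 0.17]
print(cutoff_frames)  # 6
```

      Under these numbers, the 0.2 s cutoff corresponds to 6 frames, one frame longer than the upper end of the reported 3 to 5 frame span.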

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) and the associated change in aggression strategy are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors experimentally demonstrated that altered fighting strategy has effects with respect to these behaviors.

      Thank you for your precise summary of our study and being very positive on the novelty and significance of the study.

      Reviewer #3 (Public review):

      In this revised manuscript, Gao et al. presented a series of well-controlled behavioral data showing that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) is enhanced specifically among socially experienced and relatively old males. Moreover, results of behavioral assays led authors to suggest that increased tussling among socially experienced males may increase mating success. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in the flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discovery that 1) older (14-day-old) flies tend to tussle more often than younger (2- to 7-day-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting), are the result of their creativity to look outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Newly presented data have made several conclusions convincing. Detailed descriptions of methods to quantify behaviors help understand the basis of their claims by improving transparency. However, I remain concerned about the authors' persistent attempt to link the high-intensity aggression to reproductive success. The authors' effort to "tone down" the link between the two phenomena remains insufficient. They are purely correlational. I reiterate this issue because the overall value of the manuscript would not change with or without this claim.

      Thank you for acknowledging the novelty and significance of the study. Regarding the relationship you mentioned between high-intensity aggression and reproductive success, we have further toned down the statements linking them throughout the revised manuscript. We also modified the title to “Social Experience Shapes Fighting Strategies in Drosophila”. In addition, we have now added a ‘Limitations of the Study’ section to clearly state that the relationship between tussling and reproductive success is correlational.

      Reviewer #1 (Recommendations for the authors):

      If possible, mention the EM-connectome data showing the minimal interneuronal path from Or47b ORNs to pC1SS2 neurons (even if derived from the female connectome), which can strengthen the model of parallel sensory-central pathways.

      Thank you for this comment. According to data from the EM connectome, connecting Or47b ORNs to pC1d neurons requires at least two intermediate neurons. An example minimal pathway is: ORN_VA1v (L) → AL-AST1 (L) → PLP245 (L) → pC1d (R). We have added this point in the Discussion section of the revised manuscript.

      I'm not convinced that labeling lunges as "gentle" combat behavior works, either in the abstract or elsewhere. While lunging is indeed a lower-intensity form of aggression compared to tussling, applying anthropomorphic descriptors risks misleading readers.

      Thank you for this comment. We now use “low-intensity” instead of “gentle” to describe lunging.

      In Materials & Methods, please cross-check all figure-panel references after the recent re-numbering (e.g. "Figure 5A6A" etc.).

      Thank you for this comment. We have thoroughly verified the figure panel references in the Materials & Methods section.

      Ensure that Table S1 is clearly cited in the main text where you first describe fly genotypes.

      Thank you for this comment. We have now cited Table S1 in the main text.

      There are multiple grammatical errors and typos throughout the manuscript. Please correct them. Some examples are below, but this is not an exhaustive list:

      Line 98-102 requires rephrasing as the results are already published and not being observed by the authors.

      Thank you for this comment. We have revised the manuscript to “we occasionally observed the high-intensity boxing and tussling behavior in male flies as previously reported (Chen et al., 2002; Nilsen et al., 2004), which….”

      line 116- lower not 'lowed'.

      Corrected.

      line 942 & 945- knock-down males not 'knocking down males'.

      Corrected. Thank you very much for these comments.

      Reviewer #2 (Recommendations for the authors):

      The authors have almost completely answered the major comments I have noted on the ver.1 manuscript: (1) They clearly show changes in fighting strategy in the territory control behavior experiment in Fig. 6-figure supplements. (2) A detailed description of how aggressive behavior is measured. Thus, I am convinced by this revision.

      Thank you for these comments that make the manuscript a better version.

      Furthermore, Fig. 5, which examines the relationship of pC1[SS2] characteristics with the function of dsx, presents novel and very interesting data. I look forward to further developments.

      Thank you. We will continue to explore this part in our future study.

      However, one point still concerns me.

      Line 192: Although the authors describe it as "usage-dependent," the trans-Tango technique is essentially a postsynaptic cell-labeling technique. It is possible that the labeling intensity in postsynaptic cells increases from the change in expression levels of the Or47b gene due to GH. However, there is no difference in the expression level of the Or47b gene labeled by GFP between SH and GH. Therefore, we cannot conclude that the expression of the Or47b gene is increased by rearing conditions.

      The original paper on trans-TANGO (Talay et al., 2017) does not discuss the usage-dependency. A review of trans-synaptic labeling techniques (Ni, Front Neural Circuits. 2021) discusses that the increase in trans-TANGO signaling with aging may be related to synaptic strength, but there is no experimental evidence for this. In my opinion, the results in Figure 3-figure supplement 2 only weakly suggest that the increase in trans-TANGO signaling may be explained by an increase in synaptic strength due to group rearing.

      We appreciate the reviewer’s insightful comment regarding the interpretation of the trans-Tango signal. Indeed, the original trans-Tango study (Talay et al., 2017) does not claim that the method is usage-dependent. The observed increase in trans-Tango labeling with age, as reported in their supplemental figures, may reflect accumulation over time, potentially influenced by synaptic maturation or increased component expression. To avoid overstating our results, we have revised the relevant statement in the manuscript to remove the term "usage-dependent" and now describe the change in trans-Tango signal more cautiously.  

      Reviewer #3 (Recommendations for the authors):

      Below are the cases where their professed attempts to "tone down the statement" appear ignored:

      Lines 27-29:

      "Our findings... suggest how social experience shapes fighting strategies to optimize reproductive success".

      We have now revised the manuscript to “Our findings… suggest that social experience may shape fighting strategies to optimize reproductive success.”

      Lines 85-86:

      "... discover that this infrequent yet intense form of combat is... crucial for territory dominance and mating competition".

      We have now revised the manuscript to “…discover that this infrequent yet intense form of combat is enhanced by social enrichment, while the low-intensity lunging is suppressed by social enrichment.” 

      Lines 335-339:

      "Here, we found that... GH males tend to... increase the high-intensity tussling, which enhances their territorial and mating competition."

      We have removed “which enhances their territorial and mating competition” in the revised manuscript.

      Lines 343-344:

      "... presenting a paradox between social experience, aggression and reproductive success. Our result resolved this paradox..."

      We have now revised the manuscript to “...Our results provide an explanation for this paradox…”

      Lines 355-358:

      "Interestingly, we found that the mating advantage gained through social enrichment can even offset the mating disadvantage associated with aging, further supporting the vital role of shifting fighting strategies in experienced, aged males."

      We have removed “further supporting the vital role of shifting fighting strategies in experienced, aged males” in the revised manuscript.

      Lines 361-362:

      "These results separate the function of the two fighting forms and rectify out understanding of how social experiences regulate aggression and reproductive success."

      We have removed this sentence in the revised manuscript.

      Some may say that a speculative statement is harmless, but I think it indeed is harmful unless it is clearly indicated as a speculation. It is regrettable that authors remain reluctant to change their claim without providing any new supporting evidence. All three reviewers raised the same concern in the first round of review.

      We apologize for not making the speculative nature of the statement clearer in the previous version. In the revised manuscript, we have now explicitly rephrased sentences to only suggest a correlation but not a causal link between tussling and reproductive success.

      I have no choice but to keep my evaluation of the manuscript as "Incomplete" unless the authors thoroughly eliminate any attempt to link these two. This must go beyond changing a few words in the lines listed above.

      Thank you for this comment. In addition to the lines listed above, we carefully checked all statements regarding the correlation between fighting strategies and reproductive success throughout the full text. Furthermore, we have also added a “Limitations of the Study” section to address the shortcomings of this study in the revised manuscript.

      I do not have the same level of concern over the interpretation of Fig. 6A-C, because this is directly linked to aggressive interactions. Even if the socially isolated males do not engage in tussling, it is not a leap to assume that a different fighting tactic of socially experienced males can give them an advantage in defending a territory. To me, this is a sufficient ethological link with the observed behavioral change.

      Thank you for this insightful comment.

      The following are relatively minor, although important, concerns.

      I beg to differ over the authors' definition of "tussling". Supplemental movies S1 and S2 appear to include "tussling" bouts in which two flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases suggest that the definition of "tussling" as opposed to "lunging" has a subjective element. However, I will not dwell on this matter further because it is impossible to be completely objective over behavioral classification, even by using a computational method. An important point is that the definition is applied consistently within the publication. I have no reason to doubt that this was the case.

      Thank you for this comment. Since the analysis of tussling behavior was conducted manually, it is challenging to achieve complete objectivity. However, we made every effort to apply consistent criteria throughout the analysis. We have added a “Limitations of the Study” section in the revised manuscript to clearly state this caveat. We appreciate your understanding.

      The authors now state that "all tester flies were loaded by cold anesthesia" (lines 432-433). I would like to draw attention to the well-known fact that anesthesia, whether by ice or by CO2, has long been known to affect flies' subsequent behaviors (for aggression, see Trannoy S. et al., Learn. Mem. 2015. 22: 64-68). It would be prudent to acknowledge the possibility that this handling method could have contributed to unusually high levels of spontaneous tussling, which has not been reported elsewhere before.

      Thank you for this comment. The increased tussling behavior observed in our study is unlikely to be due to cold anesthesia: as noted by Trannoy S. et al. (2015), cold anesthesia profoundly reduces locomotion and general aggressiveness in flies. We acknowledge that the use of cold anesthesia in behavioral experiments may have potential effects on aggression. To minimize this influence, we allowed the flies to recover and adapt for at least 30 minutes before behavioral recording. Moreover, both control and experimental groups were treated in exactly the same manner to ensure consistency.

      It is intriguing that pC1SS2 neurons are dsx+ but fru-. Authors convincingly demonstrated that these neurons are clearly distinct from the P1a neurons, a well-characterized hub for male social behaviors. It is possible that pC1SS2 neurons overlap with previously characterized dsx+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020, a point authors could have explicitly raised.

      Thank you for this comment. We have added this point into the Discussion section of the revised manuscript, as follows: “That tussling-promoting… aggression (Koganezawa et al., 2016). Moreover, the anatomical features of pC1SS2 neurons are highly similar to the male-specific aggression-promoting (MAP) neurons identified by another previous study (Chiu et al., 2021).”

      I acknowledge the authors' courage to initiate an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. The concern I raised above is about the interpretation of the data, not about the quality of data.

      Thank you for your constructive comments to make this manuscript better.

    1. eLife Assessment

      This fundamental work provides novel insights into the blood flow-dependent mechanisms of neuronal migration and the role of Ghrelin signaling in the adult brain. The authors present convincing evidence that newborn rostral migratory stream (RMS) neurons are closely situated alongside blood vessels, preferentially along arterioles, and that migratory speed is correlated with blood flow. They also provide evidence (in vitro and some in vivo) that Ghrelin from blood is involved in augmenting RMS neuron migration speed.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides compelling evidence suggesting that ghrelin, a molecule released in the surroundings of the major adult brain neurogenic niche (V-SVZ) by blood vessels with high blood flow, controls the migration of newborn interneurons towards the olfactory bulbs.

      Strengths:

      This study is a tour de force as it provides a solid set of data obtained by time-lapse recordings in vivo. The data demonstrate that the migration and guidance of newborn neurons rely on factors released by selective types of blood vessels.

      Weaknesses:

      Some intermediate conclusions are weak and may be reinforced by additional experiments.

      Comments on revisions: The manuscript has improved.

    3. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary: 

      This study provides compelling evidence suggesting that ghrelin, a molecule released in the surroundings of the major adult brain neurogenic niche (V-SVZ) by blood vessels with high blood flow, controls the migration of newborn interneurons towards the olfactory bulbs. 

      Strengths:

      This study is a tour de force as it provides a solid set of data obtained by time-lapse recordings in vivo. The data demonstrate that the migration and guidance of newborn neurons rely on factors released by selective types of blood vessels. 

      Weaknesses:

      Some intermediate conclusions are weak and may be reinforced by additional experiments. 

      We thank the reviewer for the thoughtful evaluation and constructive comments outlined in the “Recommendations for The Authors”. In response, we have incorporated additional data, revised relevant figures, and clarified explanations in the revised manuscript.

      Reviewer #2 (Public review)

      Summary: 

      The authors establish a close spatial relationship between RMS neurons and blood vessels. They demonstrated that high blood flow was correlated with migratory speed. In vitro, they demonstrate that Ghrelin functions as a motogen that increases migratory speed through augmentation of actin cup formation. The authors proceed to demonstrate through the knockdown of the Ghrelin receptor that fewer RMS neurons reach the OB.

      They show the opposite is true when the animal is fasted. 

      Strengths: 

      Compelling evidence of close association of RMS neurons with blood vessels (tissue clearing 3D), preferentially arterioles. Good use of 2-photon imaging to demonstrate migratory speed and its correlation with blood flow. In vitro analysis of Ghrelin administration to cultured RMS neurons, actin visualization, Ghsr1KD, is solid and compelling. 

      We sincerely thank the reviewer for the encouraging comments and helpful suggestions. As noted, our original manuscript lacked sufficient in vivo evidence connecting blood flow with ghrelin signaling. To address this, we have added new data and revised the explanations throughout the manuscript as described below.

      Weaknesses: 

      (1) Novelty of findings attenuated due to prior work, especially Li et al., Experimental Neurology 2014. Here, the authors demonstrated that Ghrelin enhances migration in adult-born neurons in the SVZ and RMS.

      We agree with the reviewer that the idea that ghrelin enhances migration of new neurons is not entirely novel. The study by Li et al. (2014) provided critical insights that guided our investigation into ghrelin as a blood-derived factor promoting neuronal migration. However, our study expands on this by demonstrating that ghrelin directly stimulates migration via GHSR1a in cultured new neurons, and we further identified the cellular and cytoskeletal mechanisms involved. Specifically, we showed that ghrelin enhances somal translocation by activating actin dynamics at the rear of the cell soma. We have revised the Results and Discussion sections accordingly to emphasize these novel aspects as follows:

      “A previous study demonstrated that the migration of V-SVZ-derived new neurons was attenuated in ghrelin knockout mice (Li et al., 2014). In our study, we found that the migration of cultured new neurons was enhanced by the application of ghrelin to the culture medium, and this effect was abolished by Ghsr1a knockdown (KD). These findings suggest that ghrelin directly stimulates neuronal migration through its receptor, GHSR1a, on new neurons. A previous study showed that GHSR1a is expressed in various regions of the brain (Zigman et al., 2006). In our experiments, new neuron-specific KD of Ghsr1a indicated that ghrelin signaling acts in a cell-autonomous manner to regulate neuronal migration.” (Discussion, page 13, lines 10–18)

      “Furthermore, we identified the cellular and cytoskeletal mechanisms underlying this effect on migration. The results indicate that ghrelin enhances somal translocation during migration by activating actin cytoskeletal dynamics at the rear of the neuronal soma.” (Discussion, page 13, lines 24–26)

      (2) The evidence for blood delivery of Ghrelin is not very convincing. Fluorescently-labeled Ghrelin appears to be found throughout the brain parenchyma, irrespective of the distance from vessels. It is also not clear from the data whether there is a link between increased blood flow and Ghrelin delivery. 

      We agree that the correlation between blood flow and ghrelin transcytosis is not very convincing in our study. As the reviewer pointed out, Figure 3A gives the impression that fluorescent-labeled ghrelin is uniformly distributed throughout the brain parenchyma. However, high-magnification images newly added in Figure 3 show that some, but not all, vessels have particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal side of vascular endothelial cells, visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). To quantify these observations, we defined two regions: Area I (perivascular area), within 10 μm of the abluminal surface of CD31-positive endothelium; and Area II (distant area), located 10–20 μm away (Figure 3E). Of note, Area I corresponds to the perivascular region where new neurons are frequently observed (Figure 1).

      Importantly, we found strong ghrelin signals in vascular endothelial cells of endomucin-negative high-flow vessels (Figure 3C, D). This suggests that transcytosis of blood-derived ghrelin may occur more frequently in high-flow vessels due to increased endocytosis at the endothelium. To test this, we quantified signal gradients in the extra-vessel regions as fold changes (Area I / Area II), as illustrated in Figure 3E. The proportion of vessel segments with >1.5-fold increases was significantly higher in endomucin-negative vessels than in endomucin-positive ones (Figure 3F). Furthermore, vessels with >2-fold increases were observed exclusively in the endomucin-negative group (6.48% ± 1.18%).

      These data suggest that, in high-flow vessels, blood-derived ghrelin accumulates more in the immediate perivascular region than in areas further away. This supports the possibility that elevated blood flow delivers a larger amount of ghrelin to the vascular endothelium, enhancing its transcytosis into adjacent brain parenchyma. This mechanism may underlie the preferential migration of new neurons along perivascular regions with high blood flow, as shown in Figure 1. We have incorporated these new data into Figure 3 and added corresponding explanations to the Results, Figure legend, and Methods.
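      The fold-change readout described in this response can be illustrated with a minimal sketch. It assumes that each vessel segment is summarized by one mean fluorescence intensity for Area I (0–10 μm) and one for Area II (10–20 μm). The function names and all intensity values are hypothetical, chosen only to show the arithmetic, and are not taken from the study.

```python
def fold_change(area1_intensity, area2_intensity):
    """Ratio of mean signal in Area I (0-10 um) to Area II (10-20 um)."""
    if area2_intensity <= 0:
        raise ValueError("Area II intensity must be positive")
    return area1_intensity / area2_intensity


def fraction_above(segments, threshold):
    """Fraction of vessel segments whose Area I / Area II ratio exceeds threshold."""
    if not segments:
        raise ValueError("at least one vessel segment is required")
    ratios = [fold_change(a1, a2) for a1, a2 in segments]
    return sum(r > threshold for r in ratios) / len(ratios)


# Hypothetical (Area I, Area II) mean intensities in arbitrary units.
# These are illustrative values, not measurements from the manuscript.
endomucin_negative = [(30.0, 10.0), (18.0, 10.0), (11.0, 10.0), (16.0, 10.0)]
endomucin_positive = [(12.0, 10.0), (16.0, 10.0), (9.0, 10.0), (10.0, 10.0)]

for label, segments in [("endomucin-negative", endomucin_negative),
                        ("endomucin-positive", endomucin_positive)]:
    over_1_5 = fraction_above(segments, 1.5)
    over_2 = fraction_above(segments, 2.0)
    print(f"{label}: >1.5-fold = {over_1_5:.0%}, >2-fold = {over_2:.0%}")
```

      In an actual analysis, these per-group fractions would be computed per animal and then compared statistically between vessel types; the sketch covers only the ratio and thresholding steps.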

      (3) The in vivo link between Ghsr1KD and migratory speed is not established. Given the strong work to open the study on blood flow and migratory speed and the in vitro evidence that migratory speed is augmented by Ghrelin, the paper would be much stronger with direct measurement of migration speed upon Ghsr1KD. Indeed, blood flow should also be measured in this experiment since it would address concerns in 2. If blood flow and ghrelin delivery are linked, one would expect that Ghsr1KD neurons would not exhibit increased migratory speed when associated with slow or fast blood flow vessels. 

      In Figure 3, we showed that ghrelin transcytosis occurs preferentially in high-flow vessels, suggesting a role for ghrelin in mediating the effects of blood flow on neuronal migration. However, whether this dependence is solely attributable to ghrelin signaling remains unclear. 

      To address this, we tested whether Ghsr1a-KD modifies the impact of reduced blood flow on neuronal migration by combining Ghsr1a-KD with bilateral common carotid artery stenosis (BCAS), a chronic cerebral hypoperfusion model (Figure S9A). We found that BCAS decreased the percentage of Ghsr1a-KD new neurons reaching the OB, similar to the effect seen in control neurons (Figure S9B, see also Figure 2A–C). This suggests that blood flow influences neuronal migration even under Ghsr1a-KD conditions.

      Furthermore, we analyzed the distribution of Ghsr1a-KD neurons with respect to vessel flow characteristics. Even under Ghsr1a-KD, a higher proportion of new neurons were located in the area of endomucin-negative (high-flow) vessels compared with endomucin-positive (low-flow) vessels (Figure S9C), indicating that Ghsr1a-KD does not abolish the preferential association of migrating neurons with high-flow vessels. These findings suggest that although ghrelin signaling contributes to blood flow-dependent migration, it is not the sole factor. Other blood-derived signals may also mediate this effect. We have included these new data in Figure S9 and updated the corresponding sections in the Results.

      Reviewer #1 (Recommendations for the authors):

      Major 

      Page 6, Line 13. Please provide in the Results section some explanation of how the photothrombotic clot is induced.

      We added the following explanation to the Results section to clarify the method used to induce photothrombotic clot formation.

      “For clot formation, a restricted area of selected vessels was irradiated by a two-photon laser immediately after intravenous injection of rose bengal.” (Results, Page 7, lines 27–28)

      Page 6, Line 18. The authors use the marmoset as an additional experimental model. Here, V-SVZ-derived newborn neurons migrate in other brain regions as compared to rodents. Please provide a clear rationale for moving from rodents to "common marmosets" as an experiment model. And why use marmosets only for this set of experiments? 

      We clarified the rationale for using common marmosets in addition to mice as follows:

      “Because blood vessel-guided neuronal migration in the adult brain is a conserved phenomenon across species (Kishimoto et al., 2011; Akter et al., 2021; Shvedov et al., 2024), we hypothesized that blood flow may also influence neuronal migration in other brain regions of primates. The neocortex, which supports higher-order brain functions and has undergone evolutionary expansion in primates, was selected as a target region. In common marmosets, but not in mice, V-SVZ-derived new neurons migrate toward the neocortex and ventral striatum (Akter et al., 2021) (Supplemental Movies S4 and S5).” (Results, Page 6, lines 19–25)

      Figure 2B. The experimental setup is possibly problematic as the lentiviral tracing measurement does not take into consideration the rate of neurogenesis or newborn neuron survival. Can the authors assess the rate of proliferation and survival in the V-SVZ/RMS upon BCAS to decipher whether the reduced number of cells observed in the OB results only from migration changes? (A comparable remark stands for Figure 5.)

      To evaluate whether the reduction in the number of new neurons observed in the OB after BCAS (Figure 2B, C) is due solely to impaired migration, we assessed cell proliferation and survival in the V-SVZ and RMS. Specifically, we quantified the density of Ki67+ proliferating cells and cleaved caspase-3+ apoptotic cells in the sham and BCAS groups. BCAS significantly decreased cell proliferation and increased cell death in both the V-SVZ and RMS (Figure S4), suggesting that reduced neurogenesis and/or survival may contribute to the decreased neuronal distribution in the OB. 

      Although we cannot exclude the possibility that changes in cell proliferation or survival contributed to this effect, our photothrombotic clot formation experiments are better suited to directly examine how acute reduction in blood flow affects neuronal migration. These experiments allowed us to measure the migration speed of new neurons shortly after inducing localized blood flow inhibition. We found that clot formation significantly reduced the migration speed of new neurons (Figure 2E, H), indicating that blood flow changes directly impair neuronal migration in the adult brain. 

      We have included these new data in Figure S4 and updated the corresponding text in the Results, Discussion, Figure legend, and Methods as follows:

      Figure 3. About ghrelin signaling. It is unclear whether its transcytosis occurs in endomucin-negative because of the high bloodstream flow. How can this be explained? What happens upon BCAS, is there still a close relation between ghrelin transcytosis, blood flow, and neuron migration? 

      As correctly noted, our initial explanation and data did not provide sufficient evidence that higher blood flow delivers a larger amount of ghrelin into the brain parenchyma. We found that some vessels had particularly strong fluorescent signals in the parenchymal area adjacent to the abluminal surface of vascular endothelial cells, as visualized by CD31 immunostaining (Feng et al., 2004) (Figure 3A′, A′′). On the basis of our observation that strong fluorescent signals were detected in vascular endothelial cells of endomucin-negative (high-flow) vessels (Figure 3C, D), we hypothesized that ghrelin transcytosis may occur more frequently in high-flow vessels due to increased endocytosis at the vessel endothelium. 

      To test this hypothesis, we quantified signal gradients in the extra-vessel regions by calculating fold changes in fluorescent intensity between two zones: Area I (0–10 μm from the abluminal surface of the endothelium) and Area II (10–20 μm away), as illustrated in Figure 3E. Area I corresponds to the perivascular region where new neurons are frequently found (Figure 1). We found that the proportion of vessel segments with >1.5-fold signal increase in Area I relative to Area II was significantly higher in endomucin-negative vessels than endomucin-positive ones (Figure 3F). Furthermore, vessel segments with >2-fold increases were observed exclusively in the endomucin-negative group (6.48% ± 1.18%). These results support the idea that higher blood flow increases the amount of ghrelin that reaches the luminal surface of vascular endothelial cells, thereby increasing the possibility of ghrelin transcytosis into the brain parenchyma.

      We also examined whether blood flow inhibition, induced by BCAS or photothrombotic clot formation, affects the relationship between ghrelin transcytosis, blood flow, and neuronal migration. The above results suggest that blood flow reduction may decrease ghrelin transcytosis, thereby contributing to impaired neuronal migration. To further explore this, we analyzed the distribution of new neurons around high- versus low-flow vessels under BCAS conditions. In the BCAS group, we still observed a higher density of new neurons in the region of high-flow (endomucin-negative) vessels than in that of low-flow (endomucin-positive) vessels (Figure S9C). This suggests that even under reduced blood flow, neuronal migration preferentially occurs near high-flow vessels. Taken together, these results suggest that ghrelin transcytosis, blood flow, and neuronal migration are connected, and that this relationship persists under conditions of blood flow reduction.

      Figure 4. Is ghrelin controlling both individual Dcx+ neuron migration as well as chain migration (cells moving more together)? This should be assessed and clarified. 

      How is ghrelin controlling actin dynamics in newborn migrating neurons? Since somal translocation speed and somal stride length are both modulated by ghrelin, this factor may also control MT remodeling, could that be checked? 

      We have revised the manuscript to better explain the role of ghrelin in both modes of neuronal migration: chain and individual. Initially, we demonstrated that ghrelin enhances the migration of new neurons in V-SVZ culture (Figure 4A, B), where these neurons migrate outward as chains, indicating that ghrelin facilitates chain migration. In subsequent in vitro experiments (Figure 4C–M), we showed that ghrelin also enhances the migration of individual neurons. To examine this in vivo, we injected Ghsr1a-KD and control lentiviruses into two different anatomical regions: the V-SVZ, where chain migration originates, and the OB core, where new neurons migrate individually. These experiments enabled us to assess the role of ghrelin signaling in each mode of migration independently. We found that ghrelin enhanced both chain migration in the RMS and individual migration in the OB. These results indicate that ghrelin signaling facilitates both forms of neuronal migration. We added the following text in the Results section:

      “To assess the direct effect of ghrelin on neuronal migration, we applied recombinant ghrelin to V-SVZ cultures, in which new neurons emerge and migrate as chains (Figure 4A). Ghrelin significantly increased the migration distance of these neurons (Figure 4B), indicating enhanced chain migration. We then used super-resolution time-lapse imaging to examine individually migrating neurons with or without knockdown (KD) of growth hormone secretagogue receptor 1a (GHSR1a), a ghrelin receptor expressed in V-SVZ-derived new neurons (Li et al., 2014) (Figure 4C). Ghrelin enhanced the migration speed of control (lacZ-KD) cells, indicating that it also facilitates individual migration (Figure 4D).” (Results, Page 9, lines 5–12)

      “Of the total labeled Dcx+ cells, the percentage of Dcx+ cells reaching the GL was significantly lower in the Ghsr1a-KD group than in the control group (Figure 5B, C), suggesting that ghrelin enhances individual radial migration of new neurons in the OB.” (Results, Page 10, lines 5–8) “These data indicate that ghrelin signaling facilitates both individual migration in the OB and chain migration in the RMS.” (Results, Page 10, lines 17–18)

      We also added discussion on how ghrelin may regulate cytoskeletal dynamics in migrating neurons. Ghrelin signaling has been reported to control actin cytoskeletal remodeling in astrocytoma cells (Dixit et al., 2006), which led us to investigate similar effects in migrating neurons. Rac, a member of the Rho GTPase family, was shown to mediate this actin remodeling in astrocytoma migration, suggesting it may also be involved in ghrelin-induced actin cup formation in new neurons. Furthermore, because somal translocation depends not only on actin but also on microtubule dynamics (Kaneko et al., 2017), it is possible that ghrelin influences both systems. Supporting this idea, ghrelin signaling was shown to modulate microtubule behavior via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017). These findings suggest that ghrelin may enhance somal translocation through coordinated regulation of both the actin and microtubule systems. We added the following text in the Results and Discussion sections:

      “Ghrelin signaling has been reported to regulate actin cytoskeletal dynamics in astrocytoma cells (Dixit et al., 2006), which led us to examine whether a similar mechanism operates in migrating neurons.” (Results, Page 9, lines 23–25)

      “Further studies are needed to elucidate how ghrelin promotes actin cup formation in migrating neurons. Given that Rac, a Rho family GTPase, mediates actin remodeling downstream of ghrelin in astrocytoma cells (Dixit et al., 2006), it is possible that Rac may also be involved in ghrelin-induced cytoskeletal regulation in new neurons.” (Discussion, Page 13, lines 28–31)

      “In addition to actin remodeling, ghrelin may regulate microtubule dynamics. Ghrelin signaling was shown to modulate microtubules via SFK-dependent phosphorylation of α-tubulin (Slomiany and Slomiany, 2017), raising the possibility that ghrelin promotes somal translocation of new neurons through coordinated regulation of both actin and microtubule networks (Kaneko et al., 2017).” (Discussion, Page 13, line 31–Page 14, line 2)

      It would also be informative to provide immunolabeling of Ghsr1 in the V-SVZ/RMS/OB to have a clear picture of the expression pattern of this receptor. Newborn neurons migrate along blood vessels, which are surrounded by astrocytes that have also been reported to express Ghsr1. Could changes in newborn neuron migration therefore also arise from activation of Ghsr1 in the surrounding astrocytes?

      A previous study reported that GHSR1a is expressed in DCX+ new neurons in the RMS and OB, and in V-SVZ neural progenitor cells (Li et al., 2014). To visualize the spatial expression pattern of Ghsr1a, we performed RNAscope in situ hybridization because specific anti-GHSR1a antibodies suitable for immunohistochemistry were not available. Consistent with the previous report, we detected Ghsr1a mRNA in DCX+ new neurons in the V-SVZ, RMS, and OB (Figure S5A), indicating that new neurons directly receive ghrelin signaling.

      Moreover, our KD experiments demonstrated that ghrelin enhanced the migration of new neurons in a cell-autonomous manner via GHSR1a (Figure 4, 5). Nevertheless, a recent study (Stark et al., 2024) showed that GHSR1a was expressed in various cell types, including glutamatergic and GABAergic neurons, suggesting that ghrelin may also exert non-cell-autonomous effects on neuronal migration. Given the presence of diverse cell types, including neurons, microglia, pericytes, and astrocytes, along the migratory route, it remains possible that GHSR1a activation in these neighboring cells contributes to the overall regulation of neuronal migration.

      Figure 5. About the in vivo knockdown of Ghsr1a. The Results section (page 9, line 3) mentions that mice were injected with either one or the other construct, but Figure 5 shows cells positive for both GFP and DsRed. Were control and Ghsr1a shRNAs injected together into the same mouse? Could you quantify the number of cells in green (control), red (Ghsr1a KD), and yellow (both)? Won't they mostly be yellow? Have you tried injecting control and Ghsr1a separately? If yes, do you get the same result? Such analysis would be important to separate cell-autonomous from non-cell-autonomous effects.

      To minimize variability in injection conditions, we initially coinjected control and Ghsr1a-KD lentiviruses into the same mice and analyzed their migration using a paired design. As the reviewer correctly noted, some cells were coinfected and expressed both EmGFP and DsRed (18.7% ± 2.86% of EmGFP+ cells and 10.8% ± 0.533% of DsRed+ cells). To ensure that this overlap did not affect our analysis, we excluded EmGFP+/DsRed+ double-positive cells and focused solely on EmGFP+/DsRed− (control) and EmGFP−/DsRed+ (Ghsr1a-KD) single-positive cells. 
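      The exclusion of double-positive cells described above can be sketched as follows. This is a minimal illustration that assumes each labeled cell is recorded with its EmGFP status, its DsRed status, and the OB layer in which it was found. The record layout, function names, and example cells are hypothetical and do not reproduce the authors' analysis pipeline or data.

```python
def classify(egfp_positive, dsred_positive):
    """Assign a labeled cell to a group based on its fluorescent reporters."""
    if egfp_positive and dsred_positive:
        return "double"      # coinfected; excluded from the comparison
    if egfp_positive:
        return "control"     # EmGFP+/DsRed-
    if dsred_positive:
        return "ghsr1a_kd"   # EmGFP-/DsRed+
    return "unlabeled"


def percent_reaching(cells, group, target_layer="GL"):
    """Percentage of single-positive cells of `group` located in `target_layer`."""
    layers = [layer for egfp, dsred, layer in cells
              if classify(egfp, dsred) == group]
    if not layers:
        raise ValueError(f"no cells in group {group!r}")
    return 100.0 * sum(layer == target_layer for layer in layers) / len(layers)


# Hypothetical cells: (EmGFP+, DsRed+, layer). Illustrative only, not study data.
cells = [
    (True, False, "GL"), (True, False, "GCL"), (True, True, "GL"),
    (False, True, "GCL"), (False, True, "GCL"), (False, True, "GL"),
    (True, False, "GL"), (False, True, "EPL"),
]

for group in ("control", "ghsr1a_kd"):
    print(f"{group}: {percent_reaching(cells, group):.1f}% of cells in GL")
```

      Because coinfected cells are assigned to their own class before any percentage is computed, they drop out of both the numerator and the denominator of each group.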

      We agree with the reviewer that coinjection could lead to reciprocal interactions between control and Ghsr1a-KD cells, potentially masking cell-autonomous effects. To address this, we performed an independent experiment in which control and Ghsr1a-KD lentiviruses were injected separately into different mice (Figure S7A), as suggested. Consistent with the results of the coinjection experiment, we found that the Ghsr1a-KD cells showed significantly reduced distribution in the GL compared with that in control cells (Figure S7B). Although we cannot exclude the possibility of a non-cell-autonomous effect of ghrelin, this result supports the conclusion that ghrelin signaling enhances neuronal migration in a cell-autonomous manner. 

      Which cells express Ghsr1a: newborn neurons and/or their progenitors? The production and survival of newborn V-SVZ cells should also be assessed upon knockdown of the ghrelin receptor.

      To determine whether the altered distribution of new neurons observed upon Ghsr1a-KD is due to impaired migration rather than decreased cell production or survival, we examined the effects of Ghsr1a-KD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014).

      We compared the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS between the control and Ghsr1a-KD groups. There was no significant difference in the proportion of cleaved caspase-3+ cells between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not affect the survival of new neurons and their progenitors. 

      Similarly, the proportion of Ki67+ cells in the RMS did not differ significantly between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, it remains technically difficult to evaluate whether Ghsr1a-KD affects proliferation in the V-SVZ, because lentivirus injection into the V-SVZ may interfere with GHSR1a expression not only in new neurons and neural progenitors, but also in other cell types known to express GHSR1a (Zigman et al., 2006). A previous study reported that ghrelin signaling promoted cell proliferation in the V-SVZ (Li et al., 2014); thus, we cannot exclude the possibility that Ghsr1a-KD may affect V-SVZ proliferation.

      To overcome this limitation, we assessed the effects of Ghsr1a-KD on neuronal migration using in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), both of which did not interfere with proliferation in the V-SVZ. These complementary approaches consistently demonstrated that Ghsr1a-KD reduces the migration speed of new neurons. 

      “To determine whether the altered distribution of new neurons after Ghsr1a-KD is due to impaired migration rather than changes in cell production or survival, we assessed the effects of Ghsr1a-KD on the proliferation and survival of new neurons and their progenitors, which express GHSR1a (Li et al., 2014). We quantified the proportion of cleaved caspase-3+ cells and Ki67+ cells from the total labeled cells in the V-SVZ and RMS in both control and Ghsr1a-KD groups. We found no significant difference in cleaved caspase-3+ cell proportions between the groups (Control: 874 cells from 5 mice; Ghsr1a-KD: 678 cells from 7 mice), suggesting that ghrelin signaling does not influence the survival of new neurons and their progenitors. Similarly, the percentage of Ki67+ cells in the RMS was similar between the two groups (Figure S8), indicating that Ghsr1a-KD does not impair cell proliferation in the RMS. However, technical limitations prevented a reliable evaluation of proliferation in the V-SVZ, as lentivirus injection into this region may interfere with GHSR1a expression in not only neural progenitors and new neurons, but also other GHSR1a-expressing cell types (Zigman et al., 2006). Although ghrelin signaling has been reported to promote cell proliferation in the V-SVZ (Li et al., 2014), our complementary in vitro KD experiments (Figure 4C–J) and in vivo OB-core lentivirus injections (Figure 5A–C), which did not affect the V-SVZ, consistently demonstrated that Ghsr1a-KD reduces neuronal migration. Taken together, our results suggest that blood-derived ghrelin enhances neuronal migration in the RMS and OB by stimulating actin cytoskeleton contraction in the cell soma, rather than by altering cell proliferation or survival.” (Results, Page 10, line 19–Page 11, line 4)

      “rat anti-Ki67 (1:500, #14-5698-82, eBioscience); and rabbit anti-cleaved caspase-3 (1:200, #9661, Cell Signaling Technology)” (Methods, Page 48, lines 14–16)

      How much is ghrelin/Ghsr1 signaling conserved in marmosets? 

      How ghrelin signaling is conserved between mice and common marmosets is important to clarify. A previous study reported the existence of a ghrelin homolog in common marmoset, which shares high sequence similarity with that in mice (Takemi et al., 2016). Moreover, the GHSR1a homolog in the common marmoset (https://www.ncbi.nlm.nih.gov/protein/380748978) shares 95.36% amino acid identity with its mouse counterpart. These findings suggest that blood-derived ghrelin may similarly promote neuronal migration in the marmoset brain, as observed in mice. 

      We have added the following text in the Discussion section:

      “Our data showed that new neurons preferentially migrate along arteriole-side vessels rather than venule-side vessels in both mouse and common marmoset brains, suggesting that the mechanism of blood flow-dependent neuronal migration is conserved across rodent and primate species, as well as across brain regions. A previous study identified a ghrelin homolog in the common marmoset with high sequence similarity to the murine version (Takemi et al., 2016). In addition, the marmoset GHSR1a homolog shares 95.36% amino acid identity with that of the mouse (https://www.ncbi.nlm.nih.gov/protein/380748978). These findings suggest that blood-derived ghrelin promotes neuronal migration in the common marmoset brain in a manner similar to that in mice.” (Discussion, Page 15, lines 8–16)

      Page 9. Starvation has been shown to boost ghrelin blood levels. What is the exact protocol used in this experiment and is this indeed increasing Ghrelin release from blood vessels in the V-SVZ? What about Ghsr1 expression level in newborn neurons? 

      We have clarified the calorie restriction (CR) protocol used in our experiments. We adopted a 70% CR protocol, which was previously shown to enhance hippocampal neurogenesis when administered for 14 days (Hornsby et al., 2016). In our study, the daily food intake under ad libitum (AL) conditions was first measured, and CR mice were then fed 70% of that amount for 5 consecutive days (see Figure 5I and Figure S10A). 

      To assess whether CR enhances ghrelin transcytosis into the brain parenchyma, we performed ELISA to quantify ghrelin levels in the OB and RMS. However, ghrelin concentrations were below the detection limit in both groups, precluding a direct comparison.

      We also considered whether CR modulates the expression level of the ghrelin receptor GHSR1a. A recent study reported that fasting increased GHSR1a expression in the OB (Stark et al., 2024), raising the possibility that CR may exert a similar effect. To test this, we performed in situ hybridization and quantified Ghsr1a mRNA puncta in Dcx+ cells in the OB. No significant difference was found between the AL and CR groups (Figure S5B), suggesting that CR does not alter GHSR1a expression levels in new neurons. 

      Although we cannot exclude the possibility that CR increases GHSR1a expression in other OB cell types, our combined CR and Ghsr1a-KD experiments strongly support a cell-autonomous contribution of ghrelin signaling to the enhanced neuronal migration observed under CR conditions. Corresponding data and text have been added to Figure S5 and the Results, Discussion, and the Figure legend sections as follows:

      Minor 

      Page 4 

      Line 19 In Supplemental movies 1 and 2, it is unclear where to see the GFP+ new neurons interact with BV. Can you add arrows as an indication for the readers? It will be better to add the anatomy term for orientation, caudal, or rostral in the video. (The same for Supplemental movies 3, 4, and 5).  

      To clarify the regions of interest in Supplemental Movies 1 and 2, where neuron–vessel interactions in the RMS are highlighted, we added dotted lines indicating the RMS boundaries. In addition, we created a new movie (Supplemental Movie S1′) showing a high-magnification view of Supplemental Movie S1, in which arrows mark EGFP+ new neurons interacting with blood vessels. We also added orientation indicators (e.g., caudal and rostral) and arrows to highlight new neuron–vessel interactions in Supplemental Movies S1–S5. 

      The following descriptions have been added to the Figure legends:

      “Supplemental Movie S1′ 

      High-magnification view extracted from Supplemental Movie S1. Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Page 46, lines 6–8)

      “Arrows indicate EGFP+ cells interacting with blood vessels.” (Figure legend, Supplemental Movie S3, Page 46, lines 16–17)

      “Arrows indicate Dcx+ cells interacting with blood vessels.” (Figure legend, Supplemental Movies S4 and S5, Page 46, lines 21–22, 26–27)

      Blood vessels are labeled in Supplemental Movies 2 and 3 by employing Flt1-DsRed transgenic mice instead of RITC-Dex-GMA. However, Flt1-DsRed transgenic mice are not mentioned in the Results section.

      We have now included an explanation regarding the use of Flt1-DsRed mice, in which vascular endothelial cells were labeled with DsRed.

      “To visualize blood vessels, we also used Flt1-DsRed transgenic mice, in which vascular endothelial cells were specifically labeled with DsRed (Matsumoto et al., 2012). Using Dcx-EGFP/Flt1-DsRed double transgenic mice, we observed close spatial relationships between new neurons and blood vessels (Supplemental Movies S2 and S3).” (Results, Page 4, lines 22–26)

      Figure 5. Can you indicate (in the figure legend and the result section) the stage of the adult brain used for this experiment? 

      We used 6- to 12-week-old adult male mice in all experiments in this study. To specify this, we have added the age of animals to both the Results and the relevant Figure legends as follows:

      “Therefore, we first studied blood vessel-guided neuronal migration in the RMS and OB using three-dimensional imaging in 6- to 12-week-old adult mice, which enabled analysis of the in vivo spatial relationship between new neurons and blood vessels.” (Results, Page 4, lines 14–16)

      “Figure 1 New neurons migrate along blood vessels with abundant flow in the adult brain.” (Figure legend, Page 25, line 4)

      “(B, C) Three-dimensional reconstructed images of a new neuron (green) and blood vessels (red) in the rostral migratory stream (RMS) (B) and glomerular layer (GL) (C) of 6- to 12-week-old adult mice.” (Figure legend, Page 25, lines 6–8)

      “(E) Transmission electron microscopy image of a new neuron (green) in close contact with a blood vessel (red) in the GL of a 6- to 12-week-old adult mouse.” (Figure legend, Page 26, lines 4–5)

      “(F) Time-lapse images of a migrating neuron (indicated by asterisks) in the GL of a 6- to 12-week-old Dcx-EGFP mouse.” (Figure legend, Page 26, lines 6–7)

      “Figure 3 Ghrelin is delivered from the bloodstream to the RMS and OB in the adult brain (A) Representative images of the OB and cortex of a fluorescent ghrelin-infused mouse (6 to 12 weeks old).” (Figure legend, Page 30, lines 1–3)

      “Lentivirus injection into the OB core (A) and the V-SVZ (D) was performed in 6- to 12-week-old adult mice.” (Figure legend, Page 33, lines 3–4)

      Reviewer #2 (Recommendations for author):

      Major:

      Ghsr1KD and blood flow 2-photon experiments to directly measure migratory speed. Could also do the same with fasting with or without Ghsr1KD.  

      We thank the reviewer for the valuable suggestion to strengthen our study. As pointed out in the Public Review, we agree that direct in vivo measurement of neuronal migration speed under Ghsr1a-KD conditions is important to clarify the link between ghrelin signaling and blood flow. 

      Two-photon imaging is the most suitable method for this purpose. Although we attempted two-photon imaging of Ghsr1a-KD new neurons, the number of virus-infected cells observed in vivo was too low to yield reliable data. Therefore, we chose an alternative strategy, combining Ghsr1a-KD with blood flow reduction using the BCAS model (Figure S9A), in which migration speed can be quantified based on the percentage of labeled cells reaching the OB. As stated in the Public Review response, BCAS significantly decreased the migration speed of Ghsr1a-KD new neurons (Figure S9B), indicating that Ghsr1a-KD does not abolish the influence of blood flow reduction. These findings suggest that ghrelin signaling is involved in, but is not essential for, blood flow-dependent neuronal migration.

      As suggested by the reviewer, direct observation of migration dynamics (e.g., somal translocation, leading process extension, stationary and migratory phases) is needed, especially in calorie restriction experiments. Although our data indicate that ghrelin signaling is required for fasting-induced increases in the migration speed of new neurons, calorie restriction could also change the concentrations of other factors in blood (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which may independently affect the behavior of migrating neurons. Given that ghrelin is not the sole factor contributing to blood flow-dependent neuronal migration, other circulating factors could affect the behavior of migrating neurons in a different manner during fasting. In vivo two-photon imaging would be a powerful approach to determine whether fasting-induced neuronal migration is caused by upregulated somal translocation speed, which would further support a role for ghrelin in this process.

      We have added the following text in the Discussion:

      “Although our data indicate that ghrelin signaling is essential for fasting-induced acceleration of neuronal migration, calorie restriction may also alter the concentrations of other circulating factors (Bonnet et al., 2020; Wu et al., 2024; Alogaiel et al., 2025), which could independently influence the behavior of migrating neurons.” (Discussion, Page 14, lines 25–29)

      Minor: 

      (1) Show fluorescent Ghrelin in Figure 3 for all brain areas measured in Figure 1 (GL, EPL, GCL, and RMS) for direct comparison.

      To allow for direct comparison across brain regions, we added a new Supplemental figure showing the distribution of fluorescently labeled ghrelin in the OB, including the GL, EPL, GCL and RMS. This comprehensive view highlights ghrelin localization relative to vasculature and migrating neurons in the regions analyzed in Figure 1.

      (2) Figure 1, panel I is presented in a confusing manner. High blood flow points to 0 degrees, low blood flow to 180 degrees. It implies (unintentionally, I am sure) that low blood flow results in migration away from OB. Maybe plot separately?

      We agree that the original presentation of Figure 1I could be misinterpreted as referring to anatomical orientation (i.e., toward or away from the OB). To avoid confusion, we revised the figure to categorize new neuron–vessel interactions into four groups according to (1) the angle between the migration direction and vessel axis (small or large), and (2) whether the new neuron is migrating toward or away from the direction of higher blood flow. This new presentation avoids implying a fixed anatomical direction and better reflects the relationship between local blood flow and neuronal migration behavior. The revised figure is presented as Supplemental Figure S1.

    1. eLife Assessment

      This important work begins to understand how BDNF regulates the phosphorylation and activity of LRRK2. The overall strength of evidence has been assessed as compelling, though some claims are only partially supported. The work will be of interest for those that might pursue specific LRRK2 interactions and mutational effects on these pathways as the work continues to develop.

    2. Reviewer #1 (Public review):

      Summary:

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity.

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and brain tissue from genetically modified mice. They examine a LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function.

      Strengths:

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health.

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. In this revised manuscript, some aspects are well validated (e.g., drebrin binding), but there is a disconnect between these findings and alterations to LRRK2 substrates. A convincing phosphoproteomic analysis of PD mutant knock-in mouse brain is included. Overall, the links between LRRK2, LRRK2 activity, and the changes to synaptic molecules, structures, and activity are intriguing.

      Weaknesses:

      The data sets remain disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking.

      Main Conclusions of Abstract:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not pERK pAkt & pRab?

      (2) Omics Proteome remodelling of LRRK2 interactome with BDNF & different in G2019S mouse neurons.

      Supports that the phosphoproteome of G2019S is different. Drebrin interaction with LRRK2 very well supported. Link between drebrin and LRRK2 activity somewhat supported (pS935 site), but the consequence (non-specific pRab8) not supported, as there is no evidence of a change in LRRK2 substrate(s).

      (3) Golgi 1 month LKO mouse altered dendritic spines, transient at 1m not older.

      Supported but very small transient change in spines, disconnected to other results (e.g., drebrin).

      (4) iPSC-derived neurons BDNF increases mEPSC frequency (transient at 70 not 50 or 90 days) in WT not KO "which appear to bypass this regulation through developmental compensation"

      Weak, not clear what is being bypassed.

      Main Conclusions Based on Old and New Figure / Data:

      (1) Increase in pLRRK2 Ser935 and pRAB after BDNF in SH-SY5Y & mouse neurons

      Well supported, but only for pLRRK2 in neurons, why not ERK Akt & Rab?

      (2) BDNF promotes LRRK2 interaction with "post-synaptic actin cytoskeleton components"

      Tone down, only one postsynaptic validated - drebrin strong BUT CONTRADICTORY; link between drebrin and LRRK2 activity (pS935 site) supported, consequence (non-specific pRab8) broken, no evidence of change in LRRK2 substrate.

      (3) LRRK2 G2019S striatal phosphoproteome is different from WT.

      It is different. Where is link to BDNF or Drebrin?

      (4) BDNF signaling is impaired in Lrrk2 knockout neurons

      TrkB changes seem higher in SH-SY5Y. pAKT impaired, pERK not convincing. Primary neurons Akt slower but it and Erk mostly intact. MLi-2 did not block pAkt or pErk in WT or KO (higher in latter). Whatever is happening in KO, MLi-2 not really blocking effect in WT. If we are to assume that studying the KO was a means to understand LRRK2 function, the authors' data should explain why we care if an effect is absent in LKO, if LRRK2 isn't doing the same job in WT?

      BDNF increases synaptic puncta in WT not LKO (which start higher?). Is this BDNF increase blocked by LRRK2 inhibition?

      (5) Postsynaptic structural changes in Lrrk2 knockout neurons

      Golgi impregnation shows some very small spine changes at 1m. Not sustained over age. mRNA changes are very small (10% not even a fold... very weak and should be written as so). Drebrin levels reduced clearly at 1m, but probably also at 4 & 18. Underpowered, disconnected time course from the spine changes.

      (6) An effect on "spontaneous electrical activity" at Div70

      Weak. What is so special at 70 days that means we should be confident in the differences, or be satisfied that the other time points are legitimately ignored? These are 10-11 cells from 3 cultures assayed at 3 time points but only one is presented (rest in supplement). This should be a 2 (time) or 3 way (+culture RM) ANOVA. As it stands, in WT there is a little - no activity at 50 days, little to no at 70 days, and variable to lots or none at 90. BDNF did nothing at 50 or 90 but may have at 70. In KO low activity stable at 50 & 70, tanks at 90. BDNF would seem to have a similar effect on KO at 90 as WT at 70, but as there are only 7 cells it remains inconclusive. Thus the conclusion that BDNF signalling is broken in LKO is not well supported by the ephys data, nor is the BDNF effect in WT cells (even at the 70 day time point) shown to be susceptible to LRRK2 inhibition.

    3. Reviewer #2 (Public review):

      The data show that BDNF regulates the PD-associated kinase LRRK2, they place LRRK2 within well-described BDNF pathways biochemically, and they show that LRRK2 can play a role mediating BDNF-driven synaptic outcomes at excitatory synapses. The chief strength is that the data provide a potential focal point for multiple observations that have been made across many labs. The findings will be of broad interest because LRRK2 has emerged as a protein that is likely to be part of Parkinson's pathology and its normal and pathological actions remain poorly understood.

      A major strength of the study is the multiple approaches that were used (biochemistry, bioinformatics, light and electron microscopy and electrophysiology) across different experimental models (cells, primary neurons, human neurons, mice) to identify and examine the impact of BDNF on LRRK2 signaling and functions. Noteworthy is also the employment of LRRK2KO preparations to validate outcomes and to place LRRK2 actions up or downstream.

      The demonstration that LRRK2 and drebrin interact directly is important and suggests that other interacting proteins identified biochemically and bioinformatically in the paper will be important to pursue.

      Some data from different models do not fit well with one another (like mouse and human neurons). This is likely due to inherent differences in the preparations. Since different experiments were carried out on the different preps, however, it is not possible to cross compare. The lack of this information is viewed more as an open question than a cause for concern.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity. 

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and synaptosome preparations from the brain. They examine an LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function. 

      Strengths: 

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health? 

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. Some effects of BDNF stimulation appear impaired in (some of the) LRRK2 knock-out scenarios (but not all). A phosphoproteomic analysis of PD mutant Knock-in mouse brain synaptosomes is included. 

      We thank this Reviewer for pointing out the strengths of our work. 

      Weaknesses: 

      The data sets are disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is very light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are likely underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking. 

      We appreciate the Reviewer’s overall evaluation. In this revised version, we have provided several novel results that strengthen the omics data and the mechanistic experiments and make the conclusions in line with the data.

      As examples to aid reader interpretation: (a) pS935 LRRK2 seems to go up at 5 minutes but goes down below pre-stimulation levels after (at times when BDNF-induced phosphorylation of other known targets remains very high). This is ignored in favour of discussion/investigation of initial increases, and the fact that BDNF does many things (which might indirectly contribute to initial but unsustained changes to pLRRK2) is not addressed.  

      We thank the Reviewer for raising this important point, which we agree deserves additional investigation. Although phosphorylation does decrease below pre-stimulation levels, a reduction is also observed for ERK/AKT upon sustained exposure to BDNF in our experimental paradigm (figure 1F-G). This phenomenon is well known in response to a number of extracellular stimuli and can be explained by mechanisms related to cellular negative feedback regulation, receptor desensitization (e.g. phosphorylation or internalization), or cellular adaptation. The effect on pSer935, however, is peculiar as phosphorylation goes below the unstimulated level, as pointed out by the Reviewer. In contrast to ERK and AKT, whose phosphorylation is almost absent under unstimulated conditions (Figure 1F-G), the stoichiometry of Ser935 phosphorylation under unstimulated conditions is high. This observation is consistent with MS determination of the relative abundance of pSer935 (e.g. in whole brain LRRK2 is nearly 100% phosphorylated at Ser935, see Nirujogi et al., Biochem J 2021). Thus, we hypothesized that the modest increase in phosphorylation driven by BDNF likely reflects a saturation or ceiling effect, indicating that the phosphorylation level is already near its maximum under resting conditions. Prolonged BDNF stimulation would bring phosphorylation down below pre-stimulation levels through the negative feedback mechanisms (e.g. phosphatase activity) explained above. To test this hypothesis, we conducted an experiment in conditions where LRRK2 is pretreated for 90 minutes with MLi-2 inhibitor, to reduce basal phosphorylation of S935. After MLi-2 washout, we stimulated with BDNF at different time points. We used GFP-LRRK2 stable lines for this experiment, since the ceiling effect was particularly evident (Figure S1A) and this model has been used for the interactomic study. As shown below (and incorporated in Fig. S1B in the manuscript), LRRK2 responds robustly to BDNF stimulation both in terms of pSer935 and pRABs. Phosphorylation peaks at 5-15 mins, while it decreases to unstimulated levels at 60 and 180 minutes. Notably, while the peak of pSer935 at 5-15 mins is similar to the untreated condition (supporting that Ser935 is nearly saturated in unstimulated conditions), the phosphorylation of RABs during this time period exceeds unstimulated levels. These findings support the notion that, under basal conditions, RAB phosphorylation is far from saturation. The antibodies used to detect RAB phosphorylation are the following: RAB10 Abcam # ab230261 and RAB8 (pan RABs) Abcam # ab230260.

      Given the robust response of RAB10 phosphorylation upon BDNF stimulation, we further investigated RAB10 phosphorylation during BDNF stimulation in naïve SH-SY5Y cells. We confirmed that the increase in pSer935 is coupled to increase in pT73-RAB10. Also in this case, RAB10 phosphorylation does not go below the unstimulated level, which aligns with the  low pRAB10 stoichiometry in brain (Nirujogi et al., Biochem J 2021). This experiment adds the novel and exciting finding that BDNF stimulation increases LRRK2 kinase activity (RAB phosphorylation) in neuronal cells. 

      Note that the new supplemental figure 1 now includes: A) a comparison of LRRK2 pS935 and total protein levels before and after RA differentiation; B) differentiated GFP-LRRK2 SH-SY5Y (unstimulated, BDNF, MLi-2, BDNF+MLi-2); C) the kinetics of the BDNF response in differentiated GFP-LRRK2 SH-SY5Y.

      (b) Drebrin coIP itself looks like a very strong result, as does the increase after BDNF, but this was only demonstrated with a GFP over-expression construct despite several mouse and neuron models being employed elsewhere and available for coIP of endogenous LRRK2. Also, the coIP is only demonstrated in one direction. Similarly, the decrease in drebrin levels in mice is not assessed in the other model systems, coIP wasn't done, and mRNA transcripts are not quantified (even though others were). Drebrin phosphorylation state is not examined.

      We appreciate the Reviewer's suggestions and have provided additional experimental evidence supporting the functional relevance of the LRRK2-drebrin interaction.

      (1) As suggested, we performed qPCR and observed that 1 month-old KO midbrain and cortex express lower levels of Dbn1 as compared to WT brains (Figure 5G). This result is in agreement with the western blot data (Figure 5H). 

      (2) To further validate the physiological relevance of the LRRK2-drebrin interaction, we performed two experiments:

      i) Western blots looking at pSer935 and pRab8 (pan Rab) in Dbn1 WT and knockout brains. As reported and quantified in Figure 2I, we observed a significant decrease in pSer935 and a trend decrease in pRab8 in Dbn1 KO brains. This finding supports the notion that Drebrin forms a complex with LRRK2 that is important for its activity, e.g. upon BDNF stimulation. 

      ii) Reverse co-immunoprecipitation of YFP-drebrin full-length, N-terminal domain (1-256 aa) and C-terminal domain (256-649 aa) (plasmids kindly received from Professor Phillip R. Gordon-Weeks, Worth et al., J Cell Biol, 2013) with Flag-LRRK2 co-expressed in HEK293T cells. As shown in supplementary Fig. S2C, we confirm that YFP-drebrin binds LRRK2, with the N-terminal region of drebrin appearing to be the major contributor to this interaction. This result is important as the N-terminal region contains the ADF-H (actin-depolymerising factor homology) domain and a coiled-coil region known to directly bind actin (Shirao et al., J Neurochem 2017; Koganezawa et al., Mol Cell Neurosci. 2017). Interestingly, both full-length Drebrin and its truncated C-terminal construct cause the same morphological changes in F-actin, indicating that Drebrin-induced morphological changes in F-actin are mediated by its N-terminal domains rather than its intrinsically disordered C-terminal region (Shirao et al., J Neurochem, 2017; Koganezawa et al., Mol Cell Neurosci. 2017). Given the role of LRRK2 in actin-cytoskeletal dynamics and its binding to multiple actin-related proteins (Fig. 2 and Meixner et al., Mol Cell Proteomics. 2011; Parisiadou and Cai, Commun Integr Biol 2010), these results suggest the possibility that LRRK2 controls actin dynamics by competing with drebrin binding to actin and open new avenues for future studies.

      (3) To address the request for examining drebrin phosphorylation state, we decided to perform another phosphoproteomic experiment, leveraging a parallel analysis incorporated in our latest manuscript (Chen et al., Mol Therapy 2025). In this experiment, we isolated total striatal proteins from WT and G2019S KI mice and enriched the phospho-peptides. Unlike the experiment presented in Fig. 7, phosphopeptides were enriched from total striatal lysates rather than synaptosomal fractions, and phosphorylation levels were normalized to the corresponding total protein abundance. This approach was intended to avoid bias toward synaptic proteins, allowing for the analysis of a broader pool of proteins derived from a heterogeneous ensemble of cell types (neurons, glia, endothelial cells, pericytes etc.). We were pleased to find that this new experiment confirmed drebrin S339 as a differentially phosphorylated site, with a 3.7 fold higher abundance in G2019S Lrrk2 KI mice. The fact that this experiment evidenced an increased phosphorylation stoichiometry in G2019S mice rather than a decrease is likely due to the normalization of each peptide by its corresponding total protein. Gene ontology analysis of differentially phosphorylated proteins using stringent term size (<200 genes) showed post-synaptic spines and presynaptic active zones as enriched categories (Fig. 3F). A SynGO analysis confirms both pre and postsynaptic categories, with high significance for terms related to postsynaptic cytoskeleton (Fig. 3G). As noted, this is particularly interesting as the starting material was whole striatal tissue – not synaptosomes as previously – indicating that the most significant phosphorylation differences occur in synaptic compartments. This once again reinforces our hypothesis that LRRK2 has a prominent role in the synapse. Overall, we confirmed with an independent phosphoproteomic analysis that LRRK2 kinase activity influences the phosphorylation state of proteins related to synaptic function, particularly postsynaptic cytoskeleton. For clarity in data presentation, as mentioned by the Reviewers, we removed Figure 7 and incorporated this new analysis in figure 3, alongside the synaptic cluster analysis.

      Altogether, three independent OMICs approaches – (i) experimental LRRK2 interactomics in neuronal cells, (ii) a literature-based LRRK2 synaptic/cytoskeletal interactor cluster, and (iii) a phospho-proteomic analysis of striatal proteins from G2019S KI mice (to model LRRK2 hyperactivity) – converge to synaptic actin-cytoskeleton as a key hub of LRRK2 neuronal function.

      (c) The large differences in the CRISPR KO cells in terms of BDNF responses are not seen in the primary neurons of KO mice, suggesting that other differences between the two might be responsible, rather than the lack of LRRK2 protein. 

      Considering that some variability is expected for these types of cultures and across different species, any difference in response magnitude and kinetics could be attributed to the levels of TrkB and downstream components expressed by the two cell types.

      We are confident that differentiated SH-SY5Y cells provide a reliable model for our study, as we could translate the results obtained in SH-SY5Y cells to other models. However, to rule out the possibility that the more pronounced effect observed in SH-SY5Y KO cells compared with Lrrk2 KO primary neurons was due to CRISPR off-target effects, we performed an off-target analysis. Specifically, we selected the first 8 putative off-targets exhibiting a CFD (Cutting Frequency Determination) off-target score >0.2.

      As shown in supplemental file 1, sequence disruption was observed only in the LRRK2 on-target site in LRRK2 KO SH-SY5Y cells, while the 8 off-target regions remained unchanged across the genotypes and relative to the reference sequence.

      (d) No validation of hits in the G2019S mutant phosphoproteomics, and no other assays related to the rest of the paper/conclusions. Drebrin phosphorylation is different but unvalidated, or related to previous data sets beyond some discussion. The fact that LRRK2 binding occurs, and increases with BDNF stimulation, should be compared to its phosphorylation status and the effects of the G2019S mutation. 

      As illustrated in the response to point (b), we performed a new phosphoproteomics investigation – with total striatal lysates instead of striatal synaptosomes and normalization of phospho-peptides over total proteins – and found that S339 phosphorylation increases when LRRK2 kinase activity increases (G2019S). To address the request of validating drebrin phosphorylation, the main limitation is that there are no available antibodies against Ser339. Although we tried Phos-tag gels in striatal lysates, we could not detect any reliable and specific signal with the same drebrin antibody used for western blot (Thermo Fisher Scientific: MA120377) due to technical limitations of the Phos-tag method. We are confident that phosphorylation at S339 has a physiological relevance, as it was identified 67 times across multiple proteomic discovery studies and it is placed among the most frequently phosphorylated sites in drebrin (https://www.phosphosite.org/proteinAction.action?id=2675&showAllSites=true).

      To infer a possible role of this phosphorylation, we looked at the predicted pathogenicity of substitutions at this site using AlphaMissense (Cheng et al., Science 2023). As shown in the supplementary figure (Fig. S3), amino acid substitutions within this site are predicted not to be pathogenic, also due to the low confidence of the AlphaFold structure.

      Ser339 in human drebrin is located just before the proline-rich region (PP domain) of the protein. This region is situated between the actin-binding domains and the C-terminal Homer-binding sequences and plays a role in protein-protein interactions and cytoskeletal regulation (Worth et al., J Cell Biol, 2013). Of interest, this region was previously shown to be the interaction site of afadin (AFDN), a protein involved in multiple cytoskeletal-related processes, including synapse formation and function by regulating puncta adherentia junctions, presynaptic differentiation, and cadherin complex assembly, which are essential for hippocampal excitatory synapses, spine formation, and learning and memory processes (Beaudoin, G. M., 3rd et al., J Neurosci, 2013). Of note, afadin is in the list of LRRK2 interacting proteins (https://www.ebi.ac.uk/intact/home), supporting a possible functional relevance of LRRK2-mediated drebrin phosphorylation in afadin-drebrin complex formation. This has been addressed in the Discussion section.

      The aim of this MS analysis in G2019S KI mice – now included in figure 3 – was to further validate the crucial role of LRRK2 kinase activity in the context of synaptic regulation, rather than to discover and characterize novel substrates. Consequently, Figure 7 has been eliminated. 

      Reviewer #2 (Public Review):  

      Taken as a whole, the data in the manuscript show that BDNF can regulate PD-associated kinase LRRK2 and that LRRK2 modifies the BDNF response. The chief strength is that the data provide a potential focal point for multiple observations across many labs. Since LRRK2 has emerged as a protein that is likely to be part of the pathology in both sporadic and LRRK2 PD, the findings will be of broad interest. At the same time, the data used to imply a causal throughline from BDNF to LRRK2 to synaptic function and actin cytoskeleton (as in the title) are mostly correlative and the presentation often extends beyond the data. This introduces unnecessary confusion. There are also many methodological details that are lacking or difficult to find. These issues can be addressed. 

      We appreciate the Reviewer’s positive feedback on our study. We also value the suggestion to present the data in a more streamlined and coherent way. In response, we have updated the title to better reflect our overall findings: “LRRK2 Regulates Synaptic Function through Modulation of Actin Cytoskeletal Dynamics.” Additionally, we have included several experiments that we believe enhance and unify the study.

      (1) The writing/interpretation gets ahead of the data in places and this was confusing. For example, the abstract highlights prior work showing that Ser935 LRRK2 phosphorylation changes LRRK2 localization, and Figure 1 shows that BDNF rapidly increases LRRK2 phosphorylation at this site. Subsequent figures highlight effects at synapses or with synaptic proteins. So is the assumption that LRRK2 is recruited to (or away from) synapses in response to BDNF? Figure 2H shows that LRRK2-drebrin interactions are enhanced in response to BDNF in retinoic acid-treated SH-SY5Y cells, but are synapses generated in these preps? How similar are these preps to the mouse and human cortical or mouse striatal neurons discussed in other parts of the paper (would it be anticipated that BDNF act similarly?) and how valid are SHSY5Y cells as a model for identifying synaptic proteins? Is drebrin localization to synapses (or its presence in synaptosomes) modified by BDNF treatment +/- LRRK2? Or do LRRK2 levels in synaptosomes change in response to BDNF? The presentation requires re-writing to stay within the constraints of the data or additional data should be added to more completely back up the logic. 

      We thank the Reviewer for the thorough suggestions and comments. We have extensively revised the text to accurately reflect our findings without overinterpreting. In particular, we agree with the Reviewer that differentiated SH-SY5Y cells are not identical to primary mouse or human neurons; however, both neuronal models respond to BDNF. Supporting our observations, it is known that SH-SY5Y cells respond to BDNF. In fact, a common protocol for differentiating SH-SY5Y cells involves BDNF in combination with retinoic acid (Martin et al., Front Pharmacol, 2022; Kovalevich et al., Methods in mol bio, 2013). Additionally, it has been reported that SH-SY5Y cells can form functional synapses (Martin et al., Front Pharmacol, 2022). While we are aware that BDNF, drebrin or LRRK2 can also affect non-synaptic pathways, we focused on synapses when we moved to mouse models since: (i) MS and phosphoMS identified several cytoskeletal proteins enriched at the synapse; (ii) we and others have previously reported a role for LRRK2 in governing synaptic and cytoskeletal related processes; (iii) the synapse is a critical site that becomes dysfunctional in the early stages of PD. We have now clarified and adjusted the text as needed. We have also performed additional experiments to address the Reviewer’s concern:

      (1) “Is the assumption that LRRK2 is recruited to (or away from) synapses in response to BDNF”? This is a very important point. There is consensus in the field that detecting endogenous LRRK2 in brain slices or in primary neurons via immunofluorescence is very challenging with the commercially available antibodies (Fernandez et al., J Parkinsons Dis, 2022). We established a method in our previous studies to detect LRRK2 biochemically in synaptosomes (Cirnaru et al., Front Mol Neurosci, 2014; Belluzzi et al., Mol Neurodegener., 2016). While these data indicate LRRK2 is present in the synaptic compartments, it would be quite challenging to apply this method to the present study. In fact, applying acute BDNF stimulation in vivo and then isolating synaptosomes is a complex experiment beyond the timeframe of the revision due to the need for mouse ethical approvals. However, this is definitely an intriguing angle to explore in the future.

      (2) “Is drebrin localization to synapses (or its presence in synaptosomes) modified by BDNF treatment +/- LRRK2?” To try and address this question, we adapted a previously published assay to measure drebrin exodus from dendritic spines. During calcium entry and LTP, drebrin exits dendritic spines and accumulates in the dendritic shafts and cell body (Koganezawa et al., 2017). This facilitates the reorganization of the actin cytoskeleton (Shirao et al., 2017). Given the known role of drebrin and its interaction with LRRK2, we hypothesized that LRRK2 loss might affect drebrin relocalization during spine maturation.

      To test this, we treated DIV14 primary cortical neurons from Lrrk2 WT and KO mice with BDNF for 5 minutes, 15 minutes, and 24 hours, then performed confocal imaging of drebrin localization (Author response image 1). Neurons were transfected at DIV4 with GFP (cell filler) and PSD95 (dendritic spines) for visualization, and endogenous drebrin was stained with an anti-drebrin antibody. We then measured drebrin's overlap with PSD95-positive puncta to track its localization at the spine.

      In Lrrk2 WT neurons, drebrin relocalized from spines after BDNF stimulation, peaking at 15 minutes and showing higher co-localization with PSD95 at 24 hours, indicating the spine remodeling occurred. In contrast, Lrrk2 KO neurons showed no drebrin exodus. These findings support the notion that LRRK2's interaction with drebrin is important for spine remodeling via BDNF. However, additional experiments with larger sample sizes are needed, which were not feasible within the revision timeframe (here n=2 experiments with independent neuronal preparations, n=4-7 neurons analyzed per experiment). Thus, we included the relevant figure as Author response image 1 but chose not to add it in the manuscript (figure 3).

      Author response image 1.

      Lrrk2 affects drebrin exodus from dendritic spines. Primary neurons from Lrrk2 WT and KO mice were transfected with GFP and PSD95 at DIV4, exposed to BDNF for different times (5 minutes, 15 minutes and 24 hours) at DIV14, and stained for endogenous drebrin. The amount of drebrin localizing in dendritic spines outlined by PSD95 was then assessed. The graph shows a pronounced decrease in drebrin content in WT neurons during short time treatments and an increase after 24 hours. KO neurons present no evident variations in drebrin localization upon BDNF stimulation. Scale bar: 4 μm.

      (2) The experiments make use of multiple different kinds of preps. This makes it difficult at times to follow and interpret some of the experiments, and it would be of great benefit to more assertively insert "mouse" or "human" and cell type (cortical, glutamatergic, striatal, gabaergic) etc. 

      We thank the Reviewer for pointing this out. We have now more clearly specified the cell type and species identity throughout the text to improve clarity and interpretation.

      (3) Although BDNF induces quantitatively lower levels of ERK or Akt phosphorylation in LRRK2KO preps based on the graphs (Figure 4B, D), the western blot data in Figure 4C make clear that BDNF does not need LRRK2 to mediate either ERK or Akt activation in mouse cortical neurons and in 4A, ERK in SH-SY5Y cells. The presentation of the data in the results (and echoed in the discussion) writes of a "remarkably weaker response". The data in the blots demand more nuance. It seems that LRRK2 may potentiate a response to BDNF that in neurons is independent of LRRK2 kinase activity (as noted). This is more of a point of interpretation, but the words do not match the images.  

      We thank the Reviewer for pointing this out. We have rephrased our data presentation to better convey our findings. We were not surprised to find that loss of LRRK2 causes only a reduction of ERK and AKT activation upon BDNF rather than a complete loss. This is because these pathways are complex and redundant and are activated by a number of cellular effectors. The fact that LRRK2 is one among many players whose function can be compensated by other signaling molecules is also supported by the phenotype of Lrrk2 KO mice that is measurable at 1 month but disappears with adulthood (4 and 18 months) (figure 5).

      Moreover, we removed the sentence “Of note, 90 mins of Lrrk2 inhibition (MLi-2) prior to BDNF stimulation did not prevent phosphorylation of Akt and Erk1/2, suggesting that LRRK2 participates in BDNF-induced phosphorylation of Akt and Erk1/2 independently from its kinase activity but dependently from its ability to be phosphorylated at Ser935 (Fig. 4C-D and Fig. 1B-C)” since the MLi-2 treatment prior to BDNF stimulation was not quantified and our new data point to an involvement of LRRK2 kinase activity upon BDNF stimulation.

      (4) Figure 4F/G shows an increase in PSD95 puncta per unit length in response to BDNF in mouse cortical neurons. The data do not show spine induction/dendritic spine density/or spine morphogenesis as suggested in the accompanying text (page 8). Since the neurons are filled/express gfp, spine density could be added or spines having PSD95 puncta. However, the data as reported would be expected to reflect spine and shaft PSDs and could also include some nonsynaptic sites. 

      The Reviewer is right. We have rephrased the text to reflect an increase in postsynaptic density (PSD) sites, which may include both spine and shaft PSDs, as well as potential nonsynaptic sites.

      (5) Experimental details are missing that are needed to fully interpret the data. There are no electron microscopy methods outside of the figure legend. And for this and most other microscopy-based data, there are few to no descriptions of what cells/sites were sampled, how many sites were sampled, and how regions/cells were chosen. For some experiments (like Figure 5D), some detail is provided in the legend (20 segments from each mouse), but it is not clear how many neurons this represents, where in the striatum these neurons reside, etc. For confocal z-stacks, how thick are the optical sections and how thick is the stack? The methods suggest that data were analyzed as collapsed projections, but they cite Imaris, which usually uses volumes, so this is confusing. The guide (sgRNA) sequences that were used should be included. There is no mention of sex as a biological variable. 

      We thank the Reviewer for pointing out this missing information. We have now included:

      (1) EM methods (page 24)

      (2) Methods for ICC and confocal microscopy now incorporate the Z-stack thickness (0.5 μm x 6 = 3 μm) on page 23.

      (3) Methods for Golgi-Cox staining now incorporate the Z-stack thickness and the number of neurons and segments per neuron analyzed.

      (4) The sex of mice is mentioned in the material and methods (page 17): “Approximately equal numbers of males and females were used for every experiment”.

      (6) For Figures 1F, G, and E, how many experimental replicates are represented by blots that are shown? Graphs/statistics could be added to the supplement. For 1C and 1I, the ANOVA p-value should be added in the legend (in addition to the post hoc value provided). 

      The blots relative to figure 1F, G and E are representative of several blots (at least n=5). The same readouts are part of figure 4, where quantifications are provided. We added the ANOVA p-value in the legend for figure 1C, 1I and 1K.

      (7) Why choose 15 minutes of BDNF exposure for the mass spec experiments when the kinetics in Figure 1 show a peak at 5 mins?  

      This is an important point. We repeated the experiment in GFP-LRRK2 SH-SY5Y cells (figure S1C) and included the 15 min time point. In addition to confirming that pSer935 increases similarly at 5 and 15 minutes, we also observed an increase in RAB phosphorylation at these time points. As mentioned in our response to Reviewer 1, we pretreated with MLi-2 for 90 minutes in this experiment to reduce the high basal phosphorylation stoichiometry of pSer935.

      (8) The schematic in Figure 6A suggests that iPSCs were plated, differentiated, and cultured until about day 70 when they were used for recordings. But the methods suggest they were differentiated and then cryopreserved at day 30, and then replated and cultured for 40 more days. Please clarify if day 70 reflects time after re-plating (30+70) or total time in culture (70). If the latter, please add some notes about re-differentiation, etc. 

      We thank the reviewer for providing further clarity on the iPSC methodology. In the submitted manuscript 70DIV represents the total time in vitro and the process involved a cryostorage event at 30DIV, with a thaw of the cells and a further 40 days of maturation before measurement.  We have adjusted the methods in both the text and figure (new schematic) to clarify this.  The cryopreservation step has been used in other iPSC methods to great effect (Drummond et al., Front Cell Dev Biol, 2020). Due to the complexity and length of the iPSC neuronal differentiation process, cryopreservation represents a useful method with which to shorten and enhance the ability to repeat experiments and reduce considerable variation between differentiations. User defined differences in culture conditions for each batch of neurons thawed can usefully be treated as a new and separate N compared to the next batch of neurons.

      (9) When Figures 6B and 6C are compared it appears that mEPSC frequency may increase earlier in the LRRK2KO preps than in the WT preps since the values appear to be similar to WT + BDNF. In this light, BDNF treatment may have reached a ceiling in the LRRK2KO neurons.

      We thank the reviewer for his/her comment and observations about the ceiling effects. It is indeed possible that the loss of LRRK2 and the application of BDNF could cause the same elevation in synaptic neurotransmission. In such a situation, the increased activity as a result of BDNF treatment would be masked by the increased activity  observed as a result of LRRK2 KO. To better visualize the difference between WT and KO cultures and the possible ceiling effect, we merged the data in one single graph.  

      (10) Schematic data in Figures 5A and C and Figures 5B and E are too small to read/see the data. 

      We thank the Reviewer for this suggestion. We have now enlarged figure 5A and moved the graph of figure 5D to supplemental figure S5, since this analysis of spine morphology is secondary to the one shown in figure 5C.

      Reviewer #1 (Recommendations For The Authors): 

      Please forgive any redundancy in the comments, I wanted to provide the authors with as much information as I had to explain my opinion. 

      Primary mouse cortical neurons at div14, 20% transient increase in S935 pLRRK2 5min after BDNF, which then declines by 30 minutes (below pre-stim levels, and maybe LRRK2 protein levels do also). 

      In differentiated SHSY5Y cells there is a large expected increase in pERK and pAKT that is sustained way above pre-stim for 60 minutes. There is a 50% initial increase in pLRRK2 (but the blot is not very clear and no double band in these cells), which then looks like reduced well below pre-stim by 30 & 60 minutes. 

      We thank the Reviewer for bringing up this important point. We have extensively addressed this issue in the public review rebuttal. In essence, the phosphorylation of Ser935 is near saturation under unstimulated conditions, as evidenced by its high basal stoichiometry, whereas Rab phosphorylation is far from saturation, showing an increase upon BDNF stimulation before returning to baseline levels. This distinction highlights that while pSer935 exhibits a ceiling effect due to its near-maximal phosphorylation at rest, pRab responds dynamically to BDNF, indicating low basal phosphorylation and a significant capacity for increase. Figure 1 in the rebuttal summarizes the new data collected.

      GFP-fused overexpressed LRRK2 coIPs with drebrin, and this is double following 15 min BDNF. Strong result.

      We thank the Reviewer.

      BDNF-induced pAKT signaling is greatly impaired, and pERK is somewhat impaired, in CRISPR LKO SHSY5Y cells. In mouse primaries, both AKT and Erk phosph is robustly increased and sustained over 60 minutes in WT and LKO. This might be initially less in LKO for Akt (hard to argue on a WB n of 3 with huge WT variability), regardless they are all roughly the same by 60 minutes and even look higher in LKO at 60. This seems like a big disconnect and suggests the impairment in the SHSy5Y cells might have more to do with the CRISPR process than the LRRK2. Were the cells sequenced for off-target CRISPR-induced modifications?  

      Following the Reviewer's suggestion, and as discussed in the public review section, we performed an off-target analysis. Specifically, we selected the first 8 putative off-targets exhibiting a CFD (Cutting Frequency Determination) off-target score >0.2. As shown in supplemental file 1, sequence disruption was observed only at the LRRK2 on-target site in LRRK2 KO SH-SY5Y cells, while the 8 off-target regions remained unchanged across the genotypes and relative to the reference sequence.
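
The selection rule described in this response (keep putative off-target sites with a CFD score above 0.2, then take the first 8) could be sketched as follows. This is only an illustration: the `OffTargetSite` type, the site names and the scores are made up, and ranking the candidates from highest to lowest score before taking the first 8 is an assumption, since the response does not state how the list was ordered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffTargetSite:
    locus: str        # hypothetical label, not a real genomic coordinate
    cfd_score: float  # CFD off-target score between 0 and 1


def select_off_targets(sites, min_score=0.2, max_sites=8):
    """Keep sites scoring strictly above min_score, ranked high to low, capped at max_sites."""
    passing = [site for site in sites if site.cfd_score > min_score]
    passing.sort(key=lambda site: site.cfd_score, reverse=True)
    return passing[:max_sites]


# Made-up scores for illustration only; these are not data from the study.
example_scores = [0.91, 0.15, 0.44, 0.20, 0.63, 0.27, 0.05, 0.38, 0.72, 0.31, 0.55, 0.22]
candidates = [OffTargetSite(f"site_{i}", score) for i, score in enumerate(example_scores)]
selected = select_off_targets(candidates)
```

Each selected site would then be sequenced in both genotypes and compared with the reference sequence, as the response describes.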

      No difference in the density of large PSD-95 puncta in dendrites of LKO primary relative to WT, and the small (10%) increase seen in WT after BDNF might be absent in LKO (it is not clear to me that this is absent in every culture rep, and the data is not highly convincing). This is also referred to as spinogenesis, which has not been quantified. Why not is confusing as they did use a GFP fill... 

      The Reviewer is right that spinogenesis is not the appropriate term for the process analyzed. We replaced “spinogenesis” with “morphological alteration of dendritic protrusions” or “synapse maturation”, which is correlated with the number of PSD95-positive puncta (ElHusseini et al., Science, 2000).

      There is a difference in the percentage of dendritic protrusions classified as filopodia to more being classified as thin spines in LKO striatal neurons at 1 month, which is not seen at any other age, The WT filopodia seems to drop and thin spine percent rise to be similar to LKO at 4 months. This is taken as evidence for delayed maturation in LKO, but the data suggest the opposite. These authors previously published decreased spine and increased filopodia density at P15 in LKO. Now they show that filopodia density is decreased and thin spine density increased at one month. How is that shift from increased to decreased filopodia density in LKO (faster than WT from a larger initial point) evidence of impaired maturation? Again this seems accelerated? 

      We agree with the Reviewer that the initial interpretation was indeed confusing. To adhere closely to our data and avoid overinterpretation, as also suggested by Reviewer 2, we revised the text and moved figure 5D to supplementary materials. In essence, our data point to alterations in the structural properties of dendritic protrusions in young KO mice, specifically a reduction in their size (head width and neck height) and a decrease in postsynaptic density (PSD) length, as observed with TEM. These findings suggest that LRRK2 is involved in morphological processes during spine development.

      Shank3 and PSD95 mRNA transcript levels were reduced in the LKO midbrain, only shank3 was reduced in the striatum and only PSD was reduced in the cortex. No changes to mRNA of BDNF-related transcripts. None of these mRNA changes protein-validated. Drebrin protein (where is drebrin mRNA?) levels are reduced in LKO at 1&4 but not clearly at 18 months (seems the most robust result but doesn't correlate with other measures, which here is basically a transient increase (1m) in thin striatal spines).  

      As illustrated before, we performed qPCR for Dbn1 and found that its expression is significantly reduced in the cortex and midbrain and non-significantly reduced in the striatum (1-month-old mice, a different cohort from the one used for the other analyses in figure 5).

      24h BDNF increases the frequency of mEPSCs on hIPSC-derived cortical-like neurons, but not LKO, which is already high. There are no details of synapse number or anything for these cultures and compares 24h treatment. BDNF increases mEPSC frequency within minutes PMC3397209, and acute application while recording on cells may be much more informative (effects of BDNF directly, and no issues with cell-cell / culture variability). Calling mEPSC "spontaneous electrical activity" is not standard.  

      We thank the reviewer for this point. We provided information about synapse number (Bassoon/Homer colocalization) in supplementary figure S7. The lack of response of LRRK2 KO cultures in terms of mEPSC is likely due to increased release probability, as the number of synapses does not change between the two genotypes.

      The pattern of LRRK2 activation is very disconnected from that of BDNF signalling onto other kinases. Regarding pLRRK2, S935 is a non-autophosph site said to be required for LRRK2 enzymatic activity, that is mostly used in the field as a readout of successful LRRK2 inhibition, with some evidence that this site regulates LRRK2 subcellular localization (which might be more to do with whether or not it is p at 935 and therefore able to act as a kinase).

      The authors imply BDNF is activating LRRK2, but really should have looked at other sites, such as the autophospho site 1292 and 'known' LRRK2 substrates like T73 pRab10 (or other e.g., pRab12) as evidence of LRRK2 activation. One can easily argue that the initial increase in pLRRK2 at this site is less consequential than the observation that BDNF silences LRRK2 activity based on p935 being sustained to being reduced after 5 minutes, and well below the prestim levels... not that BDNF activates LRRK2. 

      As described above, we have collected new data showing that BDNF stimulation increases LRRK2 kinase activity toward its physiological substrates Rab10 and Rab8 (using a pan-phospho-Rab antibody) (Figure 1 and Figure S1). Additionally, we have extensively commented on the ceiling effect of pS935.

      BDNF does a LOT. What happens to network activity in the neural cultures with BDNF application? Should go up immediately. Would increasing neural activity (i.e., through depolarization, forskolin, disinhibition, or something else without BDNF) give a similar 20% increase in pS935 LRRK2? Can this be additive, or occluded? This would have major implications for the conclusions that BDNF and pLRRK2 are tightly linked (as the title suggests).  

      These are very valuable observations; however, they fall outside the scope and timeframe of this study. We agree that future research should focus on gaining a deeper mechanistic understanding of how LRRK2 regulates synaptic activity, including vesicle release probability and postsynaptic spine maturation, independently of BDNF.

      Figures 1A & H "Western blot analysis revealed a rapid (5 mins) and transient increase of Ser935 phosphorylation after BDNF treatment (Fig. 1B and 1C). Of interest, BDNF failed to stimulate Ser935 phosphorylation when neurons were pretreated with the LRRK2 inhibitor MLi-2" . The first thing that stands out is that the pLRRK2 in WB is not very clear at all (although we appreciate it is 'a pig' to work with, I'd hope some replicates are clearer); besides that, the 20% increase only at 5min post-BDNF stimulation seems like a much less profound change than the reduction from base at 60 and more at 180 minutes (where total LRRK2 protein is also going down?). That the blot at 60 minutes in H is representative of a 30% reduction seems off... makes me wonder about the background subtraction in quantification (for this there is much less pLRRK2 and more total LRRK2 than at 0 or 5). LRRK2 (especially) and pLRRK2 seem very sketchy in H. Also, total LRRK2 appears to increase in the SHSY5Y cell not the neurons, and this seems even clearer in 2 H. 

      To better visualize the dynamics of pS935 variation relative to time=0, we presented the data as the difference between t=0 and t=x. This clearly shows that pS935 goes below prestimulation levels, whereas pRab10 does not. The large difference in the initial stoichiometry of these two phosphorylations is extensively discussed above.
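
One way to read "the difference between t=0 and t=x" is to subtract the t=0 value from the value at every later time point, so that negative numbers mean the signal has dropped below prestimulation levels. A minimal sketch under that assumption follows; the function name and the numbers are illustrative and are not taken from the study.

```python
def change_from_baseline(time_course):
    """Return value(t) - value(0) for every time point in a {minutes: signal} mapping."""
    if 0 not in time_course:
        raise ValueError("time course must include the t=0 baseline")
    baseline = time_course[0]
    return {minutes: value - baseline for minutes, value in time_course.items()}


# Illustrative normalized signal (arbitrary units), not measured data.
example_course = {0: 1.0, 5: 1.25, 30: 0.75, 60: 0.5}
deltas = change_from_baseline(example_course)
```

With this convention the baseline itself maps to zero, a transient increase appears as a positive value, and a drop below the prestimulation level appears as a negative value.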

      That MLi2 eliminates pLRRK2 (and seems to reduce LRRK2 protein?) isn't surprising, but a 90min pretreatment with MLi-2 should be compared to MLi-2's vehicle alone (MLi-2 is notoriously insoluble and the majority of diluents have bioactive effects like changing activity)... especially if concluding increased pLRRK2 in response to BDNF is a crucial point (when comparing against effects on other protein modifications such as pAKT). This highlights a second point... the changes to pERK and pAKT are huge following BDNF (nothing to massive quantities), whereas pLRRK2 increases are 20-50% at best. This suggests a very modest effect of BDNF on LRRK in neurons, compared to the other kinases. I worry this might be less consequential than claimed. Change in S1 is also unlikely to be significant... 

      These comments have been thoroughly addressed in the previous responses. Regarding fig. S1, we added an additional experiment (Figure S1C) in GFP-LRRK2 cells showing robust activation of LRRK2 (pS935, pRabs) at the timepoint of MS (15 min).

      "As the yields of endogenous LRRK2 purification were insufficient for AP-MS/MS analysis, we generated polyclonal SH-SY5Y cells stably expressing GFP-LRRK2 wild-type or GFP control (Supplementary Fig. 1)" . I am concerned that much is being assumed regarding 'synaptic function' from SHSY5Y cells... also overexpressing GFP-LRRK2 and looking at its binding after BDNF isn't synaptic function.  

      We appreciate the reviewer’s comment. We would like to clarify that the interactors enriched upon BDNF stimulation predominantly fall into semantic categories related to the synapse and actin cytoskeleton. While this does not imply that these interactors are exclusively synaptic, it suggests that this tightly interconnected network likely plays a role in synaptic function. This interpretation is supported by several lines of evidence: (1) previous studies have demonstrated the relevance of this compartment to LRRK2 function; (2) our new phosphoproteomics data from striatal lysate highlight enrichment of synaptic categories; and (3) analysis of the latest GWAS gene list (134 genes) also indicates significant enrichment of synapse-related categories. Taken together, these findings justify further investigation into the role of LRRK2 in synaptic biology, as discussed extensively in the manuscript’s discussion section.

      Figure 2A isn't alluded to in text and supplemental table 1 isn't about LRRK2 binding, but mEPSCs. 

      We have added Figure 2A and added supplementary .xls table 1, which refers to the excel list of genes with modulated interaction upon BDNF (uploaded in the supplemental material).

      We added the extension .xls also for supplementary table 2 and 3. 

      Figure 2A is useless without some hits being named, and the donut plots in B add nothing beyond a statement that "35% of 'genes' (shouldn't this be proteins?) among the total 207 LRRK2 interactors were SynGO annotated" might as well [just] be the sentence in the text. 

      We have now included the names of the most significant hits, including cytoskeletal and translation-related proteins, as well as known LRRK2 interactors. We decided to retain the donut plots, as we believe they simplify data interpretation for the reader, reducing the need to jump back and forth between the figures and the text.

      Validation of drebrin binding in 2H is great... although only one of 8 named hits; could be increased to include some of the others. A concern alludes to my previous point... there is no appreciable LRRK2 in these cells until GFP-LRRK2 is overexpressed; is this addressed in the MS? Conclusions would be much stronger if bidirectional coIP of these binding candidates were shown with endogenous (GFP-ve) LRRK2 (primaries or hIPSCs, brain tissue?) 

      To address the Reviewer’s concerns to the best of our abilities, we have added a blot in Supplemental figure S1A showing how the expression levels of LRRK2 increase after RA differentiation. Moreover, we have included several new datasets further strengthening the functional link between LRRK2 and drebrin, including qPCR of Dbn1 in one-month-old Lrrk2 KO brains, western blots of Lrrk2 and Rab in Dbn1 KO brains, and co-IP with drebrin N- and C-terminal domains.

      Figures 3 A-C are not informative beyond the text and D could be useful if proteins were annotated. 

      To avoid overcrowding, proteins were annotated in A and the same network structure reported for synaptic and actin-related interactors. 

      Figure 4. Is this now endogenous LRRK2 in the SHSY5Y cells? Again not much LRRK2 though, and no pLRRK shown. 

      We confirm that these are naïve SH-SY5Y cells differentiated with RA and LRRK2 is endogenous. We did not assess pS935 in this experiment, as the primary goal was to evaluate pAKT and pERK1/2 levels. To avoid signal saturation, we loaded less total protein (30 µg instead of the 80 µg typically required to detect pS935). pS935 levels were extensively assessed in Figure 1. This experimental detail has now been added in the material and methods section (page 18).

      In C (primary neurons) There is very little increase in pLRRK2 / LRRK2 at 5 mins, and any is much less profound a change than the reduction at 30 & 60 mins. I think this is interesting and may be a more substantial consequence of BDNF treatment than the small early increase. Any 5 min increase is gone by 30 and pLRRK2 is reduced after. This is a disconnect from the timing of all the other pProteins in this assay, yet pLRRK2 is supposed to be regulating the 'synaptic effects'? 

      The first part of the question has already been extensively addressed. Regarding the timing, one possibility is that LRRK2 is activated upstream of AKT and ERK1/2, a hypothesis supported by the reduced activation of AKT and ERK1/2 observed in LRRK2 KO cells, as discussed in the manuscript, and in MLi-2 treated cells (Author response image 2). Concerning the synaptic effects, it is well established that synaptic structural and functional plasticity occurs downstream of receptor activation and kinase signaling cascades. These changes can be mediated by both rapid mechanisms (e.g., mobilization of receptor-containing endosomes via the actin cytoskeleton) and slower processes involving gene transcription of immediate early genes (IEGs). Since structural and functional changes at the synapse generally manifest several hours after stimulation, we typically assessed synaptic activity and structure 24 hours post-stimulation.

      Akt Erk1&2 both go up rapidly after BDNF in WT, although Akt seems to come down with pLRRK2. If they aren't all the same Akt is probably the most different between LKO and WT but I am very concerned about an n=3 for wb, wb is semi-quantitative at best, and many more than three replicates should be assessed, especially if the argument is that the increases are quantitively different between WT v KO (huge variability in WT makes me think if this were done 10x it would all look same). Moreover, this isn't similar to the LKO primaries  "pulled pups" pooled presumably. 

      Despite some variability in the magnitude of the pAKT/pERK response in naïve SH-SY5Y cells, all three independent replicates consistently showed a reduced response in LRRK2 KO cells, yielding a highly significant result in the two-way ANOVA test. In contrast, the difference in response magnitude between WT and LRRK2 KO primary cultures was less pronounced, which justified repeating the experiments with n=9 replicates. We hope the Reviewer acknowledges the inherent variability often observed in western blot experiments, particularly when performed in a fully independent manner (different cultures and stimulations, independent blots).

      To further strengthen the conclusion that this effect is reproducible and dependent on LRRK2 kinase activity upstream of AKT and ERK, we probed the membranes in figure 1H with pAKT/total AKT and pERK/total ERK. All things considered and consistent with our hypothesis, MLi-2 significantly reduced BDNF-mediated AKT and ERK1/2 phosphorylation levels (Author response image 2). 

      Author response image 2.

      Western blots (same experiments as in figure 1) were performed using antibodies against phospho-Thr202/185 ERK1/2, total ERK1/2, phospho-Ser473 AKT and total AKT on lysates of retinoic acid-differentiated SH-SY5Y cells stimulated with 100 ng/mL BDNF for 0, 5, 30 or 60 mins. MLi-2 was used at 500 nM for 90 mins to inhibit LRRK2 kinase activity.

      G lack of KO effect seems to be skewed from one culture in the plot (grey). The scatter makes it hard to read, perhaps display the culture mean +/- BDNF with paired bars. The fact that one replicate may be changing things is suggested by the weirdly significant treatment effect and no genotype effect. Also, these are GFP-filled cells, the dendritic masks should be shown/explained, and I'm very surprised no one counted the number (or type?) of protrusions, especially as the text describes this assay (incorrectly) as spinogenesis... 

      As suggested by the Reviewer we have replotted the results as bar graphs. Regarding the number of protrusions, we initially counted the number of GFP+ puncta in the WT and did not find any difference (Author response image 3). Due to our imaging setup (confocal microscopy rather than super-resolution imaging and Imaris 3D reconstruction), we were unable to perform a fine morphometric analysis. However, this was not entirely unexpected, as BDNF is known to promote both the formation and maturation of dendritic spines. Therefore, we focused on quantifying PSD95+ puncta as a readout of mature postsynaptic compartments. While we acknowledge that we cannot definitively conclude that each PSD95+ punctum is synaptically connected to a presynaptic terminal, the data do indicate an increase in the number of PSD95+ structures following BDNF stimulation.

      Author response image 3.

      GFP+ puncta per unit of neurite length (µm) in DIV14 WT primary neurons untreated or after 24 hours of BDNF treatment (100 ng/ml). No significant differences were observed (n=3).

      Figure 5. "Dendritic spine maturation is delayed in Lrrk2 knockout mice". The only significant change is at 1 month in KO which shows fewer filopodia and increased thin spines (50% vs wt). At 4 months the % of thin spines is increased to 60% in both... Filopodia also look like 4m in KO at 1m... How is that evidence for delayed maturation? If anything it suggests the KO spines are maturing faster. "the average neck height was 15% shorter and the average head width was 27% smaller, meaning that spines are smaller in Lrrk2 KO brains" - it seems odd to say this before saying that actually there are just MORE thin spines, the number of mature "mushroom' is same throughout, and the different percentage of thin comes from fewer filopodia. This central argument that maturation is delayed is not supported and could be backwards, at least according to this data. Similarly, the average PSD length is likely impacted by a preponderance of thin spines in KO... which if mature were fewer would make sense to say delayed KO maturation, but this isn't the case, it is the fewer filopodia (with no PSD) that change the numbers. See previous comments of the preceding manuscript. 

      We agree that thin spines, while often considered more immature, represent an intermediate stage in spine development. The data showing an increase in thin spines at 1 month in the KO mice, along with fewer filopodia, could suggest a faster stabilization of these spines, which might indeed be indicative of premature maturation rather than delayed maturation. This change in spine morphology may indicate that the dynamics of synaptic plasticity are affected. Regarding the PSD length, as the Reviewer pointed out, the increased presence of thin spines in KO might account for the observed changes in PSD measurements, as thin spines typically have smaller PSDs. This further reinforces the idea that the overall maturation process may be altered in the KO, but not necessarily delayed. 

      We rephrased the interpretation of these data and moved figure 5D to supplemental figure S4.

      "To establish whether loss of Lrrk2 in young mice causes a reduction in dendritic spines size by influencing BDNF-TrkB expression" - there is no evidence of this.  

      We agree and reorganized the text, removing this sentence.  

      Shank and PSD95 mRNA changes being shown without protein adds very little. Why is drebrin RNA not shown? Also should be several housekeeping RNAs, not one (RPL27)? 

      We measured Dbn1 mRNA, which shows a significant reduction in midbrain and cortex. Moreover, we have now normalized the transcript levels against the geometric mean of the relative abundance of three housekeeping genes (RPL27, actin, and GAPDH).
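
Normalizing a transcript against the geometric mean of several reference genes can be sketched as below. This is a generic illustration of the arithmetic, not the authors' analysis script: the function names and the input quantities are hypothetical, and the inputs are assumed to be relative quantities already derived from the qPCR measurements.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed in log space."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_to_references(target_quantity, reference_quantities):
    """Divide a target's relative quantity by the geometric mean of the reference genes."""
    return target_quantity / geometric_mean(reference_quantities.values())


# Hypothetical relative quantities for one sample; not data from the study.
references = {"RPL27": 1.0, "actin": 2.0, "GAPDH": 4.0}
normalized_target = normalize_to_references(3.0, references)
```

The geometric mean is used here rather than the arithmetic mean because it is less dominated by a single highly abundant reference gene.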

      Drebrin levels being lower in KO seems to be the strongest result of the paper so far (shame no pLRRK2 or coIP of drebrin to back up the argument). DrebrinA KO mice have normal spines, what about haploinsufficient drebrin mice (LKO seem to have half drebrin, but only as youngsters?)

      As extensively explained in the public review, we used Dbn1 KO mouse brains and were able to show reduced Lrrk2 activity.

      Figure 6. hIPSC-derived cortical neurons. The WT 'cortical' neurons have a very low mEPSC frequency at 0.2Hz relative to KO. Is this because they are more or less mature? What is the EPSC frequency of these cells at 30 and 90 days for comparison? Also, it is very very hard to infer anything about mEPSC frequency in the absence of estimates of cell number and more importantly synapse number. Furthermore, where are the details of cell measures such as capacitance, resistance, and quality control e.g., Ra? Table s1 seems redundant here, besides suggesting that the amplitude is higher in KO at base. 

      We agree that the developmental trajectory of iPSC-derived neurons is critical to accurately interpreting synaptic function and plasticity. In response, we have included additional data now presented in the supplementary figure S7 and summarize key findings below:

      At DIV50, both WT and LRRK2 KO neurons exhibit low basal mEPSC activity (~0.5 Hz) and no response to 24 h BDNF stimulation (50 ng/mL).

      At DIV70 WT neurons show very low basal activity (~0.2 Hz), which increases ~7.5-fold upon BDNF treatment (1.5 Hz; p < 0.001), and no change in synapse number. KO neurons display elevated basal activity (~1 Hz) similar to BDNF-treated WT neurons, with no further increase upon BDNF exposure (~1.3 Hz) and no change in synapse number.

      At DIV90, BDNF has no significant effect in either WT or KO neurons, indicating a possible saturation of plastic responses. The lack of BDNF response at DIV90 may be due to endogenous BDNF production or culture-based saturation effects. While these factors warrant further investigation (e.g., ELISA, co-culture systems), they do not confound the key conclusions regarding the role of LRRK2 in synaptic development and plasticity:

      LRRK2 Enables BDNF-Responsive Synaptic Plasticity. In WT neurons, BDNF induces a significant increase in neurotransmitter release (mEPSC frequency) with no reduction in synapse number. This dissociation suggests BDNF promotes presynaptic functional potentiation. KO neurons fail to show changes in either synaptic function or structure in response to BDNF, indicating that LRRK2 is required for activity-dependent remodeling.

      LRRK2 Loss Accelerates Synaptic Maturation. At DIV70, KO neurons already exhibit high spontaneous synaptic activity equivalent to BDNF-stimulated WT neurons. This suggests that LRRK2 may act to suppress premature maturation and temporally gate BDNF responsiveness, aligning with the differences in maturation dynamics observed in KO mice (Figure 5).  
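
The fold changes implied by the approximate DIV70 frequencies quoted above (WT: about 0.2 Hz basal and 1.5 Hz with BDNF; KO: about 1 Hz basal and 1.3 Hz with BDNF) can be checked with a short calculation. The helper name is illustrative, and the inputs are the rounded values from this response rather than the underlying recordings.

```python
def fold_change(treated_hz, basal_hz):
    """Ratio of treated to basal mEPSC frequency."""
    if basal_hz <= 0:
        raise ValueError("basal frequency must be positive")
    return treated_hz / basal_hz


# Rounded DIV70 frequencies (Hz) as quoted in the response.
wt_fold = fold_change(1.5, 0.2)  # about 7.5-fold
ko_fold = fold_change(1.3, 1.0)  # about 1.3-fold
```

On these rounded numbers the WT response is roughly 7.5-fold, while the KO value stays close to 1, which is the contrast the two conclusions above rest on.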

      As suggested by the reviewer, we now report resistance and capacitance measurements for all DIV (Table 1, supplemental material). A reduction in capacitance was observed in WT neurons at DIV90, which may reflect changes in membrane complexity. However, this did not correlate with differences in synapse number and is unlikely to account for the observed differences in mEPSC frequency. To keep cell number consistent between groups, the non-dividing cells were counted prior to plating (80k/cm2; see also Methods).

      The presence of BDNF in WT seems to make them look like LKO, in the rest of the paper the suggestion is that the LKO lack a response to BDNF. Here it looks like it could be that BDNF signalling is saturated in LKO, or they are just very different at base and lack a response.

      Knowing which is important to the conclusions, and acute application (recording and BDNF wash-in) would be much more convincing.

      We agree with the Reviewer’s point that saturation of BDNF could influence the interpretation of the data if it were to occur. However, it is important to note that no BDNF is present in the media under basal conditions for either control or KO neuronal cultures. This differs from other culture conditions and allows us to investigate the effects of BDNF treatment. Thus, the increased mEPSC frequency observed in KO neurons compared to WT neurons is determined only by the deletion of the gene and not by other extrinsic factors, which were kept consistent between the groups. The lack of response or change in mEPSC frequency in KO neurons is proposed to be a compensatory mechanism due to the loss of LRRK2. Of note, LRRK2 as a “synaptic brake” has already been described (Beccano-Kelly et al., Hum Mol Gen, 2015). However, a comprehensive analysis of the underlying molecular mechanisms will require future studies beyond the scope of this paper.

      "The LRRK2 kinase substrates Rabs are not present in the list of significant phosphopeptides, likely due to the low stoichiometry and/or abundance" Likely due to the fact mass spec does not get anywhere near everything. 

      We removed this sentence in light of the new phosphoproteomic analysis.

      Figure 7 is pretty stand-alone, and not validated in any way, hard to justify its inclusion?  

      As extensively explained, we removed Figure 7 and included the new phospho-MS analysis as part of Figure 3.

      Writing throughout shows a very selective and shallow use of the literature.  

      We extensively reviewed the citations.

      "while Lrrk1 transcript in this region is relatively stable during development" The authors reference a very old paper that barely shows any LRRK1 mRNA, and no protein. Others have shown that LRRK1 is essentially not present postnatally PMC2233633. This isn't even an argument the authors need to make. 

      We thank the reviewer and have included this more appropriate citation.

      Reviewer #2 (Recommendations For The Authors): 

      Cyfip1 (Fig 3A) is part of the WAVE complex (page 13). 

      We thank the reviewer and have specified this.

      The discussion could be more focused. 

      We extensively revised the discussion to keep it more focused.

      Note that we updated the Gene Ontology (GO) analyses to reflect the updated information present in g:Profiler.

      References.

      Nirujogi, R. S., Tonelli, F., Taylor, M., Lis, P., Zimprich, A., Sammler, E., & Alessi, D. R. (2021). Development of a multiplexed targeted mass spectrometry assay for LRRK2-phosphorylated Rabs and Ser910/Ser935 biomarker sites. The Biochemical journal, 478(2), 299–326. https://doi.org/10.1042/BCJ20200930

      Worth, D. C., Daly, C. N., Geraldo, S., Oozeer, F., & Gordon-Weeks, P. R. (2013). Drebrin contains a cryptic F-actin-bundling activity regulated by Cdk5 phosphorylation. The Journal of cell biology, 202(5), 793–806. https://doi.org/10.1083/jcb.201303005

      Shirao, T., Hanamura, K., Koganezawa, N., Ishizuka, Y., Yamazaki, H., & Sekino, Y. (2017). The role of drebrin in neurons. Journal of neurochemistry, 141(6), 819–834. https://doi.org/10.1111/jnc.13988

      Koganezawa, N., Hanamura, K., Sekino, Y., & Shirao, T. (2017). The role of drebrin in dendritic spines. Molecular and cellular neurosciences, 84, 85–92. https://doi.org/10.1016/j.mcn.2017.01.004

      Meixner, A., Boldt, K., Van Troys, M., Askenazi, M., Gloeckner, C. J., Bauer, M., Marto, J. A., Ampe, C., Kinkl, N., & Ueffing, M. (2011). A QUICK screen for Lrrk2 interaction partners--leucine-rich repeat kinase 2 is involved in actin cytoskeleton dynamics. Molecular & cellular proteomics: MCP, 10(1), M110.001172. https://doi.org/10.1074/mcp.M110.001172

      Parisiadou, L., & Cai, H. (2010). LRRK2 function on actin and microtubule dynamics in Parkinson disease. Communicative & integrative biology, 3(5), 396–400. https://doi.org/10.4161/cib.3.5.12286

      Chen, C., Masotti, M., Shepard, N., Promes, V., Tombesi, G., Arango, D., Manzoni, C., Greggio, E., Hilfiker, S., Kozorovitskiy, Y., & Parisiadou, L. (2024). LRRK2 mediates haloperidol-induced changes in indirect pathway striatal projection neurons. bioRxiv : the preprint server for biology, 2024.06.06.597594. https://doi.org/10.1101/2024.06.06.597594

      Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., Pritzel, A., Wong, L. H., Zielinski, M., Sargeant, T., Schneider, R. G., Senior, A. W., Jumper, J., Hassabis, D., Kohli, P., & Avsec, Ž. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (New York, N.Y.), 381(6664), eadg7492. https://doi.org/10.1126/science.adg7492

      Beaudoin, G. M., 3rd, Schofield, C. M., Nuwal, T., Zang, K., Ullian, E. M., Huang, B., & Reichardt, L. F. (2012). Afadin, a Ras/Rap effector that controls cadherin function, promotes spine and excitatory synapse density in the hippocampus. The Journal of neuroscience : the official journal of the Society for Neuroscience, 32(1), 99–110. https://doi.org/10.1523/JNEUROSCI.4565-11.2012

      Fernández, B., Chittoor-Vinod, V. G., Kluss, J. H., Kelly, K., Bryant, N., Nguyen, A. P. T., Bukhari, S. A., Smith, N., Lara Ordóñez, A. J., Fdez, E., Chartier-Harlin, M. C., Montine, T. J., Wilson, M. A., Moore, D. J., West, A. B., Cookson, M. R., Nichols, R. J., & Hilfiker, S. (2022). Evaluation of Current Methods to Detect Cellular Leucine-Rich Repeat Kinase 2 (LRRK2) Kinase Activity. Journal of Parkinson's disease, 12(5), 1423–1447. https://doi.org/10.3233/JPD-213128

      Cirnaru, M. D., Marte, A., Belluzzi, E., Russo, I., Gabrielli, M., Longo, F., Arcuri, L., Murru, L., Bubacco, L., Matteoli, M., Fedele, E., Sala, C., Passafaro, M., Morari, M., Greggio, E., Onofri, F., & Piccoli, G. (2014). LRRK2 kinase activity regulates synaptic vesicle trafficking and neurotransmitter release through modulation of LRRK2 macromolecular complex. Frontiers in molecular neuroscience, 7, 49. https://doi.org/10.3389/fnmol.2014.00049

      Belluzzi, E., Gonnelli, A., Cirnaru, M. D., Marte, A., Plotegher, N., Russo, I., Civiero, L., Cogo, S., Carrion, M. P., Franchin, C., Arrigoni, G., Beltramini, M., Bubacco, L., Onofri, F., Piccoli, G., & Greggio, E. (2016). LRRK2 phosphorylates pre-synaptic N-ethylmaleimide sensitive fusion (NSF) protein enhancing its ATPase activity and SNARE complex disassembling rate. Molecular neurodegeneration, 11, 1. https://doi.org/10.1186/s13024-015-0066-z

      Martin, E. R., Gandawijaya, J., & Oguro-Ando, A. (2022). A novel method for generating glutamatergic SH-SY5Y neuron-like cells utilizing B-27 supplement. Frontiers in pharmacology, 13, 943627. https://doi.org/10.3389/fphar.2022.943627

      Kovalevich, J., & Langford, D. (2013). Considerations for the use of SH-SY5Y neuroblastoma cells in neurobiology. Methods in molecular biology (Clifton, N.J.), 1078, 9–21. https://doi.org/10.1007/978-1-62703-640-5_2

      Drummond, N. J., Singh Dolt, K., Canham, M. A., Kilbride, P., Morris, G. J., & Kunath, T. (2020). Cryopreservation of Human Midbrain Dopaminergic Neural Progenitor Cells Poised for Neuronal Differentiation. Frontiers in cell and developmental biology, 8, 578907. https://doi.org/10.3389/fcell.2020.578907

      Tao, X., Finkbeiner, S., Arnold, D. B., Shaywitz, A. J., & Greenberg, M. E. (1998). Ca2+ influx regulates BDNF transcription by a CREB family transcription factor-dependent mechanism. Neuron, 20(4), 709–726. https://doi.org/10.1016/s0896-6273(00)81010-7

      El-Husseini, A. E., Schnell, E., Chetkovich, D. M., Nicoll, R. A., & Bredt, D. S. (2000). PSD95 involvement in maturation of excitatory synapses. Science (New York, N.Y.), 290(5495), 1364–1368.

      Glebov, O. O., Cox, S., Humphreys, L., & Burrone, J. (2016). Neuronal activity controls transsynaptic geometry. Scientific reports, 6, 22703. https://doi.org/10.1038/srep22703 (Erratum in: Scientific reports, 6, 26422. https://doi.org/10.1038/srep26422)

      Beccano-Kelly, D. A., Volta, M., Munsie, L. N., Paschall, S. A., Tatarnikov, I., Co, K., Chou, P., Cao, L. P., Bergeron, S., Mitchell, E., Han, H., Melrose, H. L., Tapia, L., Raymond, L. A., Farrer, M. J., & Milnerwood, A. J. (2015). LRRK2 overexpression alters glutamatergic presynaptic plasticity, striatal dopamine tone, postsynaptic signal transduction, motor activity and memory. Human molecular genetics, 24(5), 1336–1349. https://doi.org/10.1093/hmg/ddu543

    1. eLife Assessment

      This manuscript presents a valuable and insightful contribution to the understanding of how Legionella pneumophila remodels its vacuolar niche through coordinated ubiquitination mechanisms. The identification of Rab5 as a target of both canonical and phosphoribosyl ubiquitination, and the demonstration of a detergent-resistant ubiquitin "cloud" surrounding the LCV, represent significant advances in the field. The findings are supported by rigorous experimental design, robust quantitative analyses, and clear mechanistic insight, meeting a standard of evidence that is compelling and exceeds current state-of-the-art approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In the submitted manuscript, Steinbach et al describe the formation of a detergent-resistant "cloud" around the Legionella-containing vacuole (LCV) that functions as a protective barrier. The authors show that formation of the "cloud" barrier is contingent upon the phosphoribosyl-ubiquitination activity of the SidE/SdeABC effector family, and is temporally regulated, with the assembly and subsequent disassembly of the "cloud" coinciding with replication and vacuolar expansion. The authors postulate a model of "cloud" barrier formation that relies upon a wave of initial ubiquitination by the SidC effector family, after which the SidE/SdeABC family expands the ubiquitination and forms cross-links that render the ubiquitin cloud resistant to harsh detergents. Additionally, Steinbach et al. also demonstrate that Rab5 is recruited to the LCV and remains associated for a considerable period.

      Strengths:

      This manuscript is very well written, with clear justification provided for experiments that make it very easy to follow along with the experimental logic. The figures have clearly been designed with much thought and are easy to interpret. Steinbach et al have also done a commendable job of addressing the previous reviewers' comments, even though some may suggest that some of these comments could be viewed as slightly unreasonable. This work would be of interest to both the Legionella and ubiquitin fields. Legionella researchers would potentially be interested to explore the proposed barrier model as the function for the ubiquitin "cloud," whereas ubiquitin researchers may be interested in exploring the mechanisms underlying SidE's crosslinking ability.

      Weaknesses:

      While the work is important and describes the physical nature of the ubiquitin cloud on the Legionella vacuole, it is somewhat descriptive in nature and does not dig deeply into what purpose this cloud serves. This is a complicated topic that will certainly stimulate additional research in this area.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript "Canonical and phosphoribosyl ubiquitination coordinate to stabilize a proteinaceous structure surrounding the Legionella-containing vacuole" by Steinbach et al. is well written and presents strong evidence that satisfactorily supports the main hypothesis and research objectives. The authors have clearly demonstrated the presence of cloud-like, detergent-resistant GTPase Rab5 surrounding the LCV, and formation of the structure is dependent on the SidE family of effectors. The study provides insights into the relevant (associated with described phenotype) ubiquitination pathways. The findings advance our understanding of Legionella pneumophila vacuole remodeling during intracellular infection and open directions for future research to establish broader implications of this structure on Legionella pathogenesis.

      Strengths:

      The manuscript convincingly demonstrates the presence of a cloud-like, detergent-resistant GTPase Rab5 surrounding the LCV through elegant microscopy. The experimental evidence about the dependence of the observed phenotype on the SidE family of effectors is compelling and presented with strong scientific rigor. The introduction is well-written, and the discussion is thorough and satisfactory. The article is thought-provoking and shows preliminary evidence for ubiquitin-mediated protection and spatial organization of the LCV.

      Weaknesses:

      The manuscript is well-organized and detailed, and it is hard to find weaknesses under the set goals of the research. A few weaknesses are that the molecular determinants or the regulatory mechanisms that drive selective versus non-selective incorporation of host proteins into this structure are unclear, and, as the authors mentioned, further work is required to establish the precise biophysical basis of the detergent resistance and expansive morphology of the ubiquitinated GTPase "cloud". Currently, the function or purpose of the structure is completely speculative. The effect of the structure on bacterial replication is also not established in the current study. In Figure 2D (right panel, Western blot), the authors describe the signal present in all four lanes between 37 and 25 kDa as 'nonspecific', but it is probably too intense to be called so. Mass spec analysis would be interesting in order to identify sources of such intense signals. With these few limitations, the research presented in this manuscript is experimentally rigorous and opens avenues for future research.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Mukherjee and colleagues extended earlier studies on the coordination of the SidC and SidE effector families on the generation of a unique ubiquitin layer on the surface of the vacuoles containing the bacterial pathogen Legionella pneumophila (LCV).

      Strengths:

      The main strength of the manuscript is the identification of the small GTPase Rab5 as a major "carrier" of these differently modified ubiquitin and ubiquitin chains, which was nicely quantified.

      Weaknesses:

      (1) The results are mostly descriptive, based on mechanistic studies from earlier works.

      (2) The majority of the work was dedicated to the characterization of the unique ubiquitin layer on the LCV. One important question was ignored: what is the role of Rab5 in this process? Is the GTPase activity of Rab5 required for its ubiquitination by SidC and SidE? The authors should create a Rab5 KO cell line, complement the line with different mutants of Rab5, and examine their ubiquitination and association with the LCV.

      (3) The finding that Rab5 is associated with the LCV supports the notion that the LCV has characteristics of endo- or/late endosomes. The positioning of the LCV in the endocytic pathway should be discussed in the context of earlier studies (e.g., PMID: 38739652; PMID: 11067875).

    1. eLife Assessment

      Sanchez-Vasquez et al establish an innovative approach to induce aneuploidy in preimplantation embryos. This important study extends the authors' previous publications evaluating the consequences of aneuploidy in the mammalian embryo. In this work, the authors investigate the developmental potential of aneuploid embryos and characterize changes in gene expression profiles under normoxic and hypoxic culture conditions. Using a solid methodology, they identify sensitivity to Hif1alpha loss in aneuploid embryos, and in further convincing experiments they assess how levels of DNA damage and DNA repair are altered under hypoxic and normoxic conditions.

    2. Reviewer #1 (Public review):

      Summary:

      This paper developed a model of chromosome mosaicism by using a new aneuploidy-inducing drug (AZ3146), and compared this to their previous work where they used reversine, to demonstrate the fate of aneuploid cells during murine preimplantation embryo development. They found that AZ3146 acts similarly to reversine in inducing aneuploidy in embryos, but interestingly showed that the developmental potential of embryos is higher in AZ3146-treated vs. reversine-treated embryos. This difference was associated with changes in HIF1A, p53 gene regulation, DNA damage, and fate of euploid and aneuploid cells when embryos were cultured in a hypoxic environment.

      Strengths:

      In the current study, the authors investigate the fate of aneuploid cells in the preimplantation murine embryo using a specific aneuploidy-inducing compound to generate embryos that were chimeras of euploid and aneuploid cells. The strength of the work is that they investigate the developmental potential and changes in gene expression profiles under normoxic and hypoxic culture conditions. Further, they also assessed how levels of DNA damage and DNA repair are altered in these culture conditions. They also assessed the allocation of aneuploid cells to the divergent cell lineages of the blastocyst stage embryo.

    3. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1(Public review):

      We deeply appreciate the reviewer's comments on our manuscript. Our manuscript has been improved through the revisions thanks to their insightful remarks. We have made all the required changes.

      Weaknesses:

      The authors have still not addressed the inconsistent/missing description for sample size, the appropriate number of * for each figure panel, and the statistical tests used.

      Descriptions of sample size, specific P values, and the statistical tests used have been added to the main text, figures, and figure legends.

      The authors assign 5% oxygen as hypoxia. This is not the case as the in vivo environment is close to this value. 5% is normoxia. Clinical IVF/embryo culture occurs at 5% O2. Please adjust your narrative around this.

      In our manuscript, we define “normoxia” as the standard atmospheric oxygen level in tissue culture incubators, which ranges from about 20–21% oxygen. Our definition of hypoxia is a 5% oxygen concentration, taking into consideration the standard oxygen levels in IVF clinics. Physiological oxygen in the mouse varies from ~1.5% to 8% (Alva et al 2022). Considering that these oxygen levels are standard in tissue culture practice, a paragraph has been added to the Discussion and the Materials and Methods for further clarification.

      Reviewer #2 (Public review):

      Weakness:

      Given that this is a study on the induction of aneuploidy, it would be meaningful to assess aneuploidy immediately after induction, and then again before implantation. This is also applicable to the competition experiments on page 7/8. What is shown is the competitiveness of treated cells. Because the publication centers around aneuploidy, inclusion of such data in the main figure at all relevant points would strengthen it. There is some evaluation of karyotypes only in the supplemental - why? Would be good not to rely on a single assay that the authors appear to not give much importance.

      This is an excellent point. However, because aneuploidies arise stochastically when embryos are treated with AZ3146 and reversine (Bolton et al 2016), every treatment is likely to generate different levels of aneuploidy. For this reason, and because of the technical limitations of single-cell genomic DNA sequencing at the blastocyst stage, we were unable to determine the karyotype of all cells under the different conditions. Nevertheless, Regin et al 2024 (eLife) showed similar results for the overall transcriptome changes at different dosages of aneuploidy: high-dosage embryos overexpress p53, like reversine-treated embryos, whereas low-dosage embryos overexpress the hypoxic pathway, including HIF1A, similar to embryos treated with AZ3146.

      Reviewer #1 (Recommendations for the authors):

      Corrections required before final publishing:

      Please ensure that the number of asterisks is in alignment with standard convention (* <0.05; ** <0.01; *** <0.001; **** <0.0001). If you want to describe an exact P value it should be presented as P = 0.0004. line 108 *** is <0.0004. line 263 * P<0.0044

      Same issue appears in lines 697, 711, 722, 753, 685

      Specific values have been added in the figures and modified in the text. 
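
      The asterisk convention quoted by the reviewer can be sketched as a small helper. This is only an illustrative sketch, not the authors' analysis code: the function names `significance_stars` and `format_p`, and the "ns" label for values at or above 0.05, are assumptions introduced here. Only the four thresholds and the "P = 0.0004" style come from the reviewer's comment.

```python
def significance_stars(p: float) -> str:
    """Map a P value to asterisks using the thresholds quoted by the reviewer.

    * <0.05; ** <0.01; *** <0.001; **** <0.0001.
    Returning "ns" for p >= 0.05 is an assumption, not part of the review.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    # Check the strictest threshold first so the most stars win.
    thresholds = [(0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")]
    for cutoff, stars in thresholds:
        if p < cutoff:
            return stars
    return "ns"


def format_p(p: float) -> str:
    """Report an exact P value in the 'P = 0.0004' style the reviewer asks for."""
    return f"P = {p:.4g}"


if __name__ == "__main__":
    # The two values flagged by the reviewer (lines 108 and 263 of the manuscript).
    for p in (0.0004, 0.0044):
        print(format_p(p), significance_stars(p))
```

      Under these thresholds, the two values flagged by the reviewer (0.0004 and 0.0044) map to three and two asterisks respectively.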

      Line 199: "...viable E9.5 embryos" missing "Figure S1D"

      Modified in manuscript

      Line 120: "...decidua" please add "Figure S1C"

      Modified in manuscript

      Line 126-127: Please add a description for the results (morula) in Fig 1D, e.g., "It appears that γH2A.X persists from 8-cell to morula when treated with Reversine but not AZ3146"

      At the morula stage, the levels of γH2A.X in reversine- and AZ3146-treated embryos are similar (Fig. 1E). However, at the blastocyst stage, high levels of γH2A.X are maintained in reversine-treated embryos and reduced in AZ3146-treated embryos, suggesting some level of DNA repair between the morula-to-blastocyst stages (Fig. S2A). In contrast, in hypoxia, the levels of γH2A.X are low in the three treatments at the morula stages, suggesting that DNA repair can be enhanced under hypoxic conditions. Similar results have been reported in somatic cells (Marti et al., 2021; Pietrzak et al., 2018).

      Line 213: PARP1 levels were also similar under all conditions; but Fig 3E, top right shows PARP1 was significantly lower with Reversine treatment; also please correct me if I am wrong, but does the phrase "all conditions" cross reference γH2A.X and PARP1 between Fig 3 and Fig 1 to show the impact of hypoxia? Because from my understanding Fig 1 was done in 20% oxygen, but Fig 3 was done in 5% oxygen (hypoxia).

      This is correct. The manuscript has been modified for clarification.

      Line 264: extra forward dash? "Reversine/AZ3146/ aggregation"

      Modified in manuscript

      Line 644: you don't have a control for IDF treatment, so how did you differentiate between impact of aneuploid drugs vs IDF treatment alone? Would the impact observed be due to compounding effect of aneuploidy drugs + IDF?

      This is a great observation. We previously demonstrated that IDF-1174 treatments in embryos do not affect pre-implantation development (Fig. S3).

      Line 681: change their behaviour is a vague statement. Be specific.

      Modified in manuscript

      Line 676 missing bracket "E)"

      Modified in manuscript

      Line 680: "...significantly on" should be "for"

      Modified in manuscript

      Line 682-685: "...hypoxia favours the survival of reversine-induced aneuploid cells." does it? the statement before this says in Rev/AZ chimeras, AZ blastomeres contribute similarly to reversine-blastomeres to the TE and PE but significantly increase contributions to the EPI. Wouldn't this mean hypoxia favours survival of AZ aneuploid cells in EPI?

      In normoxic conditions, AZ3146-treated cells in Rev/AZ chimeras contributed mostly to the EPI and TE but not PE. In contrast, in normoxic conditions, Rev-treated cells contributed similarly to all the lineages. This result seems to be due to a better survival of Rev-treated cells under normoxic conditions (Fig. 4D-E).

      Line 720: (b) shows blastocyst staining from what group? DMSO? Rev/AZ? Or are the 3 blastocysts shown here, 3 separate examples of Reversine-treated blastocysts? Would require labelling Fig S2B, and adding a short description in the corresponding figure legend

      Figure (B) shows the expression pattern of PARP1 at the blastocyst stage. Modified in manuscript

      Figure 2, Figure S3 and Figure S6: were these experiments performed at 5% or 20% O2, please add detail.

      Modified in manuscript

      Reviewer #2 (Recommendations for the authors):

      Lines 45-46 understanding of reduction of aneuploidy should mention/discuss the paper of attrition/selection, of the kind by the Brivanlou lab for instance, or others. As well as allocation to specific lineages, including the authors' work.

      A section in the discussion has been added in response to this recommendation. Comparison between models is debatable.

      The response does not clarify whether other papers were cited instead, or the authors' own work that has shown preferential allocation to TE.

    1. eLife Assessment

      This important study provides a novel approach for delineating subcortical-cortical white matter bundles. The authors provide convincing evidence by harnessing state-of-the-art methods and cross-species data. Together, this effort will be of interest to scientists across multiple subfields and accelerate progress in a biologically critical but methodologically challenging area.

    2. Reviewer #1 (Public review):

      The authors note that it is challenging to perform diffusion MRI tractography consistently in both humans and macaques, particularly when deep subcortical structures are involved. The scientific advance described in this paper is effectively an update to the tracts that the XTRACT software supports. The changes to XTRACT are soundly motivated in theory (based on anatomical tracer studies) and practice (changes in seeding/masking for tractography).

    3. Reviewer #2 (Public review):

      Summary:

      In this article, Assimopoulos et al. expand the FSL-XTRACT software to include new protocols for identifying cortical-subcortical tracts with diffusion MRI, with a focus on tracts connecting to the amygdala and striatum. They show that the amygdalofugal pathway and divisions of the striatal bundle/external capsule can be successfully reconstructed in both macaques and humans while preserving large-scale topographic features previously defined in tract tracing studies. The authors set out to create an automated subcortical tractography protocol, and they accomplish this for a subset of specific subcortical connections.

      Strengths:

      The main strength of the current study is the translation of established anatomical knowledge to a tractography protocol for delineating cortical-subcortical tracts that are difficult to reconstruct. Diffusion MRI-based tractography is highly prone to false positives; thus, constraining tractography outputs by known anatomical priors is important. The authors used existing tracing literature to create anatomical constraints for tracking specific cortical-subcortical connections and refined their protocol through an iterative process and in collaboration with multiple neuroanatomists. Key additional strengths include 1) the creation of a protocol that can be applied to both macaque and human data; 2) demonstration that the protocol can be applied to both high quality data (3 shells, > 250 directions, 1.25 mm isotropic, 55 minutes) and lower quality data (2 shells, 100 directions, 2 mm isotropic, 6.5 minutes); and 3) validation that the anatomy of cortical-subcortical tracts derived from the new method is more similar in monozygotic twins than in siblings and unrelated individuals.

      Overall Appraisal:

      This new method will accelerate research on anatomically validated cortical-subcortical white matter pathways. The work has utility for diffusion MRI researchers across fields.

      Editors' note:

      Both reviewers were satisfied with the responses to their feedback.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Summary:

      The authors note that it is challenging to perform diffusion MRI tractography consistently in both humans and macaques, particularly when deep subcortical structures are involved. The scientific advance described in this paper is effectively an update to the tracts that the XTRACT software supports. The claims of robustness are based on a very small selection of subjects from a very atypical dMRI acquisition (n=50 from HCP-Adult) and an even smaller selection of subjects from a more typical study (n=10 from ON-Harmony).

      Strengths:

      The changes to XTRACT are soundly motivated in theory (based on anatomical tracer studies) and practice (changes in seeding/masking for tractography), and I think the value added by these changes to XTRACT should be shared with the field. While other bundle segmentation software typically includes these types of changes in release notes, I think papers are more appropriate.

      We would like to thank the reviewer for their assessment and we appreciate the comments for improving our manuscript. We have added new results, sampling from a larger cohort with a typical dMRI protocol (N=50 from UK Biobank), as well as showcasing examples from individual subject reconstructions (Supplementary figures S6, S7). We also demonstrate comparisons against another approach that has been proposed for extracting parts of the cortico-striatal bundle in a bundle segmentation fashion, as the reviewer suggests (see comment and Author response image 1 below). 

      We would also like to take the opportunity to summarise the novelty of our contributions, as detailed in the Introduction, which we believe extend beyond a mere software update; this is a byproduct of this work rather than the aim.

      i) We devise for the first time standard-space protocols for 21 challenging cortico-subcortical bundles for both human and macaque and we interrogate them in a comprehensive manner.

      ii) We demonstrate robustness of these protocols using criteria grounded on neuroanatomy, showing that tractography reconstructions follow topographical principles known from tracers both in WM and GM and for both species. We also show that these protocols capture individual variability as assessed by respecting family structure in data from the HCP twins.

      iii) We use high-resolution dMRI data (HCP and post-mortem macaque) to showcase feasibility of these reconstructions, and we show that reconstructions are also plausible with more conventional data, such as the ones from the UK Biobank.

      iv) We further showcase robustness and the value of cross-species mapping by using these tractography reconstructions to predict known homologous grey matter (GM) regions across the two species, both in cortex and subcortex, on the basis of similarity of grey matter areal connection patterns to the set of proposed white matter bundles.

      Weaknesses

      (2) The demonstration of the new tracts does not include a large number of carefully selected scans and is only compared to the prior methods in XTRACT. The small n and limited statistical comparisons are insufficient to claim that they are better than an alternative. Qualitatively, this method looks sound.

      We appreciate the suggestion for a larger sample size, so we performed the same analysis using 50 randomly drawn UK Biobank subjects, instead of ON-Harmony, matching the N=50 randomly drawn HCP subjects (detailed explanation in the comment below, Main text Figure 4A; Supplementary Figure S4). We also generated results using the full set of N=339 HCP unrelated subjects (Supplementary Figure S5 compares 10, 50 and 339 unrelated HCP subjects). We provide further details in the relevant point (3) below.

      With regards to comparisons to other methods, there are not really many analogous approaches that we can compare against. To our knowledge there are no previous cross-species, standard space tractography protocols for the tracts we considered in this study (including Muratoff, amygdalofugal, different parts of extreme and external capsules, along with their neighbouring tracts). We therefore i) directly compared against independent neuroanatomical knowledge and patterns (Figures 2, 3, 5), ii) confirmed that patterns against data quality and individual variability that the new tracts demonstrate are similar to patterns observed for the more established cortical tracts (Figure 4), iii) indirectly assessed efficacy by performing a demanding task, such as homologue identification on the basis of the tracts we reconstruct (Figures 6, 7).

      We need to point out that our approach is not “bundle segmentation”, in the sense of “data-driven” approaches that cluster streamlines into bundles following full-brain tractography. The latter is different in spirit and assigns a label to each generated streamline; as full-brain tractography is challenging (Maier-Hein, Nature Comms 2017), we follow instead the approach of imposing anatomical constraints to mitigate some of these challenges, as suggested in (Maier-Hein, 2017).

      Nevertheless, we used TractSeg (one of the few alternatives that considers corticostriatal bundles) to perform some comparisons. The Author response image below shows average path distributions across 10 HCP subjects for a few bundles that we also reconstruct in our paper (no temporal part of striatal bundle is generated by TractSeg). We can observe that the output for each tract is highly overlapping across subjects, indicating that there is not much individual variability captured. We also see the reduced specificity in the connectivity end-points of the bundles.

      Author response image 1.

      Comparison between 10-subject average for example subcortical tracts using TractSeg and XTRACT. We chose example bundles shared between our set and TractSeg. Per subject TractSeg produces a binary mask rather than a path distribution per tract. Furthermore, the mask is highly overlapping across subjects. Where direct correspondence was not possible, we found the closest matching tract. Specifically, we used ST_PREF for StB<sub>f</sub>, and merged ST_PREC with ST_POSTC to match StB<sub>m</sub>. There was no correspondence for the temporal part of StB.

      We subsequently performed the twinness test using both TractSeg and XTRACT (Author response image 2), as a way to assess whether aspects of individual variability can be captured. Due to heritability of brain organisation features, we anticipate that monozygotic twins have more similar tract reconstructions compared to dizygotic twins and subsequently non-twin siblings. This pattern is reproduced using our proposed approach, but not using TractSeg, which provides a rather flat pattern.

      Author response image 2.

      Violin plots of the mean pairwise Pearson’s correlations across tracts between 72 monozygotic (MZ) twin pairs, 72 dizygotic (DZ) twin pairs, 72 non-twin sibling pairs, and 72 unrelated subject pairs from the Human Connectome Project, using TractSeg (left) and XTRACT (right). About 12 cortico-subcortical tracts were considered, as closely matched as possible between the two approaches. For TractSeg we considered: 'CA', 'FX', 'ST_FO', 'ST_M1S1' (merged ‘ST_PREC’ and ‘ST_POSTC’ to approximate the sensorimotor part of our striatal bundle), 'ST_OCC', 'ST_PAR', 'ST_PREF', 'ST_PREM', 'T_M1S1' (merged ‘T_PREC’ and ‘T_POSTC’ to approximate the sensorimotor part of our striatal bundle), 'T_PREF', 'T_PREM', 'UF'. For XTRACT we considered: 'ac', 'fx', 'StB<sub>f</sub>', 'StB<sub>m</sub>', 'StB<sub>p</sub>', 'StB<sub>t</sub>', 'EmC<sub>f</sub>', 'EmC<sub>p</sub>', 'EmC<sub>t</sub>', 'MB', 'amf', 'uf'. The mean (μ) and standard deviation (σ) are shown for each group. There were no significant differences between groups using TractSeg.

      Taken together, these results indicate as a minimum that the different approaches have potentially different aims. Their different behaviour across the two approaches can be desirable and beneficial for different applications (for instance WM ROI segmentation vs connectivity analysis) but makes it challenging to perform like-to-like comparisons.
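
      The twinness comparison described above can be illustrated with a minimal sketch. It assumes each subject's reconstruction is available as a flattened path distribution per tract; the function names (`pearson`, `pair_similarity`, `group_summary`) and the toy values are hypothetical and are not taken from the XTRACT or TractSeg code.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pair_similarity(subject_a, subject_b):
    """Mean across shared tracts of the correlation between two subjects.

    Each subject is a dict mapping a tract name to a flattened path
    distribution (hypothetical data layout for this sketch).
    """
    tracts = sorted(set(subject_a) & set(subject_b))
    return mean(pearson(subject_a[t], subject_b[t]) for t in tracts)


def group_summary(pairs):
    """Mean and standard deviation of pair similarities for one group."""
    sims = [pair_similarity(a, b) for a, b in pairs]
    return mean(sims), pstdev(sims)


# Toy values only: twin_b is a rescaled copy of twin_a, "other" is not.
twin_a = {"fx": [0.0, 0.2, 0.8, 0.1], "uf": [0.5, 0.1, 0.0, 0.9]}
twin_b = {"fx": [0.0, 0.4, 1.6, 0.2], "uf": [1.0, 0.2, 0.0, 1.8]}
other = {"fx": [0.8, 0.1, 0.0, 0.2], "uf": [0.0, 0.9, 0.5, 0.1]}

twin_mu, twin_sigma = group_summary([(twin_a, twin_b)])
unrelated_mu, unrelated_sigma = group_summary([(twin_a, other)])
```

      In such a sketch, one summary would be computed per group (MZ, DZ, non-twin siblings, unrelated) and the group distributions compared; the statistical test used for that comparison is not specified here.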

      (3) “Subject selection at each stage is unclear in this manuscript. On page 5 the data are described as "Using dMRI data from the macaque (𝑁 = 6) and human brain (𝑁 = 50)". Were the 50 HCP subjects selected to cover a range of noise levels or subject head motion? Figure 4 describes 72 pairs for each of monozygotic, dizygotic, non-twin siblings, and unrelated pairs - are these treated separately? Similarly, NH had 10 subjects, but each was scanned 5 times. How was this represented in the sample construction?”

      We appreciate the suggestions and we agree that some of the choices in terms of group sizes may have been confusing. Short answer is we did not perform any subject selection, subjects were randomly drawn from what we had available. The 72 twin pairs are simply the maximum number of monozygotic twin pairs available in the HCP cohort, so we used 72 pairs in all categories to match this number in these specific tests. The N=6 animals are good quality post-mortem dMRI data that have been acquired in the past and we cannot easily expand. For the rest of the points, we have now made the following changes:

      We have replaced our comparison to the ON-Harmony dataset (10 subjects) with a comparison to 50 unrelated UK Biobank subjects (to match the 50 unrelated HCP subject cohort used throughout). Updated results can be seen in Figure 4A and Supplementary Figure S4. This allows a comparison of tractography reconstruction between high quality and more conventional quality data for the same N.

      We looked at QC metrics to ensure our chosen cohorts were representative of the full cohorts we had available. The N=50 unrelated HCP cohort and N=50 unrelated UKBiobank cohorts we used in the study captured well the range of the full 339 unrelated HCP cohort and N=7192 UKBiobank cohort in terms of absolute/relative motion (Author response image 3A and 3B respectively). A similar pattern was observed in terms of SNR and CNR ranges (Author response image 4).

      We generated tractography reconstructions for single subjects, corresponding to the 10th percentile (P<sub>10</sub>), median (P<sub>50</sub>) and 90th percentile (P<sub>90</sub>) of the distributions with respect to similarity to the cohort average maps. These are now shown in Supplementary Figures S6, S7. We also checked the QC metrics for these single subjects and confirmed that average absolute subject motion was highest for the P<sub>10</sub>, followed by the P<sub>50</sub> and lowest for the P<sub>90</sub> subject, capturing a range of within cohort data quality.

      We generated reconstructions for an even larger HCP cohort (all 339 unrelated HCP subjects) and these look very similar to the N=50 reconstructions (Supplementary Figure S5).
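
      Selecting the P<sub>10</sub>, median and P<sub>90</sub> subjects described above can be sketched as follows. This is only an illustration: the nearest-rank percentile rule, the function name `representative_subjects` and the toy scores are assumptions, not the procedure used in the paper.

```python
def representative_subjects(similarity_to_mean):
    """Pick the subjects at the 10th, 50th and 90th percentile of similarity.

    similarity_to_mean maps a subject id to that subject's average agreement
    with the cohort-average maps. A nearest-rank rule is assumed here.
    """
    ranked = sorted(similarity_to_mean, key=similarity_to_mean.get)
    n = len(ranked)
    picks = {}
    for label, percent in (("P10", 10), ("P50", 50), ("P90", 90)):
        # Integer ceiling of percent * n / 100, converted to a 0-based index.
        rank = (percent * n + 99) // 100
        picks[label] = ranked[min(n, max(1, rank)) - 1]
    return picks


# Toy scores for ten hypothetical subjects, s1 (lowest) to s10 (highest).
scores = {f"s{i}": i / 10 for i in range(1, 11)}
picks = representative_subjects(scores)
```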

      Author response image 3.

      Subsets chosen from the HCP and UKB reflect similar range of average motion (relative and absolute) to the corresponding full cohorts. (A) Absolute and relative motion comparison between N=50 and N=339 unrelated HCP subjects. (B) Absolute and relative motion comparison between N=50 and N=7192 super-healthy UKB subjects.  

      Author response image 4.

      Average SNR and CNR values show similar range between the N=50 UKB subset and the full UK Biobank cohort of N=7192.

      (4) In the paper, the authors state "the mean agreement between HCP and NH reconstructions was lower for the new tracts, compared to the original protocols (𝑝 < 10^−10). This was due to occasionally reconstructing a sparser path distribution, i.e., slightly higher false negative rate," - how can we know this is a false negative rate without knowing the ground truth?

      We apologise for the confusing terminology, which we have corrected. Indeed, we cannot call it a false negative; what we meant is that reconstructions from lower resolution data for these bundles ended up being in general sparser than the ones from the high-resolution data, potentially missing parts of the tract. We have now revised the text accordingly.

      Reviewer #2 Public Review:

      (5) Summary:

      In this article, Assimopoulos et al. expand the FSL-XTRACT software to include new protocols for identifying cortical-subcortical tracts with diffusion MRI, with a focus on tracts connecting to the amygdala and striatum. They show that the amygdalofugal pathway and divisions of the striatal bundle/external capsule can be successfully reconstructed in both macaques and humans while preserving large-scale topographic features previously defined in tract tracing studies. The authors set out to create an automated subcortical tractography protocol, and they accomplished this for a subset of specific subcortical connections for users of the FSL ecosystem.

      Strengths:

      A main strength of the current study is the translation of established anatomical knowledge to a tractography protocol for delineating cortical-subcortical tracts that are difficult to reconstruct. Diffusion MRI-based tractography is highly prone to false positives; thus, constraining tractography outputs by known anatomical priors is important. Key additional strengths include 1) the creation of a protocol that can be applied to both macaque and human data; 2) demonstration that the protocol can be applied to high quality data (3 shells, > 250 directions, 1.25 mm isotropic, 55 minutes) and lower quality data (2 shells, 100 directions, 2 mm isotropic, 6.5 minutes); and 3) validation that the anatomy of cortical-subcortical tracts derived from the new method are more similar in monozygotic twins than in siblings and unrelated individuals.

      We thank the Reviewer for the globally positive evaluation of this work and the pertinent comments that have helped us to improve the paper.

      Weaknesses

      (6) Although this work validates the general organizational location and topographic organization of tractography-derived cortical-subcortical tracts against prior tract tracing studies (a clear strength), the validation is purely visual and thus only qualitative. Furthermore, it is difficult to assess how the current XTRACT method may compare to currently available tractography approaches to delineating similar cortical-subcortical connections. Finally, it appears that the cortical-subcortical tractography protocols developed here can only be used via FSL-XTRACT (yet not with other dMRI software), somewhat limiting the overall accessibility of the method.

      We agree that a more quantitative comparison against gold standard tracing data would be ideal. However, there are practical challenges that prohibit such a comparison at this stage: i) Access to data. There are no quantifiable, openly shared, large scale/whole brain tracing data available. The Markov study provided the only openly available weighted connectivity matrices measured by tracers in macaques (Markov, Cereb Cortex 2014), which are only cortico-cortical and do not provide the white matter routes, they only quantify the relative contrast in connection terminals. ii) 2D microscopy vs 3D tractography. The vast majority of tracing data one can find in neuroanatomy labs is on 2D microscopy slices with restricted field of view, which is also the case for the data we had access to for this study. This complicates significantly like-to-like comparisons against 3D whole-brain tractography reconstructions. iii) Quantifiability is even tricky in the case of gold standard axonal tracing, as it depends on nuisance factors, e.g. injection site, injection size, injection uniformity and coverage, which confound the gold-standard measurements, but are not relevant for tractography. For these reasons, a number of high-profile NIH BRAIN CONNECTS Centres (for instance https://connects.mgh.harvard.edu/, https://mesoscaleconnectivity.org/) are resourced to address these challenges at scale in the coming years and provide the tools to the community to perform such quantitative comparisons in the future.

      In terms of comparison with other approaches, we have performed new tests and detail a response to a similar comment (2) from Reviewer 1.

      Finally, our protocols have been FSL-tested, but have nothing that is FSL specific. We cannot speak of performance when used with other tools, but there is nothing that prohibits translation of these standard space protocols to other tools. In fact, the whole idea behind XTRACT was to generate an approach open to external contributions for bundle-specific delineation protocols, both for humans and for non-human species. A number of XTRACT extensions have been published over the last 5 years for other NHP species (Roumazeilles et al. (2020); Bryant et al. (2020); Wang et al. (2025)) and similar approaches have been used in commercial packages (Boshkovski et al, 2106, ISMRM 2022).

      Recommendations To the Authors:

      (7) Superiority of the FSL-XTRACT approach to delineating cortical-subcortical tracts. The Introduction of the article describes how "Tractography protocols for white matter bundles that reach deeper subcortical regions, for instance the striatum or the amygdala, are more difficult to standardize" due to the size, proximity, complexity, and bottlenecks associated with cortical-subcortical tracts. It would be helpful for the authors to better describe how the analytic approach adopted here overcomes these various challenges. What does the present approach do differently than prior efforts to examine cortical-subcortical connectivity?

      There have not been many prior efforts to standardise cortico-subcortical connectivity reconstructions, as we overview in the Introduction. As outlined in (Schilling et al. (2020), https://doi.org/10.1007/s00429-020-02129-z), tractography reconstructions can be highly accurate if we guide them using constraints that dictate where pathways are supposed to go and where they should not go. This is the philosophy behind XTRACT and all the proposed protocols, which provide neuroanatomical constraints across different bundles. At the same time these constraints are relatively coarse so that they are species-generalisable. We have clarified that in Discussion. The approach we took was to first identify anatomical constraints from neuroanatomy literature for each tract of interest independently, derive and test these protocols in the macaque, and then optimise in an iterative fashion until the protocols generalise well to humans and until, when considering groups of bundles, the generated reconstructions can follow topographical principles known from tract tracing literature. This process took years in order to perform these iterations as meticulously as we could. We have modified the first sections in Methods to reflect this better (3rd paragraph of 1st Methods section), as well as modified the third and second to last paragraphs of the Introduction (“We propose an approach that addresses these challenges…”).

      (8) Relatedly, it is difficult to fully evaluate the utility of the current approach to dissecting cortical-subcortical tracts without a qualitative or quantitative comparison to approaches that already exist in the field. Can the authors show that (or clarify how) the FSL-XTRACT approach is similar to - or superior to - currently available methods for defining cortical-striatal and amygdalofugal tracts (e.g., methods they cite in the Introduction)?”

      From the limited similar approaches that exist, we did perform some comparisons against TractSeg, please see Reply to Comment 2 from Reviewer 1. We have also expanded the relevant text in the introduction to clarify the differences:

      “…However, these either utilise labour-intensive single-subject protocols (22,26), are not designed to be generalisable across species (42, 43), or are based mostly on geometrically-driven parcellations that do not necessarily preserve topographical principles of connections (40). We propose an approach that addresses these challenges and is automated, standardised, generalisable across two species and includes a larger set of cortico-subcortical bundles than considered before, yielding tractography reconstructions that are driven by neuroanatomical constraints.”

      (9) Future applications of the tractography protocol:

      It would be helpful for the authors to describe the contexts in which the automated tractography approach developed here can (and cannot) be applied in future studies. Are future applications limited to diffusion data that has been processed with FSL's BEDPOSTX and PROBTRACKX? Can FSL-XTRACT take in diffusion data modelled in other software (e.g., with CSD in mrtrix or with GQI in DSI Studio)? Can the seed/stop/target/exclusion ROIs be applied to whole-brain tractography generated in other software? Integration with other software suites would increase the accessibility of the new tract dissection protocols.

      We have added some text in the Discussion to clarify this point. Our protocols have been FSL-tested, but have nothing that is FSL specific. We cannot speak of performance of other tools, but there is nothing that prohibits translation of these standard space protocols to other tools. As described before, the protocols are recipes with anatomical constraints including regions the corresponding white matter pathways connect to and regions they do not, constructed with cross-species generalisability in mind. In fact a number of other packages (even commercial) have adopted the XTRACT protocols with success in the past, so we do not see anything in principle that prohibits these new protocols from being similarly adopted.

      We cannot comment on the protocols’ relevance for segmenting whole-brain tractograms, as these can induce more false positives than tractography reconstructions from smaller seed regions and may require stricter exclusions.

      (10) It was great to see confirmation that the XTRACT approach can be successfully applied in both high-quality diffusion data from the HCP and in the ON-Harmony data. Given the somewhat degraded performance in the lower quality dataset (e.g., Figure 4A), can the authors speak to the minimum data requirements needed to dissect these new cortical-subcortical tracts? Will the approach work on single-shell, low b data? Is there a minimum voxel resolution needed? Which tracts are expected to perform best and worst in lower-quality data?

      Thank you for these comments. Although we have not really tried lower (spatial and angular) resolution data, given the proximity of the tracts considered, as well as the small size of some bundles, we would not recommend resolutions lower than that of the UK Biobank protocol. In general, we would consider the UK Biobank protocol (2mm, 2 shells) as the minimum and any modern clinical scanner can achieve this in 6-8 minutes. We hence evaluated performance from high quality HCP to lower quality UK Biobank data, covering a considerable range (scan time from 55 minutes down to 6 minutes).

      In terms of which tract reconstructions were more reproducible for UKBiobank data, the tracts with lowest correlations across subjects (Figure 4) were the anterior commissure (AC) and the temporal part of the Extreme Capsule (EmC<sub>t</sub>), while the highest correlations were for the Muratoff Bundle (MB) and the temporal part of the Striatal Bundle (StB<sub>t</sub>). Interestingly, for the HCP data, the temporal part of the Extreme Capsule (EmC<sub>t</sub>) and the Muratoff Bundle were also the tracts with the lowest/highest correlations, respectively. Hence, certain tract reconstructions were consistently more variable than others across subjects, which may hint to also being more challenging to reconstruct. We have now clarified these aspects in the corresponding Results section. 

      (11) Anatomical validation of the new cortical-subcortical tracts

      I really appreciated the use of prior tract tracing findings to anatomically validate the cortical-subcortical tractography outputs for both the cortical-striatal and amygdalofugal tracts. It struck me, however, that the anatomical validation was purely qualitative, focused on the relative positioning or the topographical organization of major connections. The anatomical validation would be strengthened if profiles of connectivity between cortical regions and specific subcortical nuclei or subcortical subdivisions could be quantitatively compared, if at all possible. Can the differential connectivity shown visually for the putamen in Figure 3 be quantified for the tract tracing data and the tractography outputs? Does the amygdalofugal bundle show differential/preferential connectivity across amygdala nuclei in tract tracing data, and is this seen in tractography?

      We appreciate the comment, please see Reply to your comment 6 above. In addition to the challenges described there, we do not have access to terminal fields other than in the striatum and these ones are 2D, so we make a qualitative comparison of the relevant connectivity contrasts. We expect that a number of currently ongoing high-profile BRAIN CONNECTS Centres (such as the LINC and the CMC) will be addressing such challenges in the coming years and will provide the tools and data to the community to perform such quantitative comparisons at scale.

      (12) I believe that all visualizations of the macaque and human tractography showed group-averaged maps. What do these tracts look like at the individual level? Understanding individual-level performance and anatomical variation is important, given the Discussion paragraph on using this method to guide neuromodulation.

      We now demonstrate some representative examples of individual subject reconstructions in Supplementary Figures S6, S7, ranking subjects by the average agreement of individual tract reconstructions to the mean and depicting the 10th percentile, median and 90th percentile of these subjects. We have also shown more results in Author response images 1-2, generated by TractSeg, to indicate how a different bundle segmentation approach would handle individual variability compared to our approach.

      (13) Connectivity-based comparisons across species:

      Figures 5 and 6 of the manuscript show that, as compared to using only cortico-cortical XTRACT tracts, using the full set of XTRACT tracts (with new cortical-subcortical tracts) allows for more specific mapping of homologous subcortical and cortical regions across humans and macaques. Is it possible that this result is driven by the fact that the "connectivity blueprints" for the subcortex did not use an intermediary GM x WM matrix to identify connection patterns, whereas the connectivity blueprints for the cortex did? I was surprised that a whole brain GM x WM connectivity matrix was used in the cortical connectivity mapping procedure, given known problems with false positives etc., when doing whole brain tractography - especially after such anatomical detail was considered when deriving the original tracts. Perhaps the intermediary step lowers connectivity specificity and accuracy overall (as per Figure 9), accounting for the poorer performance for cortico-cortical tracts?

      The point is well-taken, however it cannot drive the results in Figures 5 and 6. Before explaining this further, let us clarify the rationale of using the GMxWM connectivity matrix, which we have published quite extensively in the past for cortico-cortical connections (Mars, eLife 2018; Warrington, Neuroimage 2020; Roumazeilles, PLoS Biology 2020; Warrington, Science Advances 2022; Bryant, J Neuroscience 2025).

      Having established the bodies of the tract using the XTRACT protocols, we use this intermediate step of multiplying with a GM x WM connectivity matrix to estimate the grey matter projections of the tracts. The most obvious approach of tracking towards the grey matter (i.e. simply find where tracts intersect GM) has the problem that one moves through bottlenecks in the cortical gyrus, after which fibres fan out. Most tractography algorithms have problems resolving this fanning. However, we take the opposite approach of tracking from the grey matter surface towards the white matter (GMxWM connectivity matrix), thus following the direction in which the fibres are expected to merge, rather than to fan out. We then multiply the GMxWM tractogram with that of the body of the tract to identify the grey matter endpoints of the tract. This avoids some of the major problems associated with tracking towards the surface. In fact, using this approach improves connectivity specificity towards the cortex, rather than the opposite. We provide some indicative results here for a few tracts:

      Author response image 5.

      Connectivity profiles for example cortico-cortical tracts with and without using the intermediary GMxWM matrix. Tracts considered are the Superior Longitudinal Fasciculus 1 (SLF<sub>1</sub>), Superior Longitudinal Fasciculus 2 (SLF<sub>2</sub>), the Frontal Aslant (FA) and the Inferior Fronto-Occipital Fasciculus (IFO). We see that the surface connectivity patterns without using the GMxWM intermediary matrix are more diffuse (effect of “fanning out” gyral bias), with reduced specificity, compared to when using the GMxWM matrix.

      Tracking to/from subcortical nuclei does not have the same tractography challenges as tracking towards the cortex and in fact we found that using the intermediary GMxWM matrix is less favourable for subcortex (Figure 9), which is why we opted for not using it. 

      Regardless of how cortical and subcortical connectivity patterns are obtained, the results in Figures 5 and 6 utilise only cortical connectivity patterns. Hence, no matter what tracts are considered (cortico-cortical or cortico-subcortical) to build the connectivity patterns, these results have been obtained by always using the intermediate step of multiplying with the GMxWM connectivity matrix (i.e. it is not the case that cortical features are obtained with the intermediate step and subcortical features without, all of them have the intermediate step applied, as the connectivity patterns comprise of cortical endpoints). Figure 9 is only applicable for subcortical endpoints that play no role in the comparisons shown in Figures 5 and 6. We hope this clarifies this point.
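
      The intermediate multiplication step discussed in this reply can be illustrated with a toy sketch. It assumes the GMxWM connectivity is stored as one row per grey matter location and the tract bodies as one row per white matter voxel; the function name `gm_projections` and all values are hypothetical, and any normalisation applied in the actual pipeline is omitted.

```python
def gm_projections(gm_by_wm, wm_by_tract):
    """Multiply a GM x WM matrix by WM x tract maps to get GM x tract values.

    gm_by_wm: one row per grey matter location, one column per WM voxel
              (tracking from the grey matter towards the white matter).
    wm_by_tract: one row per WM voxel, one column per tract body.
    """
    n_wm = len(wm_by_tract)
    n_tracts = len(wm_by_tract[0])
    result = []
    for gm_row in gm_by_wm:
        if len(gm_row) != n_wm:
            raise ValueError("GM x WM columns must match WM x tract rows")
        result.append([
            sum(gm_row[v] * wm_by_tract[v][t] for v in range(n_wm))
            for t in range(n_tracts)
        ])
    return result


# Toy example: 2 grey matter locations, 3 WM voxels, 2 tracts.
gm_by_wm = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 1.0]]
wm_by_tract = [[0.5, 0.0],
               [0.0, 0.25],
               [0.0, 0.5]]
projections = gm_projections(gm_by_wm, wm_by_tract)
```

      In this toy case each grey matter location ends up weighted towards the tract whose body overlaps the white matter voxels it connects to.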

      (14) Methodological clarifications:

      The Methods describe how anatomical masks used in tractography were delineated in standard macaque space and then translated to humans using "correspondingly defined landmarks". Can the authors elaborate as to how this translation from macaques to humans was accomplished?

      For a given tract, our process for building a protocol involved looking into the wider anatomical literature, including the standard white matter atlas of Schmahmann and Pandya (2006) and numerous anatomy papers that are referenced in the protocol description, to determine the expected path the tract was meant to take in white matter and which cortical and subcortical regions are connected. This helped us define constraints and subsequently the corresponding masks. The masks were created through the combination of hand-drawn ROIs and standard space atlases. We firstly started with the macaque where tracer literature is more abundant, but, importantly, our protocol definitions have been designed such that the same protocol can be applied to the human and macaque brain. All choices were made with this aspect in mind, hence corresponding landmarks between the two brains were considered in the mask definition (for instance “the putamen”, “a sub-commissural white matter mask”, the “whole frontal pole” etc, as described in the protocol descriptions).

      The protocols have not been created by a single expert but have been collated from multiple experts (co-authors SA, SW, DF, KB, SH, SS drove this aspect) and the final definitions have been agreed upon by the authors. 

      (15) The article heavily utilizes spatial path distribution maps/normalized path distributions, yet does not describe precisely what these are and how they were generated. Can the authors provide more detail, along with the rationale for using these with Pearson's correlations to compare tracts across subjects (as opposed to, e.g., overlap sensitivity/specificity or the Jaccard coefficient)?

      We have now clarified in text how these plots are generated, particularly when compared using correlation values. We tried Jaccard indices on binarized masks of the tracts and these gave similar trends to the correlations reported in Figure 4 (i.e. higher similarities within than across cohorts). We however feel that correlations are better than Jaccard indices, as the latter assume binary masks, so they focus on spatial overlap ignoring the actual values of the path distributions. We hence kept correlations in the paper.
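
      The difference between the two measures can be seen in a small sketch. The binarisation threshold, the function names and the toy maps below are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def jaccard(x, y, threshold=0.0):
    """Jaccard index of two maps after binarising at `threshold` (assumed)."""
    a = [v > threshold for v in x]
    b = [v > threshold for v in y]
    union = sum(p or q for p, q in zip(a, b))
    intersection = sum(p and q for p, q in zip(a, b))
    return intersection / union if union else 0.0


# Two toy path distributions with identical support but swapped peaks.
map_a = [0.9, 0.1, 0.0, 0.0]
map_b = [0.1, 0.9, 0.0, 0.0]

overlap = jaccard(map_a, map_b)      # sees only the shared non-zero voxels
correlation = pearson(map_a, map_b)  # also sees where the values peak
```

      Here the binarised masks overlap completely, while the correlation is negative because the two maps peak in different voxels.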

      Reviewing Editor Comments

      “The reviewers had broadly convergent comments and were enthusiastic about the work. As further detailed by Reviewer 3 (see below), if the authors choose to pursue revisions, there are several elements that have the potential to enhance impact.”

      Thank you, we have replied accordingly and aimed to address most of the comments of the Reviewers.   

      “Comparison to existing methods. How does this approach compare to other approaches cited by the authors?”

      Please see replies to Comment 2 of Reviewer 1 and Comment 7 of Reviewer 2. Briefly, we have now generated new results and clarified aspects in the text. 

      “Minimum data requirements. How broadly can this approach be used across scan variation? How does this impact data from individual participants? Displaying individual participants may help, in addition to group maps.”

      Please see replies to Comment 10 of Reviewer 2 on minimum data requirements and individual participants, as well as to Comment 3 of Reviewer 1 on the actual groups considered. Briefly, we have generated new figures and regenerated results using UKBiobank data.

      “Software. What are the software requirements? Is the approach interoperable with other methods?”

      Please see Reply to Comment 9 of Reviewer 2. Our protocols can be used to guide tractography using other types of data as they comprise of guiding ROIs for a given tract. So, although we have not tested them beyond FSL-XTRACT, we believe they can be useful with other tractography packages as well, as there is nothing FSL-specific in these anatomically-informed recipes. 

      “Comparisons with tract tracing. To the degree possible, quantitative comparisons with tract tracing data would bolster confidence in the method.”

      Please see Replies to Comments 6 and 11 of Reviewer 2. Briefly, we appreciate the comment and it is something we would love to do, but there are no data readily available that would allow such quantitative comparison in a meaningful way. This is a known challenge in the tractography field, which is why NIH has invested in two 5 year Centres to address it. Our approach will provide a solid starting point for optimising and comparing further cortico-subcortical tractography reconstructions against microscopy and tracers in the same animal and at scale.

    1. eLife Assessment

      This valuable study presents an analysis of evolutionary conservation in intrinsically disordered regions, identified as key drivers of phase separation, leveraging a protein language model. The strength of evidence is convincing, but a clearer justification of the methods and analyses is needed to fully support the main claims.

    2. Reviewer #1 (Public review):

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. While the paper is relatively easy to read overall, my main comment is that the authors could perhaps make it clearer which observations are new, and which support previous work using related approaches. Further, while the link to phase separation is interesting, it is not completely clear which data supports the statements made, and this could also be made clearer.

      Major comments:

      (1) With respect to putting the work in a better context of what has previously been done before, this is not to say that there is not new information in it, but what the authors do is somewhat closely related to work by others. I think it would be useful to make those links more directly. Some examples:

      (1a) Alderson et al (reference 71) analysed in detail the conservation of IDRs (via pLDDT, which is itself related to conservation) to show, for example, that conserved residues fold upon binding. This analysis is very similar to the analysis used in the current study (using ESM2 as a different measure of conservation). Thus, the approach (pages 7-8) described as "This distinction allows us to classify disordered regions into two types: "flexible disordered" regions, which show high ESM2 scores and greater mutational tolerance, and "conserved disordered" regions, which display low ESM2 scores, indicating varying levels of mutational constraint despite a lack of stable folding." is fundamentally very similar to that used by Alderson et al. Thus, the result that "Given that low ESM2 scores generally reflect mutational constraint in folded proteins, the presence of region a among disordered residues suggests that certain disordered amino acids are evolutionarily conserved and likely functionally significant" is in some ways very similar to the results of that paper.

      (1b) Dasmeh et al (https://doi.org/10.1093/genetics/iyab184), Lu et al (https://doi.org/10.1371/journal.pcbi.1010238) and Ho & Huang (https://doi.org/10.1002/pro.4317) analysed conservation in IDRs, including aromatic residues and their role in phase separation

      (1c) A number of groups have performed proteomewide saturation scans using pLMs, including variants of the ESM family, including Meier (reference 89, but cited about something else) and Cagiada et al (https://doi.org/10.1101/2024.05.21.595203) that analysed variant effects in IDRs using a pLM. Thus, I think statements such as "their applicability to studying the fitness and evolutionary pressures on IDRs has yet to be established" should possibly be qualified.

      (2) On page 4, the authors write, "The conserved residues are primarily located in regions associated with phase separation." These results are presented as a central part of the work, but it is not completely clear what the evidence is.

      (3) It would be useful to have an assessment of what controls the authors used to check whether there are folded domains within their set of IDRs.

    3. Reviewer #2 (Public review):

      This manuscript uses the ESM2 language model to map the evolutionary fitness landscape of intrinsically disordered regions (IDRs). The central idea is that mutational preferences predicted by these models could be useful in understanding eventual IDR-related behavior, such as disruption of otherwise stable phases. While ESM2-type models have been applied to analyze such mutational effects in folded proteins, they have not been used or verified for studying IDRs. Here, the authors use ESM2 to study membraneless organelle formation and the related fitness landscape of IDRs.

      Through this, their key finding in this work is the identification of a subset of amino acids that exhibit mutation resistance. Their findings reveal a strong correlation between ESM2 scores and conservation scores, which if true, could be useful for understanding IDRs in general. Through their ESM2-based calculations, the authors conclude that IDRs crucial for phase separation frequently contain conserved sequence motifs composed of both so-called sticker and spacer residues. The authors note that many such motifs have been experimentally validated as essential for phase separation.

      Unfortunately, I do not believe that the results can be trusted. ESM2 has not been validated for IDRs through experiments. The authors themselves point out its little use in that context. In this study, they do not provide any further rationale for why this situation might have changed. Furthermore, they mention that experimental perturbations of the predicted motifs in in vivo studies may further elucidate their functional importance, but none of that is done here. That some of the motifs have been previously validated does not give any credibility to the use of ESM2 here, given that such systems were probably seen during the training of the model.

      I believe that the authors should revamp their whole study and come up with a rigorous, scientific protocol where they make predictions and test them using ESM2 (or any other scientific framework).

    4. Reviewer #3 (Public review):

      Summary:

      This is a very nice and interesting paper to read about motif conservation in protein sequences and mainly in IDRs regions using the ESM2 language model. The topic of the paper is timely, with strong biological significance. The paper can be of great interest to the scientific community in the field of protein phase transitions and future applications using the ESM models. The ability of ESM2 to identify conserved motifs is crucial for disease prediction, as these regions may serve as potential drug targets. Therefore, I find these findings highly significant, and the authors strongly support them throughout the paper. The work motivates the scientific community towards further motif exploration related to diseases.

      Strengths:

      (1) Revealing conserved regions in IDRs by the ESM-2 language model.

      (2) Identification of functionally significant residues within protein sequences, especially in IDRs.

      (3) Findings supported by useful analyses.

      Weaknesses:

      (1) Lack of examples demonstrating the potential biological functions of these conserved regions

      (2) Very limited discussion of potential future work and of limitations.

    1. eLife Assessment

      This important study explored a number of issues related to citations in the peer review process. An analysis of more than 37000 peer reviews at four journals found that: i) during the first round of review, reviewers were less likely to recommend acceptance if the article under review cited the reviewer's own articles; ii) during the second and subsequent rounds of review, reviewers were more likely to recommend acceptance if the article cited the reviewer's own articles; iii) during all rounds of review, reviewers who asked authors to cite the reviewer's own articles (a practice known as 'coercive citation') were less likely to recommend acceptance. However, when an author agreed to cite work by the reviewer, the reviewer was more likely to recommend acceptance of the revised article. The evidence is convincing, but the article would benefit from a clearer presentation of the results and a more nuanced discussion of the motivations of reviewers.

    2. Reviewer #1 (Public Review):

      Summary:

      The work used open peer reviews and followed them through a succession of reviews and author revisions. It assessed whether a reviewer had requested that the author include additional citations and references to the reviewer's work. It then assessed whether the author had followed these suggestions and what the probability of acceptance was based on the author's decision.

      Strengths and weaknesses:

      The work's strengths are the in-depth and thorough statistical analysis it contains and the very large dataset it uses. The methods are robust and reported in detail. However, this is also a weakness of the work. Such thorough analysis makes it very hard to read! It's a very interesting paper with some excellent and thought provoking references but it needs to be careful not to overstate the results and improve the readability so it can be disseminated widely. It should also discuss more alternative explanations for the findings and, where possible, dismiss them.

    3. Reviewer #2 (Public Review):

      Summary:

      This article examines reviewer coercion in the form of requesting citations to the reviewer's own work as a possible trade for acceptance and shows that, under certain conditions, this happens.

      Strengths:

      The methods are well done and the results support the conclusions that some reviewers "request" self-citations and may be making acceptance decisions based on whether an author fulfills that request.

      Weaknesses:

      The author needs to be clearer on the fact that, in some instances, requests for self-citations by reviewers are important and valuable.

    4. Reviewer #3 (Public Review):

      Summary:

      In this article, Barnett examines a pressing question regarding citing behavior of authors during the peer review process. In particular, the author studies the interaction between reviewers and authors, focusing on the odds of acceptance, and how this may be affected by whether or not the authors cited the reviewers' prior work, whether the reviewer requested such citations be added, and whether the authors complied/how that affected the reviewer decision-making.

      Strengths:

      The author uses a clever analytical design, examining four journals that use the same open peer review system, in which the identities of the authors and reviewers are both available and linkable to structured data. Categorical information about the approval is also available as structured data. This design allows a large scale investigation of this question.

      Weaknesses:

      My concerns pertain to the interpretability of the data as presented and the overly terse writing style.

      Regarding interpretability, it is often unclear what subset of the data are being used both in the prose and figures. For example, the descriptive statistics show many more Version 1 articles than Version 2+. How are the data subset among the different possible methods?

      Likewise, the methods indicate that a matching procedure was used comparing two reviewers for the same manuscript in order to control for potential confounds. However, the number of reviews is less than double the number of Version 1 articles, making it unclear which data were used in the final analysis. The methods also state that data were stratified by version. This raises a question about which articles/reviews were included in each of the analyses. I suggest spending more space describing how the data are subset and stratified. This should include any conditional subsetting as in the analysis on the 441 reviews where the reviewer was not cited in Version 1 but requested a citation for Version 2. Each of the figures and tables, as well as statistics provided in the text should provide this information, which would make this paper much more accessible to the reader. [Note from editor: Please see "Editorial feedback" for more on this]

      Finally, I would caution against imputing motivations to the reviewers, despite the important findings provided here. This is because the data as presented suggest a more nuanced interpretation is warranted. First, the author observes similar patterns of accept/reject decisions whether the suggested citation is a citation to the reviewer or not (Figs 3 and 4). Second, much of the observed reviewer behavior disappears or has much lower effect sizes depending on whether "Accept with Reservations" is considered an Accept or a Reject. This is acknowledged in the results text, but largely left out of the discussion. The conditional analysis on the 441 reviews mentioned above does support a more cautious version of the conclusion drawn here, especially when considered alongside the specific comments left by reviewers that were mentioned in the results and information in Table S.3. However, I recommend toning the language down to match the strength of the data.

    5. Reviewer #4 (Public Review):

      Summary:

      This work investigates whether a citation to a referee made by a paper is associated with a more positive evaluation by that referee for that paper. It provides evidence supporting this hypothesis. The work also investigates the role of self citations by referees where the referee would ask authors to cite the referee's paper.

      Strengths:

      This is an important problem: referees for scientific papers must provide their impartial opinions rooted in core scientific principles. Any undue influence due to the role of citations breaks this requirement. This work studies the possible presence and extent of this.

      Barring a few issues discussed below, the methods are solid and well done. The work uses a matched pair design which controls for article-level confounding and further investigates robustness to other potential confounds.

      It is surprising that even in these investigated journals where referee names are public, there is prevalence of such citation-related behaviors.

      Weaknesses:

      Some overall claims are questionable:

      "Reviewers who were cited were more likely to approve the article, but only after version 1" It also appears that referees who were cited were less likely to approve the article in version 1. This null or slightly negative effect undermines the broad claim of citations swaying referees. The paper highlights only the positive results while not including the absence (and even reversal) of the effect in version 1 in its narrative.

      "To the best of our knowledge, this is the first analysis to use a matched design when examining reviewer citations" Does not appear to be a valid claim based on the literature reference [18]

      It will be useful to have a control group in the analysis associated with Figure 5, where the control group comprises matched reviews that did not ask for a self citation. This will help demarcate words associated with approval under self citation (as compared to when there is no self citation). The current narrative appears to suggest an association of the use of these words with self citations but without any control.

      More discussion on the recommendations will help: For the suggestion that "the reviewers initially see a version of the article with all references blinded and no reference list" the paper says "this involves more administrative work and demands more from peer reviewers". I am afraid this can also degrade the quality of peer review, given that the research cannot be contextualized properly by referees. Referees may not revert back to all their thoughts and evaluations when references are released afterwards.

    6. Author response:

      There was a common theme across the reviews to provide a more cautious interpretation and to consider the key question of whether peer reviewers who include citations are being purely self-serving or are highlighting important missing context. I will include a suggested new text analysis to cover this and will expand the discussion on this key question. Reviewers highlighted some confusion around the sample sizes for the different analyses, and I will clarify all sample sizes in the next version.

    1. eLife Assessment

      This is a useful study, describing transcriptome-based PPGL subtypes and exploring the mutations, immune correlates, and disease progression of cases in each subtype. The cohort is a reasonable size, and a second cohort is included from TCGA. The identification of driver mutations in PPGL is incomplete, and this compromises characterisation for prognostic purposes. This is a reasonable starting point from which to further elucidate PPGL subtypes.

    2. Reviewer #1 (Public review):

      This study presents an exploration of PPGL tumour bulk transcriptomics and identifies three clusters of samples (labeled as subtypes C1-C3). Each subtype is then investigated for the presence of somatic mutations, metabolism-associated pathways and inflammation correlates, and disease progression.

      The proposed subtype descriptions are presented as an exploratory study. The proposed potential biomarkers from this subtype are suitably caveated, and will require further validation in PPGL cohorts together with a mechanistic study.

      The first section uses WGCNA (a method to identify clusters of samples based on gene expression correlations) to discover three transcriptome-based clusters of PPGL tumours.

      The second section inspects a previously published snRNAseq dataset, and labels some of the published cells as subtypes C1, C2, C3 (Methods could be clarified here), among other cells labelled as immune cell types. Further details about how the previously reported single-nuclei were assigned to the newly described subtypes C1-C3 require clarification.

      The tumour samples are obtained from multiple locations in the body (Figure 1A). It will be important to see further investigation of how the sample origin is distributed among the C1-C3 clusters, and whether there is a sample-origin association with mutational drivers and disease progression.

    3. Reviewer #2 (Public review):

      Summary:

      A study that furthers the molecular definition of PPGL (where prognosis is variable) and provides a wide range of sub-experiments to back up the findings. One of the key premises of the study is that identification of driver mutations in PPGL is incomplete and that compromises characterisation for prognostic purposes. This is a reasonable starting point on which to base some characterisation based on different methods.

      Strengths:

      The cohort is a reasonable size, and a useful validation cohort in the form of TCGA is used. Whilst it would be resource-intensive (though plausible given the rarity of the tumour type) to perform RNAseq on all PPGL samples in clinical practice, some potential proxies are proposed.

      Weaknesses:

      The performance of some of the proxy markers for transcriptional subtype is not presented.

      There is limited prognostic information available.

    4. Author response:

      Reviewer #1 (Public Review):

      This study presents an exploration of PPGL tumour bulk transcriptomics and identifies three clusters of samples (labeled as subtypes C1-C3). Each subtype is then investigated for the presence of somatic mutations, metabolism-associated pathways and inflammation correlates, and disease progression. The proposed subtype descriptions are presented as an exploratory study. The proposed potential biomarkers from this subtype are suitably caveated and will require further validation in PPGL cohorts together with a mechanistic study.

      The first section uses WGCNA (a method to identify clusters of samples based on gene expression correlations) to discover three transcriptome-based clusters of PPGL tumours. The second section inspects a previously published snRNAseq dataset, and labels some of the published cells as subtypes C1, C2, C3 (Methods could be clarified here), among other cells labelled as immune cell types. Further details about how the previously reported single-nuclei were assigned to the newly described subtypes C1-C3 require clarification.

      Thank you for your valuable suggestion. In response to the reviewer’s request for further clarification on “how previously published single-nuclei data were assigned to the newly defined C1-C3 subtypes,” we have provided additional methodological details in the revised manuscript (lines 103-109). Specifically, we aggregated the single-nucleus RNA-seq data to the sample level by summing gene counts across nuclei to generate pseudo-bulk expression profiles. These profiles were then normalized for library size, log-transformed (log1p), and z-scaled across samples. Using gene-set scores derived from our earlier WGCNA analysis of PPGLs, we defined transcriptional subtypes within the Magnus cohort (Supplementary Figure 1C). We further analyzed the single-nucleus data by classifying malignant (chromaffin) nuclei as C1, C2, or C3 based on their subtype scores, while non-malignant nuclei (including immune, stromal, endothelial, and others) were annotated using canonical cell-type markers (Figure 4A).
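      The aggregation steps described above (sum counts per sample, normalize for library size, log1p, z-scale across samples) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' pipeline: the function names, the counts-per-million scale factor, and the use of a mean z-score as the gene-set score are all assumptions, and the toy matrix is invented for the example.

```python
import numpy as np

def pseudobulk_zscores(counts, sample_ids, scale=1e6):
    """Aggregate a nuclei-by-genes count matrix into per-sample z-scored profiles.

    `scale` (counts per million) is an assumed normalization target.
    """
    counts = np.asarray(counts, dtype=float)
    sample_ids = np.asarray(sample_ids)
    samples = np.unique(sample_ids)
    # Sum gene counts across all nuclei belonging to each sample (pseudo-bulk).
    bulk = np.vstack([counts[sample_ids == s].sum(axis=0) for s in samples])
    # Normalize each sample by its library size, then log-transform with log1p.
    lib_size = bulk.sum(axis=1, keepdims=True)
    logged = np.log1p(bulk / lib_size * scale)
    # z-scale each gene across samples; constant genes keep a z-score of 0.
    std = logged.std(axis=0)
    std[std == 0] = 1.0
    z = (logged - logged.mean(axis=0)) / std
    return samples, z

def gene_set_scores(z, gene_sets):
    """Hypothetical scoring rule: mean z-score over each gene set's columns."""
    return {name: z[:, cols].mean(axis=1) for name, cols in gene_sets.items()}

# Toy data: 4 nuclei from 2 samples, 3 genes.
counts = np.array([[10, 0, 5], [20, 1, 5], [0, 30, 5], [1, 40, 5]])
sample_ids = ["s1", "s1", "s2", "s2"]
samples, z = pseudobulk_zscores(counts, sample_ids)
scores = gene_set_scores(z, {"C1": [0], "C2": [1]})
```

      In a sketch like this, a sample would be assigned to whichever subtype has the highest score; the actual assignment rule used in the manuscript may differ.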

      The tumour samples are obtained from multiple locations in the body (Figure 1A). It will be important to see further investigation of how the sample origin is distributed among the C1-C3 clusters, and whether there is a sample-origin association with mutational drivers and disease progression.

      Thank you for your valuable suggestion. In the revised manuscript (lines 74-79; Figure 1A, Table S1 and Supplementary Figure 1A), we harmonized anatomic site annotations from our PPGL cohort and the TCGA cohort and analyzed the distribution of tumor origin (adrenal vs extra-adrenal) across subtypes. The site composition is essentially uniform across C1-C3 (approximately 75% pheochromocytoma (PC) and 25% paraganglioma (PG)), with only minimal variation. Notably, the proportion of extra-adrenal (paraganglioma) origin is slightly higher in the C1 subtype (see Supplementary Figure 1A), consistent with the typically more aggressive behavior of tumors from this anatomical site.

      Reviewer #2 (Public Review):

      A study that furthers the molecular definition of PPGL (where prognosis is variable) and provides a wide range of sub-experiments to back up the findings. One of the key premises of the study is that identification of driver mutations in PPGL is incomplete and that compromises characterisation for prognostic purposes. This is a reasonable starting point on which to base some characterisation based on different methods. The cohort is a reasonable size, and a useful validation cohort in the form of TCGA is used. Whilst it would be resource-intensive (though plausible given the rarity of the tumour type) to perform RNA-seq on all PPGL samples in clinical practice, some potential proxies are proposed.

      We sincerely thank the reviewer for their positive assessment of our study’s rationale. We fully agree that RNA sequencing for all PPGL samples remains resource-intensive in current clinical practice, and its widespread application still faces feasibility challenges. It is precisely for this reason that, after defining transcriptional subtypes, we further focused on identifying and validating practical molecular markers and exploring their detectability at the protein level.

      In this study, we validated key markers such as ANGPT2, PCSK1N, and GPX3 using immunohistochemistry (IHC), demonstrating their ability to effectively distinguish among molecular subtypes (see Figure 5). This provides a potential tool for the clinical translation of transcriptional subtyping, similar to the transcription factor-based subtyping in small cell lung cancer, where IHC enables low-cost and rapid molecular classification.

      It should be noted that the subtyping performance of these markers has so far been preliminarily validated only in our internal cohort of 87 PPGL samples. We agree with the reviewer that larger-scale, multi-center prospective studies are needed in the future to further establish the reliability and prognostic value of these markers in clinical practice.

      The performance of some of the proxy markers for transcriptional subtype is not presented.

      We agree with your comment regarding the need to further evaluate the performance of proxy markers for transcriptional subtyping. In our study, we have in fact taken this point into full consideration. To translate the transcriptional subtypes into a clinically applicable classification tool, we employed a linear regression model to compare the effect values (β values) of candidate marker genes across subtypes (Supplementary Figure 1D-F). Genes with the most significant β values and statistical differences were selected as representative markers for each subtype.

      Ultimately, we identified ANGPT2, PCSK1N, and GPX3, which are significantly overexpressed in subtypes C1, C2, and C3, respectively, and exhibit the most pronounced β values, as robust marker genes for these subtypes (Figure 5A and Supplementary Figure 1D-F). These results support the utility of these markers in subtype classification and have been thoroughly validated in our analysis.
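      To illustrate what a per-subtype β value could look like, the sketch below fits an ordinary least-squares model of one gene's expression on a one-vs-rest subtype indicator. The indicator coding, the function name, and the toy expression values are assumptions made for illustration; the response does not specify the design matrix or covariates the authors used.

```python
import numpy as np

def one_vs_rest_beta(expression, labels, subtype):
    """OLS slope of expression on a 0/1 indicator for `subtype` (assumed coding).

    With a single binary predictor, this slope equals the difference between
    the mean expression inside the subtype and the mean outside it.
    """
    y = np.asarray(expression, dtype=float)
    indicator = (np.asarray(labels) == subtype).astype(float)
    design = np.column_stack([np.ones_like(indicator), indicator])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

# Toy expression of a single hypothetical marker gene across six samples.
expression = [5.0, 6.0, 1.0, 2.0, 1.0, 3.0]
labels = ["C1", "C1", "C2", "C2", "C3", "C3"]
betas = {s: one_vs_rest_beta(expression, labels, s) for s in ("C1", "C2", "C3")}
best_subtype = max(betas, key=betas.get)
```

      Under this coding, a gene would be a candidate marker for the subtype where its β is largest and statistically significant; significance testing is omitted here.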

      There is limited prognostic information available.

      Thank you for your valuable suggestion. In this exploratory revision, we present the available prognostic signal in Figure 5C. Given the current event numbers and follow-up time, we intentionally limited inference. We are continuing longitudinal follow-up of the PPGL cohort and will periodically update and report mature time-to-event analyses in subsequent work.

    1. eLife Assessment

      This valuable study presents an analysis of evolutionary conservation in intrinsically disordered regions, identified as key drivers of phase separation, leveraging a protein language model. The strength of evidence presented is convincing overall, though the theoretical grounding could benefit from further development.

    2. Reviewer #1 (Public review):

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. This is an interesting study that supports, complements and extends previous related analyses on the conservation and mutational tolerance of disordered regions, with a particular focus on disordered regions in proteins that are found in condensates.

    3. Reviewer #2 (Public review):

      This manuscript uses the ESM2 language model to map the evolutionary fitness landscape of intrinsically disordered regions (IDRs). The central idea is that mutational preferences predicted by these models could be useful in understanding eventual IDR-related behavior, such as disruption of otherwise stable phases. While ESM2-type models have been applied to analyze such mutational effects in folded proteins, they have not been used or verified for studying IDRs. Here, the authors use ESM2 to study membraneless organelle formation and the related fitness landscape of IDRs.

      Through this, their key finding in this work is the identification of a subset of amino acids that exhibit mutation resistance. Their findings reveal a strong correlation between ESM2 scores and conservation scores, which if true, could be useful for understanding IDRs in general. Through their ESM2-based calculations, the authors conclude that IDRs crucial for phase separation frequently contain conserved sequence motifs composed of both so-called sticker and spacer residues. The authors note that many such motifs have been experimentally validated as essential for phase separation.

      Comments on revisions:

      Unfortunately, my concerns about the lack of theoretical grounding and validation (the latter being especially critical in the absence of theoretical grounding) persist. The argument about correlation between ESM2 scores and MSA conservation is circular. Protein language models already encode residue‑level conservation, so agreement with conservation does not establish new predictive power. For IDRs, conservation is a poor surrogate for function because many functions are mediated by short, degenerate SLiMs that are frequently gained and lost. Sequence‑only predictions therefore need orthogonal (preferably experimental or at the least in silico) tests. Finally, without a family‑level holdout (e.g., cluster de‑duplication at low identity) and prospective tests, overlap with known motifs cannot rule out training‑data memorization/near‑duplicates.

    4. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. While the paper is relatively easy to read overall, my main comment is that the authors could perhaps make it clearer which observations are new, and which support previous work using related approaches. Further, while the link to phase separation is interesting, it is not completely clear which data supports the statements made, and this could also be made clearer.

      We thank the reviewer for their thoughtful evaluation of our manuscript and for the supportive comments. As outlined in the responses below, we have made substantial revisions to clarify the novel observations presented in our study and to strengthen the connection between sequence conservation and phase separation.

      Comment 1: With respect to putting the work in a better context of what has previously been done before, this is not to say that there is not new information in it, but what the authors do is somewhat closely related to work by others. I think it would be useful to make those links more directly.

      We have addressed the specific comments as outlined below.

      Comment 1a: Alderson et al (reference 71) analysed in detail the conservation of IDRs (via pLDDT, which is itself related to conservation) to show, for example, that conserved residues fold upon binding. This analysis is very similar to the analysis used in the current study (using ESM2 as a different measure of conservation). Thus, the result that "Given that low ESM2 scores generally reflect mutational constraint in folded proteins, the presence of region a among disordered residues suggests that certain disordered amino acids are evolutionarily conserved and likely functionally significant" is in some ways very similar to the results of that (Alderson et al) paper.

      We thank the reviewer for the comment. However, we would like to clarify that our findings show subtle but important differences from those reported by Alderson et al. Specifically, Alderson et al. used AlphaFold2 predictions to identify IDRs that undergo disorder-to-order transitions, which the authors termed conditionally folded IDRs. These regions could potentially be functionally important, assuming that the function of IDRs necessitates folding.

      We argue, however, that the validity of this structure-function relationship for IDRs remains to be tested. In our opinion, the most direct way to evaluate functional significance is to evaluate evolutionary conservation.

      As shown in Author response image 1, the correlation between pLDDT scores and the conservation score, while noticeable, is significantly weaker than that between the ESM2 score and the conservation score.

      Author response image 1.

      Comparison of the correlation between AlphaFold2 pLDDT scores and conservation scores with the correlation between ESM2 scores and conservation scores. Calculations were performed using proteins in the MLO-hProt dataset. (A) Correlation between the mean AlphaFold2 pLDDT scores and conservation scores for various amino acids. Pearson correlation coefficients (r) are indicated in the figure legends. The four panels on the right present analogous correlation plots for amino acids grouped by structural order, as defined by their pLDDT scores. (B) Same as in part A, but for ESM2 scores.

      Therefore, we believe that the ESM2 score is a better indicator of functional relevance than the AlphaFold2 pLDDT score.
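      The kind of comparison summarized in Author response image 1 (mean score per amino-acid type versus mean conservation, reported as a Pearson r) could be computed along the following lines. This is a minimal sketch with invented toy values; the helper names are hypothetical, and the real analysis on the MLO-hProt dataset may group or weight residues differently.

```python
import numpy as np

def mean_by_residue_type(residues, values):
    """Average a per-residue quantity within each amino-acid type."""
    residues = np.asarray(residues)
    values = np.asarray(values, dtype=float)
    types = np.unique(residues)
    means = np.array([values[residues == t].mean() for t in types])
    return types, means

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy per-residue data (invented): amino-acid type, a model score, conservation.
residues = list("AAGGWW")
model_score = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0]
conservation = [0.1, 0.3, 0.3, 0.5, 1.0, 1.2]

types, mean_score = mean_by_residue_type(residues, model_score)
_, mean_cons = mean_by_residue_type(residues, conservation)
r = pearson_r(mean_score, mean_cons)
```

      Running the same two helpers once with pLDDT and once with ESM2 scores as `model_score` would give the pair of r values that the response compares.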

      Furthermore, for the human IDRs, we explicitly selected amino acids with pLDDT scores ≤ 70.

      These would be classified as structureless, disordered amino acids, according to the study by Alderson et al. Nevertheless, as shown in Figures 2 and 3 of the main text, our analyses still identifies conserved regions. Therefore, these regions may function via distinct mechanisms than the disorder to order transition.

      We now discuss the novelty of our work in the context of existing studies in the newly added Conclusions and Discussion: Related Work, as quoted below.

      “Numerous studies have sought to identify functionally relevant amino acid groups within IDRs [cite]. For instance, using multiple sequence alignment, several groups have identified evolutionarily conserved residues that contribute to phase separation [cite]. Alderson et al. employed AlphaFold2 to detect disordered regions with a propensity to adopt structured conformations, suggesting potential functional relevance [alderson et al].

      In contrast, our approach based on ESM2 is more direct: it identifies conserved residues without relying on alignment or presupposing that functional significance requires folding into stable 3D structures. Notably, many of the conserved residues identified in our analysis exhibit low pLDDT scores (Figure 2), implying potential functional roles independent of stable conformations.”

      Comment 1b: Dasmeh et al, Lu et al and Ho & Huang analysed conservation in IDRs, including aromatic residues and their role in phase separation.

      We thank the reviewer for bringing these works to our attention! We now explicitly discuss these studies in both the Discussion section as mentioned above and in the Introduction as quoted below.

      “Evolutionary analysis of IDRs is challenging due to difficulties in sequence alignment [cite], though several studies have attempted alignment of disordered proteins with promising results [Dasmeh et al, Lu et al and Ho & Huang].”

      Comment 1c: A number of groups have performed proteome-wide saturation scans using pLMs, including variants of the ESM family, including Meier (reference 89, but cited about something else) and Cagiada et al (https://doi.org/10.1101/2024.05.21.595203) that analysed variant effects in IDRs using a pLM. Thus, I think statements such as "their applicability to studying the fitness and evolutionary pressures on IDRs has yet to be established" should possibly be qualified.

      We added a new paragraph in the Introduction to discuss the application of protein language models to IDRs and cited the suggested references.

      “While protein language models have been widely applied to structured proteins [cite], it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. Its unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling the mutational landscapes of folded proteins [cite] reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      Comment 2: On page 4, the authors write, "The conserved residues are primarily located in regions associated with phase separation." These results are presented as a central part of the work, but it is not completely clear what the evidence is.

      We thank the reviewer for this insightful comment. We realized that our wording was not as precise as it should have been. We meant to state that the regions associated with phase separation are significantly enriched in these conserved residues. This is a significant finding and indicates that phase separation could be a source of evolutionary pressure in dictating IDP sequence conservation. However, we do not intend to suggest that phase separation is the only evolutionary pressure.

      The sentence has been revised to

      “Notably, regions associated with phase separation are significantly enriched in these conserved residues.”

      We further replaced the section title "Conserved, Disordered Residues Localize in Regions Driving Phase Separation" with "Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues" to further clarify our findings and avoid overinterpretation.

      Finally, we revised the following sentence in the discussion

      “Notably, these conserved, disordered residues are predominantly located in regions actively involved in phase separation, contributing to the formation of membraneless organelles.”

      to

      “Notably, regions actively involved in phase separation are enriched with these conserved, disordered residues, supporting their potential role in the formation of membraneless organelles.”

      The submitted manuscript provides clear evidence supporting the enrichment of conserved residues in MLO-driving IDRs. Specifically, Figures 4A and 4C demonstrate that these IDRs exhibit a substantially higher fraction of conserved residues compared to other IDRs involved in phase separation.

      In this analysis, the nMLO-hIDR group serves as a baseline, representing the distribution of conservation in disordered regions lacking MLO-related functions. In contrast, IDRs from MLO-associated groups show a pronounced downward shift in their median and interquartile ranges, indicating stronger evolutionary constraints. Within the dMLO cohort, the degree of conservation follows a distinct gradient: driving residues exhibit the highest levels of conservation, followed by participant residues, with non-participant residues showing values closer to the nMLO baseline. This pattern reflects the relative functional importance of each group in phase separation, with conservation levels corresponding to their roles in MLO scaffolding.

      To further support this, we computed, for each IDR, the fraction of conserved amino acids. As shown in Figure S11B, for IDRs that actively contribute to phase separation, the fraction is indeed higher than for those not involved in phase separation. This analysis is now included in the SI.

      During the revision, we explicitly evaluated whether conserved residues are preferentially located in regions associated with phase separation. To this end, for each protein in the MLO-hProt dataset, we calculated the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments as defined in Figure 4 of the main text.

      Figure S11A presents the distribution of p across all proteins. For comparison, we also include the distribution of 1− p, representing the probability of finding conserved residues in regions not associated with phase separation. On average, p exceeds 0.5, suggesting a tendency for conserved residues to be more frequently located in phase-separating regions. However, the difference between the two distributions is not statistically significant. This result may be due to the generally low density of conserved residues in IDRs, which makes the estimation of p challenging for individual proteins. Additionally, some conserved sites may be involved in functions unrelated to phase separation.

      We added the following text to the Discussion section of the main text.

      “We emphasize that the results presented in Figure 4 do not directly demonstrate that conserved residues are preferentially located in regions associated with phase separation. Although these regions are more enriched in conserved amino acids, their total sequence length can be smaller than that of non-phase-separating regions. As a result, the absolute number of conserved residues may still be higher outside phase-separating regions. To quantitatively assess this, we calculated, for each protein in the MLO-hProt dataset, the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments, as defined in Figure 4 of the main text. Figure S11 shows the distribution of p across all proteins. For comparison, we also present the distribution of 1− p, which reflects the probability of finding conserved residues in non-phase-separating regions. While the average value of p exceeds 0.5, indicating a trend toward conserved residues being more frequently located in phase-separating regions, the difference between the two distributions is not statistically significant. Future studies with expanded datasets may be necessary to clarify this trend.”
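
      The distinction drawn above, between a region being enriched in conserved residues and conserved residues being mostly located in that region, can be illustrated with a minimal Python sketch. The function names and the toy protein are hypothetical, and per-residue boolean labels are assumed as input; this is not the code used in the study.

```python
def fraction_conserved(conserved):
    """Fraction of residues in a region that are labeled conserved."""
    return sum(conserved) / len(conserved)


def prob_conserved_in_ps(conserved, in_ps_region):
    """Probability p that a conserved residue lies in a phase-separating region.

    Both arguments are per-residue boolean lists of equal length.
    Returns None when the protein has no conserved residues (p is undefined).
    """
    n_conserved = sum(conserved)
    if n_conserved == 0:
        return None
    n_inside = sum(1 for c, r in zip(conserved, in_ps_region) if c and r)
    return n_inside / n_conserved


# Hypothetical protein: a 10-residue phase-separating region with 4 conserved
# residues, followed by 90 other residues of which 9 are conserved.
in_ps = [True] * 10 + [False] * 90
conserved = [True] * 4 + [False] * 6 + [True] * 9 + [False] * 81

inside = fraction_conserved(conserved[:10])   # enrichment inside the region
outside = fraction_conserved(conserved[10:])  # enrichment outside the region
p = prob_conserved_in_ps(conserved, in_ps)
print(inside, outside, p)
```

      In this toy case the phase-separating region is four times more enriched in conserved residues than the rest of the sequence, yet p is below 0.5 because the region is short.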

      Comment 3: It would be useful with an assessment of what controls the authors used to assess whether there are folded domains within their set of IDRs.

      We acknowledge that our previous labeling may have caused some confusion. Protein sequences used in Figures 2 and 3 include both folded and disordered domains. Results presented in these figures were constructed using full-length protein sequences to highlight the similarities and differences in ESM2 scores between folded and disordered domains.

      In contrast, the analyses presented in Figures 4 and 5 focus exclusively on IDRs to examine their role in phase separation.

      To prevent further confusion, we have renamed the dataset used in Figures 2 and 3 as MLO-hProt, emphasizing that the analysis pertains to entire protein sequences. The term MLO-hIDR is now reserved for a new dataset that includes only disordered residues, as used in Figures 4 and 5, and corresponding SI Figures.

      For the dMLO-IDR dataset, all except one amino acid (P40967, residue G592) are annotated as disordered in the MobiDB database (https://mobidb.org/). This database characterizes disordered regions based on a combination of predictive algorithms and experimental data. As illustrated in Figure S5A, 25.5% of the proteins in the dataset have direct experimental evidence supporting their disorderedness. These experimental annotations are derived from a diverse range of techniques (Figure S5B). For the remaining proteins, disorder was predicted by one or more computational tools. Although not all tools were applied to every protein, each protein in the dataset was identified as disordered by at least one method.

      For human proteins, IDRs were identified based on AlphaFold2 pLDDT scores, using a threshold of 70. As established in prior studies [1, 2], the pLDDT score provides a quantitative measure of local structural confidence, with lower values indicating greater structural disorder. IDRs associated with conditional folding or disorder-to-order transitions generally exhibit high pLDDT values (e.g., >70).
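
      As a rough illustration of the pLDDT-based selection, the following Python sketch extracts contiguous low-confidence segments from a per-residue pLDDT profile. It assumes an inclusive cutoff (pLDDT ≤ 70), as stated earlier in this response; the function name and the toy profile are hypothetical.

```python
def disordered_segments(plddt, cutoff=70.0):
    """Return (start, end) index pairs, inclusive, of runs with pLDDT <= cutoff."""
    segments = []
    start = None
    for i, value in enumerate(plddt):
        if value <= cutoff:
            if start is None:
                start = i  # open a new low-confidence run
        elif start is not None:
            segments.append((start, i - 1))  # close the current run
            start = None
    if start is not None:
        segments.append((start, len(plddt) - 1))  # run extends to the C-terminus
    return segments


# Hypothetical per-residue pLDDT values for a 7-residue stretch.
profile = [90, 65, 60, 85, 70, 40, 30]
segments = disordered_segments(profile)
lengths = [end - start + 1 for start, end in segments]
print(segments, lengths)
```

      Segment lengths computed this way are what one would tabulate to check how many low-pLDDT segments are very short.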

      Author response image 2 shows a violin plot of AlphaFold2 pLDDT scores for the various MLO-hIDR groups. The consistently low scores support the conclusion that these regions are structurally disordered.

      We also cross-checked the MLO-hIDR regions against the MobiDB database. As shown in Figure S6, approximately 76% of the proteins in the dataset are predicted to contain disordered regions. Among the non-labeled segments with pLDDT scores ≤ 70, the majority are relatively short, with segments of 1–5 residues accounting for approximately 80%.

      Author response image 2.

      AlphaFold pLDDT scores of hIDRs in different MLO-related groups.

      In addition to renaming the dataset, we also revised the manuscript to highlight the validation of disorderedness in the Results section: Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues.

      “The presence of evolutionarily conserved disordered residues raises the question of their functional significance. To explore this, we identified disordered regions of MLO-hProt using a pLDDT score less than 70 and partitioned these regions into two categories: drivers (dMLO-hIDR), which actively drive phase separation, and clients (cMLO-hIDR), which are present in MLOs under certain conditions but do not promote phase separation themselves [cite]. Additionally, IDRs from human proteins not associated with MLOs, termed nMLO-hIDR, were included as a control. To enhance statistical robustness, we extended our dataset by incorporating driver proteins from additional species [cite], resulting in the expanded dMLO-IDR dataset. Beyond the pLDDT-based classification, the majority of residues in these datasets are also predicted to be disordered by various computational tools and supported by experimental evidence (Figures S5 and S6).”

      Recommendation 1: The authors use the terms "evolutionary fitness of IDRs" (abstract and p. 5, for example), "fitness of amino acids" (p. 4), and "quantify the fitness of particular residues at specific sites" (p. 5). It is not clear what is meant by fitness in this context.

      We thank the reviewer for pointing out the ambiguity in the term fitness. To enhance clarity, we have replaced “fitness" with “mutational tolerance" to more directly emphasize the evolutionary conservation of specific residues.

      Recommendation 2: The authors write (p. 6) "Previous studies have demonstrated a strong correlation between ESM2 scores and changes in free energy related to protein structure stability". While that may be true, it might be worth noting that ESM2 scores report on the effects of mutations and function more broadly than stability because these models have previously been shown to capture conservation effects beyond stability.

      We fully agree with the reviewer’s comment and have revised the main text accordingly. Specifically, the referenced sentence has been revised and relocated, as shown below.

      “Our analysis demonstrated that HP1α’s structured domains consistently yield low ESM2 scores, reflecting strong mutational constraints characteristic of folded regions. These constraints are further evident in the local LLR predictions, as shown in Figure 2B, where we illustrate the folded region G120-T130. Given the functional importance of preserving the 3D structure of structured domains, mutations with greater detrimental effects are likely to disrupt protein folding substantially. This interpretation is consistent with previous studies reporting a significant correlation between ESM2 LLRs and changes in free energy associated with protein structural stability [cite].”

      Recommendation 3: p. 10: The authors write "To exclude sequences that no longer qualify as homologs, we filtered for sequences with at least 20% identity to the reference". How did they decide on 20% and why? And over which residues are these 20% calculated.

      We apologize for the earlier lack of clarity. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.
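
      The identity calculation described above can be sketched in a few lines of Python. The sketch assumes that the reference and each homolog are available as equal-length aligned strings with "-" marking gaps; the function names and toy sequences are hypothetical and are not taken from the manuscript or from HH-suite. Whether a sequence at exactly 20% identity is retained is also an assumption here.

```python
def percent_identity(aligned_ref, aligned_seq, gap="-"):
    """Return n/N for one homolog aligned against the reference."""
    if len(aligned_ref) != len(aligned_seq):
        raise ValueError("aligned sequences must have equal length")
    # n: columns where the homolog carries the same amino acid as the reference
    n = sum(1 for r, s in zip(aligned_ref, aligned_seq) if r == s and r != gap)
    # N: every column of the aligned reference, gap columns included
    return n / len(aligned_ref)


def filter_homologs(aligned_ref, aligned_seqs, min_identity=0.20):
    """Keep homologs whose identity to the reference reaches min_identity."""
    return [s for s in aligned_seqs
            if percent_identity(aligned_ref, s) >= min_identity]


# Hypothetical 8-column alignment.
ref = "MKT-AYLQ"
homologs = ["MRTGAY-Q", "--------", "M-------"]

identity = percent_identity(ref, homologs[0])  # 5 matching columns out of 8
kept = filter_homologs(ref, homologs)
print(identity, kept)
```

      Because gap columns count toward N but never toward n, heavily gapped sequences are penalized, which matches the stated aim of excluding sequences that align over only short segments.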

      We updated the Methods section of the main text to clarify.

      “We performed multi-sequence alignment (MSA) analysis using HHblits from the HH-suite3 software suite [citations], a widely used open-source toolkit known for its sensitivity in detecting sequence similarities and identifying protein folds. HHblits builds MSAs through iterative database searches, sequentially incorporating matched sequences into the query MSA with each iteration. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions.

      ...

      To refine alignment quality by focusing on closely related homologs, we filtered out sequences with ≤ 20% identity to the query, excluding weakly related sequences where only short segments show similarity to the reference. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.”

      We selected a 20% sequence identity threshold to balance inclusion of true homologs with exclusion of distant matches that may not share functional relevance. To determine this cutoff, we compared identity thresholds of 0%, 10%, 20%, and 40% and examined the resulting distributions of conservation and ESM2 scores across aligned residues for MLO-hProt dataset (Author response image 3). Thresholds of 10%, 20%, and 40% produced qualitatively similar results, with a consistent correspondence between low ESM2 scores and high conservation. Lower thresholds introduced highly divergent sequences that added noise to the alignment, resulting in reduced overall conservation scores. In contrast, higher thresholds excluded homologs with potentially meaningful conservation, particularly in disordered regions where conservation scores tend to be relatively low.

      Author response image 3.

      Histograms of the ESM2 score and the conservation score, presented in a format consistent with Figure 3B of the main text. The conservation scores were computed using aligned sequences with identity thresholds of ≥0, ≥10%, ≥20%, and ≥40% (left to right). Contour lines represent different levels of −log P(CS, ESM2), where P is the joint probability density of conservation score (CS) and ESM2 score. Contours are spaced at 0.5-unit intervals, highlighting regions of distinct density.

      Recommendation 4: In their description of "motif" searching algorithm (p. 20) I think that the search algorithm would give a different result whether the search is performed N->C or C->N (because the first residue (i) needs to have a score <0.5 but the last (j) could have a score >0.5 as long as the average is below 0.5). Is that correct? And if so, why did they choose an asymmetric algorithm?

      We thank the reviewer for highlighting the asymmetry in our motif-search algorithm.

      To investigate this issue, we repeated the algorithm starting from the C-terminus and compared the resulting motifs with those obtained from the N-terminal scan. We found that the two sets of motifs overlap entirely: each motif identified from the C-terminal direction has a corresponding counterpart from the N-terminal scan. However, the motifs are not identical. The directionality of the search introduces additional amino acids—referred to here as peripheral residues—at the motif boundaries, which differ between the two sets.

      As shown in Author response image 4, the number of peripheral residues is small relative to the total motif length.

      To eliminate asymmetry and ambiguity, we have revised our method to perform bidirectional scans—from both the N- and C-termini—and define each motif as the overlapping region identified by both directions. This approach emphasizes the conserved core and avoids the inclusion of spurious terminal residues. The updated procedure is described in Methods: Motif Identification.

      “To identify motifs within a given IDR, we implemented the following iterative procedure. Starting from either the N– or C–terminus of the sequence, we first locate the initial residue i whose ESM2 score falls within 0.5. From i, residues are sequentially appended…”

      Author response image 4.

      Number of peripheral residues and their length relative to the full motif length, identified from both directions. (A) Unique motifs identified in the N-to-C terminal direction. (B) Unique motifs identified in the C-to-N terminal direction.

      “…in the direction toward the opposite terminus until the segment’s average ESM2 score exceeds 0.5; the first residue to breach this threshold is denoted j. The segment (i,i+1,..., j−1) is then recorded as a candidate motif. This process repeats starting from j until the end of the IDR is reached.

      We perform this full procedure independently from both termini and designate the final motif as the intersection of the two candidate-motif sets. This bidirectional overlap strategy excludes terminal residues that might transiently satisfy the average-score criterion only due to adjacent low-scoring regions, thereby isolating the conserved core of each motif. All other residues—those not included in either directional pass—are classified as non-motif regions, minimizing peripheral artifacts.”
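
      The bidirectional procedure quoted above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the seed residue is taken to require a score strictly below the threshold, the intersection is taken per residue and then regrouped into contiguous segments, and all function names and the toy scores are hypothetical.

```python
def directional_scan(scores, threshold=0.5):
    """One pass from the first element; returns indices covered by candidate motifs."""
    covered = set()
    n = len(scores)
    pos = 0
    while pos < n:
        if scores[pos] >= threshold:
            pos += 1  # not a valid seed residue i, keep looking
            continue
        start, end, total = pos, pos, 0.0
        while end < n:
            total += scores[end]
            if total / (end - start + 1) > threshold:
                break  # residue j = end pushes the running average over the threshold
            end += 1
        covered.update(range(start, end))  # candidate motif is i .. j-1
        pos = end  # resume the search from j
    return covered


def to_segments(indices):
    """Group residue indices into inclusive (start, end) runs."""
    segments = []
    for i in sorted(indices):
        if segments and i == segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], i)
        else:
            segments.append((i, i))
    return segments


def bidirectional_motifs(scores, threshold=0.5):
    """Residues found by both the N-to-C and the C-to-N scan, as segments."""
    n = len(scores)
    forward = directional_scan(scores, threshold)
    # scan the reversed sequence, then map indices back to the original frame
    backward = {n - 1 - i for i in directional_scan(scores[::-1], threshold)}
    return to_segments(forward & backward)


# Hypothetical ESM2 scores for a 7-residue IDR.
scores = [0.9, 0.1, 0.2, 1.5, 0.9, 0.1, 0.1]
forward = directional_scan(scores)
backward = {len(scores) - 1 - i for i in directional_scan(scores[::-1])}
motifs = bidirectional_motifs(scores)
print(sorted(forward), sorted(backward), motifs)
```

      In this toy example the C-to-N scan additionally picks up residues 0 and 4, which have high scores but sit next to low-scoring stretches; these correspond to the peripheral residues discussed above and are dropped by the intersection.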

      Accordingly, we have updated the Supplementary material: ESM2_motif_with_exp_ref.csv to list the newly identified motifs common to both the N-terminal and C-terminal searches. Minor changes were observed in the set of motifs, as discussed above, but these do not affect the main conclusions. Figures 5C, 5D, and S6 have been revised accordingly.

      Reviewer #2:

      Summary:

      Unfortunately, I do not believe that the results can be trusted. ESM2 has not been validated for IDRs through experiments. The authors themselves point out its little use in that context. In this study, they do not provide any further rationale for why this situation might have changed. Furthermore, they mention that experimental perturbations of the predicted motifs in in vivo studies may further elucidate their functional importance, but none of that is done here. That some of the motifs have been previously validated does not give any credibility to the use of ESM2 here, given that such systems were probably seen during the training of the model.

      We thank the reviewer for their detailed and thoughtful critique of our manuscript. We recognize the importance of careful model validation, especially in the context of IDRs, and appreciate the opportunity to clarify the scope and rationale of our study. Below, we respond point-by-point to the main concerns.

      (1) The use of ESM2 is not validated for IDRs, and the authors provide no rationale for its applicability in this context.

      We thank the reviewer for raising this important point.

      First, we emphasize that ESM2 is a probabilistic language model trained entirely on amino acid sequences, without any structural supervision. The model does not receive any input about protein structure — folded or disordered — during training. Instead, it learns to estimate the likelihood of each amino acid at a given position, conditioned on the surrounding sequence context. This makes ESM2 agnostic to whether a sequence is folded or disordered; the model’s capacity to identify patterns of residue usage arises solely from the statistics of natural sequences.

      As such, ESM2 is not inherently biased toward folded proteins, even though previous studies have demonstrated its usefulness in identifying conserved and functionally constrained residues in structured domains [3–9]. These findings support the broader utility of language models for uncovering evolutionary constraints — and by extension, suggest that similar signatures could exist in IDRs, particularly if they are under functional selection.

      Indeed, if certain residues or motifs in IDRs are conserved due to their importance in biological processes (e.g., phase separation), we would expect such selection to be reflected in sequence-based features, which ESM2 is designed to detect. The model’s applicability to IDRs, then, is a natural extension of its core probabilistic architecture.

      To further evaluate this, we carried out an independent in silico validation using multiple sequence alignments (MSAs). This analysis allowed us to compute the evolutionary conservation of individual amino acids without any reliance on ESM2. We then compared these conservation scores to ESM2 scores and found a strong correlation between the two. This provides direct, quantitative support for the idea that ESM2 is capturing biologically meaningful sequence constraints — even in disordered regions.

      While we agree that experimental testing would ultimately provide the most compelling validation, we believe that our MSA-based comparison constitutes a strong and arguably ideal computational validation of the model’s predictions. It offers an orthogonal measure of evolutionary pressure that confirms the biological plausibility of ESM2 scores.

      We added the following text in the introduction to highlight the applicability of ESM2 to IDRs.

      “While protein language models have been widely applied to structured proteins, it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. It operates by estimating the likelihood of observing a given amino acid at a particular position, conditioned on the entire surrounding sequence context. This unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling fitness landscapes of folded proteins reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      (2) There is no experimental validation of the ESM2-based predictions in this study.

      We agree that experimental validation would provide definitive support for the utility of ESM2 in IDRs, and we explicitly state this as a limitation in the revised manuscript as quoted below.

      “Limitations: Despite the promising findings, our study has several limitations. Most notably, our analysis is purely computational, relying on ESM2-derived predictions and sequence-based conservation without accompanying experimental validation. While the strong correlation between ESM2 scores and evolutionary conservation provides compelling evidence that the identified motifs are functionally constrained, the precise biological roles of these motifs remain uncharacterized. ESM2 is well-suited for highlighting regions under selective pressure, but it does not provide mechanistic insights into how conserved motifs contribute to specific molecular functions such as phase separation, molecular recognition, or dynamic regulation. Determining these roles will require targeted experimental investigations, including mutagenesis and biophysical characterization.”

      In addition, we revised the manuscript title from “Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation" to “Protein Language Model Identifies Disordered, Conserved Motifs Implicated in Phase Separation". This revision softens the original claim to better reflect the absence of direct experimental evidence for the motifs’ role in phase separation.

      However, we also emphasize that the goal of our study is not to claim definitive predictive power, but rather to explore whether ESM2-derived mutational profiles align with known biological features of IDRs — and in doing so, to generate new, testable hypotheses.

      In addition, while no in vivo experiments were performed, our study does include an in silico validation step, as detailed in the response to the previous comment. The strong correlation between ESM2 scores and conservation scores provides direct support for the utility of ESM2 in identifying residues under evolutionary constraint in disordered regions.

      (3) The overlap between predicted motifs and known ones may be due to training data leakage.

      We respectfully clarify that training data leakage is not possible in this case, as ESM2 is trained using unsupervised learning on raw protein sequences alone. The model has no access to experimental annotations, functional labels, or knowledge of which motifs are involved in phase separation. It only models statistical sequence patterns derived from evolutionarily observed proteins.

      Therefore, any agreement between ESM2-derived predictions and previously validated motifs arises not from memorization of experimental data, but from the model’s ability to learn meaningful sequence constraints from the natural distribution of proteins.

      (4) The authors should revamp the study with a testable predictive framework.

      We respectfully suggest that a full revamp is not necessary or appropriate in this context.

      As outlined in our previous responses, we believe that certain misunderstandings about the nature and capabilities of ESM2 may have influenced the reviewer’s assessment.

      Importantly, both Reviewer 1 and Reviewer 3 express strong support for the significance and novelty of this work, and recommend publication following minor revisions.

      In this context, we believe the manuscript provides a useful contribution as a first step toward understanding disordered regions using language models, and that it has value even in the absence of direct experimental testing. We have now better positioned the manuscript in this light, clarified limitations, and suggested concrete next steps for follow-up research.

      We hope these clarifications and revisions address the reviewer’s concerns, and we thank them again for helping us strengthen the framing, rigor, and clarity of our study.

      Reviewer #3:

      Summary:

      This is a very nice and interesting paper to read about motif conservation in protein sequences and mainly in IDRs regions using the ESM2 language model. The topic of the paper is timely, with strong biological significance. The paper can be of great interest to the scientific community in the field of protein phase transitions and future applications using the ESM models. The ability of ESM2 to identify conserved motifs is crucial for disease prediction, as these regions may serve as potential drug targets. Therefore, I find these findings highly significant, and the authors strongly support them throughout the paper. The work motivates the scientific community towards further motif exploration related to diseases.

      Strengths:

      (1) Revealing conserved regions in IDRs by the ESM-2 language model.

      (2) Identification of functionally significant residues within protein sequences, especially in IDRs.

      (3) Findings supported by useful analyses.

      We appreciate the reviewer’s thoughtful words and support for our work.

      Weaknesses:

      (1) Lack of examples demonstrating the potential biological functions of these conserved regions.

      As detailed in the Response to Recommendation 6, we conducted additional analyses to connect the identified conserved regions with their biological functions.

      (2) Very limited discussion of potential future work and of limitations.

      We have substantially revised the Conclusions and Discussion section to provide a detailed analysis of the study’s limitations and to propose several directions for future research, as elaborated in our Response to Recommendation 5 below.

      Recommendation 1: The authors describe the ESM2 score such that lower scores are associated with conserved residues, stating that "lower scores indicate higher mutational constraint and reduced flexibility, implying that these residues are more likely essential for protein function, as they exhibit fewer permissible mutational states." However, when examining intrinsically disordered regions (IDRs), which are known to drive phase separation, I observe that the ESM2 score is relatively high (Figure 3C, pLDDT < 50, and Supplementary Figure S2). Could the authors clarify how this relatively high score aligns with the conservation of motifs that drive phase separation?

      We thank the reviewer for this insightful comment. We would like to clarify that most amino acids in IDRs are not conserved, even in IDRs that contribute to phase separation. Only a small set of amino acids in these IDRs, which we term motifs, are evolutionarily conserved and have low ESM2 scores. The ESM2 scores therefore exhibit a bimodal distribution, with peaks at high and low values, as shown in Figures 4A and 4C of the manuscript. When averaged over all amino acids, the mean ESM2 scores plotted in Figure 3C are relatively high because of the dominant population of non-conserved amino acids.
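
      The averaging argument can be illustrated with a small numerical sketch. The numbers below are hypothetical and are not taken from the manuscript: they assume an IDR in which 10% of residues are conserved (low score) and 90% are not (high score), and show that the mean is then dominated by the non-conserved population.

```python
from statistics import mean

# Hypothetical per-residue scores for one IDR (illustrative values only).
conserved = [0.5] * 10        # small conserved population with low scores
non_conserved = [4.0] * 90    # dominant non-conserved population with high scores

scores = conserved + non_conserved
idr_mean = mean(scores)

# The mean sits close to the non-conserved peak even though a
# distinct low-score population is present.
print(f"mean score = {idr_mean:.2f}")
```

      A per-residue distribution, such as the violin plots in Figure 4, exposes the low-score population that the mean alone hides.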

      Recommendation 2: The authors mention: "We first analyzed the relationship between ESM2 and pLDDT scores for human Heterochromatin Protein 1 (HP1, residues 1-191)". I appreciate this example as a demonstration of amino acid conservation in IDRs. However, it is questionable whether the authors could provide some more examples to support amino acid conservation particularly within the IDRs along with lower ESM2 score (e.g, Could the authors provide some additional examples of "conserved disordered" regions in various proteins which are associated with relatively low ESM2 score as appear in Figure 2A).

      We thank the reviewer for this valuable suggestion. We would like to note that conserved residues in IDRs are prevalent, as indicated in Figures 2D and 3B. To further illustrate the prevalence of “conserved disordered” regions, we generated ESM2 versus pLDDT score plots for the full dMLO–hProt dataset (82 proteins) in Figure S2. In these plots, residues with pLDDT ≤ 70 are highlighted in blue to denote structural disorder (dMLO-hIDR), and disordered residues with an ESM2 score ≤ 1.5 are shown in purple to indicate conserved disordered segments.
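
      The two-threshold labelling described above can be sketched as follows. This is a minimal illustration, assuming per-residue pLDDT and ESM2 scores are available as parallel lists; the function name, label strings, and example values are hypothetical, and only the two cutoffs (pLDDT ≤ 70 and ESM2 score ≤ 1.5) come from the response.

```python
PLDDT_DISORDER_MAX = 70.0  # residues with pLDDT <= 70 are treated as disordered
ESM2_CONSERVED_MAX = 1.5   # disordered residues with ESM2 score <= 1.5 are treated as conserved


def classify_residues(plddt, esm2):
    """Label each residue as 'structured', 'disordered', or 'conserved disordered'."""
    if len(plddt) != len(esm2):
        raise ValueError("plddt and esm2 must have the same length")
    labels = []
    for p, s in zip(plddt, esm2):
        if p > PLDDT_DISORDER_MAX:
            labels.append("structured")
        elif s <= ESM2_CONSERVED_MAX:
            labels.append("conserved disordered")
        else:
            labels.append("disordered")
    return labels


# Hypothetical scores for four residues (not taken from the manuscript).
example_labels = classify_residues([92.1, 45.0, 38.5, 70.0], [0.8, 3.2, 1.1, 1.5])
print(example_labels)
```

      Both cutoffs are inclusive here, matching the “≤” used in the text; a residue sitting exactly on a threshold is therefore counted as disordered or conserved.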

      Recommendation 3: Could the authors plot a Violin conservation score plot for Figure 4A to emphasise the relationship between ESM2 scores and conservation scores of disordered residues?

      We thank the reviewer for this suggestion. We included a violin plot illustrating the distribution of conservation scores for disordered residues across all four IDR groups, shown in Author response image 5. Consistent with the findings in Figure 4A, the phase separation drivers (dMLO-hIDR and dMLO-IDR) exhibit a higher proportion of conserved amino acids compared to the client group (cMLO-hIDR).

      We also note that the nMLO-hIDR group may contain conserved residues due to functions unrelated to MLO formation, which could contribute to the higher observed levels of conservation in this group.

      Author response image 5.

      Violin plots illustrating the distribution of conservation scores for disordered residues across the nMLO–hIDR, cMLO–hIDR, dMLO–hIDR, and dMLO–IDR datasets. Pairwise statistical comparisons were conducted using two-sided Mann–Whitney U tests on the conservation score distributions (null hypothesis: the two groups have equal medians). P-values give the probability of obtaining rank differences at least as extreme as those observed under the null hypothesis. Statistical significance is denoted as follows: ***: p < 0.001; **: p < 0.01; *: p < 0.05.
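
      For readers who want to reproduce this kind of comparison, the sketch below implements a two-sided Mann–Whitney U test with the normal approximation and tie correction, using only the Python standard library. It is a stand-in, not the routine used for the figure (which the response does not specify); the function names, the "ns" label for non-significant results, and the example data are assumptions made for illustration.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U for sample x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t
        i = j
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def significance_stars(p):
    """Map a p-value to the star notation used in the caption ('ns' is an added label)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"


# Hypothetical conservation scores for two small groups.
u_stat, p_value = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u_stat, round(p_value, 4), significance_stars(p_value))
```

      For small samples an exact test is preferable to the normal approximation; the approximation is used here only to keep the sketch dependency-free.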

      Recommendation 4: It will be appreciated if the authors could add to Figure 4 Violin plots, a statistical comparison between the different groups.

      We thank the reviewer for this valuable suggestion. We included the p-values for Figures 4A and 4C to quantify the statistical significance of differences in the distributions.

      Most comparisons are highly significant (p < 0.001), while the largest p-value (p = 0.089), obtained for the comparison of conservation scores between the driving and non-participating groups (Figure 4C), still suggests a marginally significant trend.

      Recommendation 5: Could the authors expand more on potential future research directions using ESM2, given its usefulness in identifying conserved motifs? Specifically, how do the authors envision conserved motifs will contribute to future discoveries/applications/models using ESM (e.g, discuss the importance of conserved motifs, especially in IDRs motifs, in protein phase transition prediction in relation to diseases).

      We thank the reviewer for this insightful comment. To further assess the functional relevance of the conserved motifs, we incorporated pathogenic variant data from ClinVar [10, 11] to evaluate mutational impacts. As shown in Figure S12A and B, a substantial number of pathogenic variants in MLO-hProt proteins are associated with low ESM2 LLR values. This pattern holds for both folded and disordered residues.

      Moreover, we observed that variants located within motifs are more frequently pathogenic compared to those outside motifs (Figure S12C). In the main text, motifs were defined only for driver proteins; however, the available variant data for this subset are limited (6 data points). To improve statistical power, we extended motif identification to include both client and driver human proteins, following the same methodology described in the main text. Consistent with previous findings, variants within motifs in this expanded set are also more likely to be pathogenic. These results further support the functional importance of both low ESM2-scoring residues and the conserved motifs in which they reside.

      The following text was added in the Discussion section of the manuscript to discuss these results and outline future research directions.

      “Several promising directions could extend this work, both to refine our mechanistic understanding and to explore clinical relevance. One avenue is testing the hypothesis that conserved motifs in scaffold proteins act as functional stickers, mediating strong intermolecular interactions. This could be evaluated computationally via free energy calculations or experimentally via interaction assays. Deletion of such motifs in client proteins may also reduce their partitioning into condensates, illuminating their roles in molecular recruitment.

      To explore potential clinical implications, we analyzed pathogenicity data from ClinVar [10, 11]. As shown in Figure S12A, single-point mutations with low LLR values, indicative of constrained residues, are enriched among clinically reported pathogenic variants, while benign variants typically exhibit higher LLR values. Moreover, mutations within conserved motifs are significantly more likely to be pathogenic than those in non-motif regions (Figure S12B). These findings highlight the potential of ESM2 as a first-pass screening tool for identifying clinically relevant residues and suggest that the conserved motifs described here may serve as priorities for future studies, both mechanistic and therapeutic.”

      Moreover, the functional significance of conserved motifs, particularly their implications in disease and pathology, warrants further investigation. As an initial analysis, we incorporated ClinVar pathogenic variant data [citation] to assess mutational effects within our datasets. As illustrated in Figure R12A, single-point mutations with low LLR values are enriched among clinically reported pathogenic variants, whereas benign variants are more commonly associated with higher LLR values. Notably, mutations within conserved motifs are substantially more likely to be pathogenic compared to those in non-motif regions. These findings highlight the potential of ESM2 as a first-pass tool for identifying residues of clinical relevance. The conserved motifs identified here may be prioritized in future studies aimed at elucidating their biological roles and evaluating their viability as therapeutic targets.

      Recommendation 6: The authors mention: "Our findings provide strong evidence for evolutionary pressures acting on specific IDRs to preserve their roles in scaffolding phase separation mechanisms, emphasizing the functional importance of entire motifs rather than individual residues in MLO formation." They also present a word cloud of functional motifs in Figure 5D. Although it makes sense that evolutionarily conserved motifs, especially within the IDRs regions, act as functional units, I think there is no direct evidence for such functionality (e.g., examples of biological pathways associated with IDRs and phase separation). Hence, there is no justification to write in the figure caption: "ESM2 Identifies Functional Motifs in driving IDRs" unless the authors provide some examples of such functionality. This will even make the paper stronger by establishing a clear connection to biological pathways, and hence these motifs can serve as potential drug targets.

      We thank the reviewer for this insightful suggestion. We have replaced “functional motifs” with “conserved motifs” in the figure caption.

      Identifying the precise biological pathways associated with the conserved motifs is a complex task and a comprehensive investigation lies beyond the scope of this study. Nonetheless, as an initial effort, we explored the potential functions of these motifs using annotations available in DisProt (https://disprot.org/).

      DisProt is the leading manually curated database dedicated to IDPs, providing both structural and functional annotations. Expert curators compile experimentally validated data, including definitions of disordered regions, associated functional terms, and supporting literature references. Author response image 6 presents a representative DisProt entry for DNA topoisomerase 1 (UniProt ID: P11387), illustrating its structural and biological annotation.

      For each motif, we located the corresponding DisProt entry and assigned a functional annotation based on the annotated IDR from which the motif originates. We emphasize that this functional assignment should be regarded as an approximation. Because experimental annotations often pertain to the entire IDR, regions outside the motif may also contribute to the reported function.

      Nevertheless, the annotations provide valuable insights.

      Author response image 6.

      Screenshot of information provided by the DisProt database. Detailed annotations of biological functions and structural features, along with experimental references, are accessible via mouse click.

      Approximately 50% of ESM2-predicted IDR motifs lack functional annotations. Among those that are annotated, motifs from the dMLO-IDR dataset are predominantly associated with “molecular condensate scaffold activity,” followed by various biomolecular binding functions (Author response image 7A). These findings support the role of these motifs in MLO formation.

      For comparison, we applied the same identification procedure (described in Methods: Motif Identification) to motifs from the nMLO-hIDR dataset. In contrast to the dMLO-IDR motifs, these exhibit a broader range of annotated functions related to diverse cellular processes. Collectively, these results suggest that motifs identified by ESM2 are aligned with biologically relevant functions captured in current databases.

      Finally, as illustrated in Figure S12 and discussed in the Response to Recommendation 5, variants occurring within identified motifs are more likely to be pathogenic than those in non-motif regions, further underscoring their functional importance.

      Author response image 7.

      Biological functions of ESM2-predicted motifs. (A) Distribution of biological functions associated with all identified motifs from dMLO-IDR driving groups. (B) Distribution of biological functions associated with all identified motifs from nMLO-hIDR groups.

      Recommendation 7: In Figure 2C the authors present FE (I assume this is free energy). Some discussion of the difference in free energy for the "a" region is missing (i.e., both "Folded" and "Disordered" regions are associated with a low ESM score but with low and high free energy (FE), respectively).

      We thank the reviewer for the comments. FE indeed abbreviates free energy. To improve clarity and avoid confusion, we have updated all figure captions by replacing “FE” with “−logP” to explicitly denote the logarithm of probability in the contour density plots.

      We used “a” in Figures 2C and 2D to refer to regions with low ESM2 scores, which appear as a local minimum in both plots. Since most residues in folded regions are conserved, region a has lower free energy than region b in Figure 2C. In contrast, because most residues in disordered regions are not conserved, as elaborated in our Response to Recommendation 1, region a has a lower population and higher free energy than region b in Figure 2D.
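
      For reference, the link between population and the plotted quantity can be written out explicitly. This is a sketch under the assumption that P(x) denotes the normalized density of residues at point x of the plotted plane; the symbols F, P, and x are introduced here for illustration and do not appear in the manuscript.

```latex
F(x) = -\log P(x), \qquad
F(a) - F(b) = -\log \frac{P(a)}{P(b)}
```

      Under this definition, F(a) > F(b) exactly when P(a) < P(b). A sparsely populated region therefore has a higher −logP value, which is why region a can be a local minimum in both panels while lying below region b for folded residues and above it for disordered residues.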

      To avoid confusion, we have replaced “a” and “b” in Figure 2D with “I” and “II”.

      Recommendation 8: Figure S2: It would be useful to plot the same figure for structured and disordered regions as well.

      We are not certain we fully understood this comment, as we believe the requested analysis has already been addressed. In Figure S2, we used the AlphaFold2 pLDDT score to represent the structural continuum of different protein regions, where residues with pLDDT > 70 (red and light-red bars) are classified as structured, while those with pLDDT ≤ 70 (blue and light-blue bars) are classified as disordered.

      Minor suggestion 1: Could the authors clarify the meaning of the abbreviation "FE" in the colorbar of the contour line? I assume this is free energy.

      We have updated all contour density plot figure captions by replacing “FE” with “−logP” to explicitly denote the logarithm of probability.

      Minor suggestion 2: In Figure 2A - do the authors mean "Conserved folded" instead of just "Folded"? If so, could the authors indicate this?

      We thank the reviewer for this comment. The ESM2 scores indeed suggest that, within folded regions, there may be multiple distinct groups exhibiting varying degrees of evolutionary conservation. However, as our primary focus is on IDRs, we chose not to investigate these distinctions further.

      Figure 2A illustrates a randomly selected folded region based on AlphaFold2 pLDDT scores.

      References

      (1) Ruff, K. M.; Pappu, R. V. AlphaFold and Implications for Intrinsically Disordered Proteins. Journal of Molecular Biology 2021, 433, 167208.

      (2) Alderson, T. R.; Pritišanac, I.; Kolarić, Đ.; Moses, A. M.; Forman-Kay, J. D. Systematic Identification of Conditionally Folded Intrinsically Disordered Regions by AlphaFold2. Proceedings of the National Academy of Sciences of the United States of America 2023, 120, e2304302120.

      (3) Brandes, N.; Goldman, G.; Wang, C. H.; Ye, C. J.; Ntranos, V. Genome-Wide Prediction of Disease Variant Effects with a Deep Protein Language Model. Nature Genetics 2023, 55, 1512–1522.

      (4) Lin, Z. et al. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. 2023.

      (5) Zeng, W.; Dou, Y.; Pan, L.; Xu, L.; Peng, S. Improving Prediction Performance of General Protein Language Model by Domain-Adaptive Pretraining on DNA-binding Protein. Nature Communications 2024, 15, 7838.

      (6) Gong, J. et al. THPLM: A Sequence-Based Deep Learning Framework for Protein Stability Changes Prediction upon Point Variations Using Pretrained Protein Language Model. Bioinformatics 2023, 39, btad646.

      (7) Lin, W.; Wells, J.; Wang, Z.; Orengo, C.; Martin, A. C. R. Enhancing Missense Variant Pathogenicity Prediction with Protein Language Models Using VariPred. Scientific Reports 2024, 14, 8136.

      (8) Saadat, A.; Fellay, J. Fine-Tuning the ESM2 Protein Language Model to Understand the Functional Impact of Missense Variants. Computational and Structural Biotechnology Journal 2025, 27, 2199–2207.

      (9) Chu, S. K. S.; Narang, K.; Siegel, J. B. Protein Stability Prediction by Fine-Tuning a Protein Language Model on a Mega-Scale Dataset. PLOS Computational Biology 2024, 20, e1012248.

      (10) Landrum, M. J.; Lee, J. M.; Riley, G. R.; Jang, W.; Rubinstein, W. S.; Church, D. M.; Maglott, D. R. ClinVar: Public Archive of Relationships among Sequence Variation and Human Phenotype. Nucleic Acids Research 2014, 42, D980–D985.

      (11) Landrum, M. J. et al. ClinVar: Improving Access to Variant Interpretations and Supporting Evidence. Nucleic Acids Research 2018, 46, D1062–D1067.

    1. eLife Assessment

      This investigation presents a valuable contribution by elucidating the genetic determinants of growth and fitness across multiple clinical strains of Mycobacterium intracellulare, an understudied non-tuberculous mycobacterium. Employing transposon sequencing (Tn-seq), the authors identify a core set of 131 genes essential for bacterial viability, offering a solid foundation for anti-mycobacterial drug discovery. However, there are minor but nonetheless significant concerns about data organization, which need to be addressed for greater scientific impact.

    2. Reviewer #1 (Public review):

      Summary:

      In this descriptive study, Tateishi et al. report a Tn-seq based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Weaknesses:

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is yet to be comprehensively investigated. However, this reviewer thinks such an investigation would require a complex experimental design and perhaps forms an independent study.